SambaFlow software release notes

Release 1.24

Release 1.24 was an external release with general compiler improvements. It has no notable updates or highlights.

Release 1.23

Release 1.23 includes improved OS support, changes to application locations, and renaming of components. Please review the following updates carefully to ensure compatibility with your environment.

Supported OS versions

Red Hat: Starting with this release, SambaFlow supports Red Hat (8.8).
Ubuntu: The supported version (Ubuntu 22.04.x) remains unchanged.

Package/Application location change

To align with Linux best practices, third-party applications have been relocated from their previous locations (/opt/ or /usr/local/) to a single standardized directory (/opt/sambanova/). This change ensures compatibility, avoids conflicts with pre-installed customer packages, and provides controlled versions compatible with the SambaNova software stack.

New location: /opt/sambanova/
Previous locations: /opt/ and /usr/local/

If you rely on custom scripts or configurations pointing to old paths, please update references to the new directory.
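One way to look for stale references is to scan your scripts for the old prefixes. The following Python sketch is only an illustration, not a SambaNova tool. The function name find_stale_paths, the regular expression, and the sample paths (some-app) are hypothetical. The pattern flags any /usr/local/ path and any /opt/ path that is not already under /opt/sambanova/, so treat the matches as candidates to review rather than as a definitive list.

```python
import re

# Hypothetical pattern: matches /usr/local/... and /opt/... paths,
# but skips anything already under /opt/sambanova/.
OLD_PREFIX = re.compile(r"(?<![\w/])(/usr/local/|/opt/(?!sambanova/))[\w./-]*")


def find_stale_paths(text):
    """Return (line_number, matched_path) pairs for paths under the old prefixes."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in OLD_PREFIX.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits


# Sample script content; the application paths are made up for illustration.
script = """\
export PATH=/usr/local/bin:$PATH
source /opt/sambanova/some-app/env.sh
python /opt/some-app/run.py
"""
for lineno, path in find_stale_paths(script):
    print(f"line {lineno}: {path}")
```

In this sample, lines 1 and 3 are reported and line 2 is skipped because it already points to /opt/sambanova/.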

Renamed applications

The following applications have been renamed, and their old names are deprecated starting with release 1.23.

Old names (deprecated)	New names
`sambaflow-apps-datascale-image-segmentation`	`sambaflow-apps-datascale-vision-segmentation`
`sambaflow-apps-datascale-image-segmentation-3d`	`sambaflow-apps-datascale-vision-segmentation-3d`
`sambaflow-apps-datascale-image-vit`	`sambaflow-apps-datascale-vision-vit`
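
If your automation refers to these applications by name, a small lookup can translate old names to new ones. This is a minimal sketch: the dictionary is copied from the table above, and the helper name current_name is hypothetical.

```python
# Old-to-new application names, copied from the table above.
RENAMED = {
    "sambaflow-apps-datascale-image-segmentation": "sambaflow-apps-datascale-vision-segmentation",
    "sambaflow-apps-datascale-image-segmentation-3d": "sambaflow-apps-datascale-vision-segmentation-3d",
    "sambaflow-apps-datascale-image-vit": "sambaflow-apps-datascale-vision-vit",
}


def current_name(name):
    """Return the release 1.23 name, or the input unchanged if it was not renamed."""
    return RENAMED.get(name, name)


print(current_name("sambaflow-apps-datascale-image-vit"))
```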

Deprecated components

The following packages and application names are deprecated.

Packages

The following package is deprecated and has been removed starting with release 1.23:

sambaflow-apps-datascale-image-object-detection

Application names

The following application names are deprecated as part of the renaming process (see above).

sambaflow-apps-datascale-image-segmentation
sambaflow-apps-datascale-image-segmentation-3d
sambaflow-apps-datascale-image-vit

Release 1.22

Release 1.22 includes improved OS support.

Supported OS versions

Ubuntu: Starting with this release, SambaFlow supports Ubuntu 22.04.x.

As part of the upgrade to support Ubuntu 22.04.x, we’re changing an environment variable. You don’t have to do anything for the change to take effect.
Red Hat: The version for Red Hat (8.5) remains unchanged.

Release 1.21

Release 1.21 includes internal code changes that support our first release of SambaNova Model Zoo, available in this public GitHub repository.

This first official release of SambaNova Model Zoo is currently in Beta. SambaNova customers can download a container image (Devbox) that includes the SambaFlow compiler, other SambaNova libraries, and all prerequisite software.

Existing SambaNova customers can contact their Customer Support representative to access the Devbox.
If you’re new to SambaNova and interested in trying out Model Zoo, contact us at [email protected] to get started!

See the Model Zoo Release Notes for details.

Release 1.20

Release 1.20 was an internal release. No user-visible changes were made in that release.

Release 1.19

New features

This release focuses primarily on performance improvements and on features that are not yet visible to customers.

Cached compilation mode (experimental)

Our experimental cached compilation mode can speed up compilation times of large models. In this mode, the compiler maintains a cache of previously compiled sections of a model, so that subsequent compilations can use the cached sections instead of recompiling them.

To enable cached compilation mode, set the SN_PEF_CACHE environment variable to the path of a folder.
The compiler will then populate a cache at that location (and create the folder if it doesn’t exist). The content of the cache is an internal detail subject to change.

Cached compilation mode supports the development flow of a single user who makes frequent changes to a model. You cannot share the cache with other users. For the best use of the cache, make small changes (instead of extensive changes) followed by a compile. By limiting the scope of a change, you increase the likelihood that more sections can be pulled precompiled from the cache because they did not change.
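
As an illustration, the variable can be set from a Python launcher script before the compile starts. This is a sketch under assumptions: only the SN_PEF_CACHE name comes from these notes, while the helper name enable_pef_cache and the cache location are hypothetical.

```python
import os
from pathlib import Path


def enable_pef_cache(cache_dir, env=None):
    """Return a copy of env (default: os.environ) with SN_PEF_CACHE set to cache_dir."""
    env = dict(os.environ if env is None else env)
    env["SN_PEF_CACHE"] = str(Path(cache_dir).expanduser())
    return env


# Hypothetical per-user location; the cache cannot be shared with other users.
env = enable_pef_cache("~/.cache/sn_pef_cache")
print(env["SN_PEF_CACHE"])
# Pass env to whatever starts your compile, for example through the
# env argument of subprocess.run.
```

The helper does not create the folder, because the compiler creates it if it doesn't exist.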

API updates

In argmax(), the default value of keepdim (bool) has changed from True to False. keepdim indicates whether Samba retains the reduced dimension (dim) in the output tensor. By default, that dimension is no longer retained.
The groupby() operator was added in this release.
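
The effect of keepdim on the output shape can be pictured with plain Python lists. This sketch does not call the Samba API. The function argmax_rows is a hypothetical stand-in that reduces over the last dimension of a 2-D list.

```python
def argmax_rows(matrix, keepdim=False):
    """Index of the largest value in each row (a reduction over dim=1)."""
    indices = [max(range(len(row)), key=row.__getitem__) for row in matrix]
    if keepdim:
        # Retain the reduced dimension with size 1: shape (rows, 1).
        return [[i] for i in indices]
    # Drop the reduced dimension: shape (rows,).
    return indices


scores = [[0.1, 0.7, 0.2],
          [0.9, 0.05, 0.05]]
print(argmax_rows(scores))                # like the new default: [1, 0]
print(argmax_rows(scores, keepdim=True))  # like the previous default: [[1], [0]]
```

Code that depends on the previous output shape probably needs to pass keepdim=True explicitly after this change.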

Documentation

Added Best practices
Added How to use data parallel mode
Added Compose complex operations with parallel patterns

Release 1.18

In Release 1.18, most of SambaFlow was migrated from /usr/local to /opt/sambanova. Add /opt/sambanova/bin to your PATH.
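
If you start SambaFlow tools from a Python script, the PATH update can be made for that process and its child processes. This is a minimal sketch, and the helper name with_sambanova_bin is hypothetical. For interactive shells, update your shell profile instead.

```python
import os


def with_sambanova_bin(path_value, bin_dir="/opt/sambanova/bin"):
    """Return path_value with bin_dir prepended, unless it is already present."""
    entries = [p for p in path_value.split(os.pathsep) if p]
    if bin_dir in entries:
        return path_value
    return os.pathsep.join([bin_dir] + entries)


# Affects only this Python process and the processes it launches.
os.environ["PATH"] = with_sambanova_bin(os.environ.get("PATH", ""))
print("/opt/sambanova/bin" in os.environ["PATH"].split(os.pathsep))
```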

New features

Tensor parallel support (Beta). Tensor parallel mode uses multiple RDUs for inference and training. Tensor parallel mode improves runtime performance and ensures that large models, which might exceed the memory limit of a single socket, still run. See How to use tensor parallel mode (Beta) for details.
Multigraph support. The new multigraph feature supports partitioning a model into individual graphs so you can run each graph separately. See Use multigraph to partition models.
UE Replay (Beta). Some updates to Uncorrectable Error replay (Beta) make the feature easier to use.
Mixed precision (Beta). Mixed precision combines the use of different numerical formats (such as FP32 and BF16) to reduce memory footprint and speed up large neural network workloads. See Mixed precision support for details and examples.

Compiler and performance improvements

New and renamed heuristics. This release includes improvements to heuristics for use with o1.
- SAFE_GEMM, DEFAULT_GEMM (new), AGGRESSIVE_GEMM. Applicable only to patterns that are dominated by a single large matrix multiply (GEMM) operation.
- MHA (renamed in 1.18). For use with a multi-headed attention block. Renamed from GPT3_MHA.
- SDPA (new in 1.18). For use with PyTorch SDPA operations.
The compiler’s new deduplication feature can reduce compile time and improve model performance. The feature is currently limited to a single RDU. The feature is on by default. Contact Customer Support if you see a need to turn it off.
This release includes an improved algorithm for mapping compute graphs onto RDU resources. The enhanced algorithm, which is on by default:
- Accelerates the optimization process, resulting in shorter compile times
- Reduces on-RDU congestion when running models, providing performance improvements.

New operators

This release includes several new PyTorch operators. See Functional Operators.

Supported data types for each new operator are still being validated, and more information will be made available at a later date. If you have specific questions about supported data types, contact SambaNova Support.

Arithmetic operators
- abs()
- mul()
- relu()
- rsqrt()
- scale()
- sigmoid()
- silu()
Parallel patterns operators
- sn_gather()
- sn_imm()
- sn_iteridx()
- sn_reduce()
- sn_scatter()
- sn_select()
- sn_zipmapreduce()
Tensor operators
- ct_attention()
- sn_identity()
- to()
- type_as()
Other operators
- multi_head_attention()
- layer_norm()

Documentation improvements

Updated SambaFlow API Reference with a new template that supports both dark and light mode.
In the API reference, new APIs now include information about the release (e.g. New in 1.18).
Updates to the Run pretrained models on RDU tutorial include code snippets for downloading and converting the dataset in the Download the dataset and Prepare the dataset sections.

Release 1.17

New compiler features

Released o0 and o1 compiler optimization modes (previously in Beta). See Compiler optimization modes.
(Beta) Added support for operator fusion rule yaml files and heuristics for use in conjunction with the o1 compiler option.
- SambaNova will make a limited set of fusion rule yaml files available that direct the compiler, resulting in a more highly optimized PEF for certain families of models (e.g. LLM). See Operator fusion rule yaml syntax.
- Users can make changes to the yaml file to achieve more efficient compiler behavior.
(Beta) Added support for preset scheduling heuristics to improve fused operators' performance in o1 compiler mode. Users cannot edit the heuristics in this release. See Operator fusion heuristics.

Other new features and improvements

Introduced beta version of the uncorrectable error replay (UE replay) feature, which attempts to automatically recover and continue a training run if the run encounters a UE. See Uncorrectable Error replay (Beta).
For improved performance, changed ENABLE_LINEAR_GRAD_ACCUM_STOC to default to 1 instead of 0. As a result, stochastic rounding is turned on for mixed-precision general matrix multiply (GEMM) by default. If you want to return to the previous default, contact SambaNova Support.
Enhanced PyTorch operator support
- silu: FP32 (experimental support)
- gelu: FP32 (experimental support)
- tanh: FP32 (experimental support)
- For mul, full support for BF16 and FP32 had been omitted from the documentation by mistake. It has now been added.
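
The stochastic rounding that ENABLE_LINEAR_GRAD_ACCUM_STOC now turns on by default can be pictured with a generic sketch. This is not SambaFlow's implementation. The function stochastic_round and the step size are hypothetical and only illustrate the general technique: a value between two representable numbers is rounded up with a probability equal to its fractional distance from the lower one, so the rounding error averages out over many accumulations.

```python
import math
import random


def stochastic_round(x, step, rng):
    """Round x to a multiple of step, rounding up with probability equal to
    the fractional distance of x from the lower multiple."""
    lower = math.floor(x / step) * step
    frac = (x - lower) / step
    return lower + step if rng.random() < frac else lower


rng = random.Random(0)
step = 0.25           # hypothetical spacing between representable values
x = 1.0 + 0.1 * step  # 10% of the way from 1.0 to 1.25
samples = [stochastic_round(x, step, rng) for _ in range(10_000)]
# The mean stays close to x, whereas round-to-nearest would always return 1.0.
print(sum(samples) / len(samples))
```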

Performance improvements

Enabled compile-time device-program control scheduling for Bloom 176B and GPT13B LLM models for NLP inference.

Supported versions

PyTorch: 1.10.2+cpu
Python: 3.8 (changed in 1.17.3 and later)

Documentation improvements

Some documentation updates that are not release dependent became available in the SambaFlow 1.16 documentation after that version was released. Here is the complete list of release-dependent and release-agnostic documentation.

Several of our tutorials are now available from the new sambanova/tutorials GitHub repo. More to be added in future releases. See Tutorials for an overview of all tutorials.
SambaFlow learning map has an overview of documentation and tutorials for new users.
Model conversion overview is a high-level discussion of model porting tasks. Includes pointers to the porting example that is part of this doc set.
Hyperparameter reference is a short overview of hyperparameters. We’ll point to that doc page from session:run() in the API reference.
Updates and fixes in Compiler argument reference.

API Reference improvements

Changes and additions to the SambaFlow API reference:

Added documentation for samba.random
Added documentation for samba.from_torch_model
Added documentation for samba.utils.trace_graph
Added documentation for samba.optim
Fixes to some supported data types in Functional Operators
Small fixes for samba.session documentation
Fixed some broken links

Release 1.16 (2023-07-14)

New features and other improvements

Introduced new compiler modes -o0 and -o1 (Beta), which allow users to fine-tune compiler performance.
- See SambaFlow compiler overview for some background information.
- See Compiler argument reference for reference documentation, which includes examples.
Change to compiler --help behavior. The --help command now returns a limited number of fully supported options. A call to compile with --help --debug returns a longer list of options, some of them experimental.

Performance improvements

Various optimizations in this release help improve model performance and reduce compile times, especially for NLP models.

Documentation improvements

Updated API Reference includes documentation for supported PyTorch operators

API Reference documentation always opens in a new tab (or window). To return to the main doc set, click the previous tab (or window).
New SambaNova messages and logs doc page explains which messages you can safely ignore, where to find which logging information, and which errors you might be able to resolve yourself.
New SambaFlow compiler overview doc page gives an overview of the compiler stack and discusses some compiler arguments, including the new o0, o1, etc. options.
New Compiler argument reference doc page is a reference to frequently used compiler arguments and includes a discussion of the new arguments.
New Use sntilestat for performance analysis doc page explains how to use the sntilestat tool for performance analysis and includes examples of visualizing sntilestat CSV output in a spreadsheet.

Obsolete components and APIs

The grad_of_outputs parameter in samba.session.run was deprecated in release 1.15 and has been removed. Use SambaTensor::sn_grad to set an output tensor’s gradients instead.

Release 1.15 (2023-03-30)

Deprecated components and APIs

The grad_of_outputs parameter in samba.session.run is deprecated and will be removed in release 1.16. Use SambaTensor::sn_grad to set an output tensor’s gradients instead.

Release 1.14 (2023-01-10)

Deprecated components and APIs

The following APIs have been renamed. The old names are deprecated.

Renamed samba.from_torch to samba.from_torch_tensor
Renamed samba.from_torch_ to samba.from_torch_model_

Release 1.13 (2022-11-03)

New features and other improvements

New features
- Added option to sntilestat to skip idle tiles.
- Enhanced multi-processing support for SambaNova Runtime APIs.
- Enhanced host profiling information and detailed timeline view in SambaTune.
- Enhanced snprof and added more robust fault reporting in snstat.
Performance improvements
- Faster SambaFlow context creation.
- More efficient CPU usage.
- Better performance for scaleout operations.
Software
- Updated PEF to version 2.5.0.
  
  Recompile all models with this release due to the PEF version change.
- Version 2 of the SambaFlow compiler scheduler, specified with the --mac-v2 option, is now the default. Version 1 is still supported but must be requested explicitly with the --mac-v1 option.

Deprecated components

venv: The venv shared generic package is deprecated and has been replaced by model-specific venv packages. The generic package will be removed from future releases.
UnoSecInf: The UnoSecInf inference performance test, which is based on section-by-section mapping, is deprecated starting in Release 1.13. Starting in Release 1.14, this performance test will no longer be available.

The uno_full.py model is not deprecated.

Release 1.12.7 (2022-07-30)

New features

Added SambaTune: a tool that supports profiling application performance.
Improved scale-out performance through parallel reduce.
Enhanced RDU reset support with VM.

Supported components and versions

Operating Systems

Red Hat Enterprise Linux 8.5
Ubuntu Linux 20.04 LTS

Software

Updated PEF to version 2.0.0. Models must be recompiled to be used with this release due to the PEF version change.
Version 2 of the SambaFlow compiler scheduler, specified with the --mac-v2 option, is now the default. Version 1 will continue to be supported but must be requested explicitly with the --mac-v1 option.

Deprecated components

The global virtual environment under /opt/sambaflow/venv is deprecated and will be removed in version 1.13. It will be replaced by individual virtual environments for each model.