SambaFlow Software Release Notes

Release 1.17

New compiler features

  • Released o0 and o1 compiler optimization modes (previously in Beta). See Compiler optimization modes.

  • (Beta) Added support for operator fusion rule yaml files and heuristics for use in conjunction with the o1 compiler option.

    • SambaNova will make a limited set of fusion rule yaml files available that direct the compiler, resulting in a more highly optimized PEF for certain families of models (e.g. LLM). See Operator fusion rule yaml syntax (Beta).

    • Users can make changes to the yaml file to achieve more efficient compiler behavior.

  • (Beta) Added support for preset scheduling heuristics to improve fused operators' performance in o1 compiler mode. Users cannot edit the heuristics in this release. See Operator fusion heuristics.

Other new features and improvements

  • Introduced beta version of the uncorrectable error replay (UE replay) feature, which attempts to automatically recover and continue a training run if the run encounters a UE. See Uncorrectable Error Replay (Beta).

  • For improved performance, changed ENABLE_LINEAR_GRAD_ACCUM_STOC to default to 1 instead of 0. As a result, stochastic rounding is turned on for mixed-precision general matrix multiply (GEMM) by default. If you want to return to the previous default, contact SambaNova Support.

  • Enhanced PyTorch operator support

    • silu: FP32 (experimental support)

    • gelu: FP32 (experimental support)

    • tanh: FP32 (experimental support)

    • mul: Full support for BF16 and FP32 was omitted from the documentation by mistake and has now been added.
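To illustrate why stochastic rounding (enabled by the new ENABLE_LINEAR_GRAD_ACCUM_STOC default described above) can help gradient accumulation, here is a minimal pure-Python sketch. It is a conceptual illustration only, not SambaFlow's implementation: the choice of BF16 as the target format and all function names are assumptions made for the example.

```python
import math
import random
import struct


def _fp32_bits(x: float) -> int:
    """Return the IEEE 754 single-precision bit pattern of x."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def _bits_fp32(bits: int) -> float:
    """Return the float whose single-precision bit pattern is bits."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def round_nearest_bf16(x: float) -> float:
    """Round to BF16 precision (top 16 bits of FP32), ties to even."""
    if not math.isfinite(x):
        return x
    bits = _fp32_bits(x)
    bits += 0x7FFF + ((bits >> 16) & 1)
    return _bits_fp32(bits & 0xFFFF0000)


def round_stochastic_bf16(x: float, rng: random.Random) -> float:
    """Round to BF16 precision, rounding the magnitude up with a
    probability proportional to the discarded low 16 bits."""
    if not math.isfinite(x):
        return x
    bits = _fp32_bits(x) + rng.getrandbits(16)
    return _bits_fp32(bits & 0xFFFF0000)


def accumulate(step: float, count: int, round_fn) -> float:
    """Add step to 1.0 count times, rounding after every addition."""
    total = 1.0
    for _ in range(count):
        total = round_fn(total + step)
    return total


rng = random.Random(0)  # seeded so the demo is repeatable
nearest = accumulate(0.001, 1000, round_nearest_bf16)
stochastic = accumulate(0.001, 1000, lambda v: round_stochastic_bf16(v, rng))
print("round-to-nearest:", nearest)
print("stochastic:      ", stochastic)
```

In this sketch the round-to-nearest total stays at 1.0, because each 0.001 update is smaller than half the BF16 spacing at 1.0 and is rounded away every time. The stochastic total drifts toward the true sum of about 2.0, because each update survives with a probability proportional to its size.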

Performance improvements

  • Enabled compile-time device-program control scheduling for the Bloom 176B and GPT13B LLMs for NLP inference.

Supported versions

  • PyTorch: 1.10.2+cpu

  • Python: 3.8 (changed in 1.17.3 and later)
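A script can compare its environment against the versions listed above before it runs. The following is a minimal sketch; the helper names are hypothetical and are not part of SambaFlow, and the exact-match comparison is an assumption about how strictly the listed versions should be enforced.

```python
import sys

# Versions taken from the "Supported versions" list above.
SUPPORTED_PYTHON = (3, 8)
SUPPORTED_TORCH = "1.10.2+cpu"


def version_problems(python_version, torch_version):
    """Return a list of human-readable mismatches (empty if all match)."""
    problems = []
    if tuple(python_version[:2]) != SUPPORTED_PYTHON:
        problems.append(
            f"Python {python_version[0]}.{python_version[1]} found, "
            f"expected {SUPPORTED_PYTHON[0]}.{SUPPORTED_PYTHON[1]}"
        )
    if torch_version != SUPPORTED_TORCH:
        problems.append(
            f"PyTorch {torch_version} found, expected {SUPPORTED_TORCH}"
        )
    return problems


def installed_torch_version():
    """Return torch.__version__, or None if PyTorch is not installed."""
    try:
        import torch
    except ImportError:
        return None
    return torch.__version__


def check_current_environment():
    """Check the running interpreter and the installed PyTorch."""
    return version_problems(sys.version_info, installed_torch_version())
```

Calling check_current_environment() returns an empty list when both versions match the list above, and one message per mismatch otherwise.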

Documentation improvements

Some documentation updates that are not release dependent became available in the SambaFlow 1.16 documentation after that version was released. Here is the complete list of release-dependent and release-agnostic documentation.

API Reference improvements

Changes and additions to the SambaFlow API reference:

  • Added documentation for samba.random

  • Added documentation for samba.from_torch_model

  • Added documentation for samba.utils.trace_graph

  • Added documentation for samba.optim

  • Fixes to some supported data types in Functional Operators

  • Small fixes for samba.session documentation

  • Fixed some broken links

Release 1.16 (2023-07-14)

New features and other improvements

  • Introduced new compiler modes -o0 and -o1 (Beta), which allow users to fine-tune compiler performance.

  • Change to compiler --help behavior. The --help command now returns a limited number of fully supported options. A call to compile with --help --debug returns a longer list of options, some of them experimental.

Performance improvements

  • Various optimizations in this release improve model performance and reduce compile times, especially for NLP models.

Documentation improvements

  • Updated API Reference includes documentation for supported PyTorch operators

    API Reference documentation always opens in a new tab (or window). To return to the main doc set, click the previous tab (or window).

  • New SambaNova messages and logs doc page explains which messages you can safely ignore, where to find which logging information, and which errors you might be able to resolve yourself.

  • New SambaFlow compiler overview doc page gives an overview of the compiler stack and discusses some compiler arguments, including the new o0, o1, etc. options.

  • New Compiler argument reference doc page is a reference to frequently used compiler arguments and includes a discussion of the new arguments.

  • New SambaNova PyTorch operator support doc page lists which PyTorch operators are fully supported and which are experimentally supported. This page will be updated with each release and includes links to the API Reference.

  • New Use sntilestat for performance analysis doc page explains how to use the sntilestat tool for performance analysis and includes examples of visualizing sntilestat CSV output in a spreadsheet.

Obsolete components and APIs

The grad_of_outputs parameter was deprecated in release 1.15 and has been removed. Use SambaTensor::sn_grad to set an output tensor’s gradients instead.

Release 1.15 (2023-03-30)

Deprecated components and APIs

The grad_of_outputs parameter is deprecated and will be removed in release 1.16. Use SambaTensor::sn_grad to set an output tensor’s gradients instead.

Release 1.14 (2023-01-10)

Deprecated components and APIs

The following APIs have been renamed. The old names are deprecated.

  • Renamed samba.from_torch to samba.from_torch_tensor

  • Renamed samba.from_torch_ to samba.from_torch_model_
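Code that must run on releases before and after this rename can look up the new name first and fall back to the old one. The following is a minimal sketch of that pattern; resolve_renamed is a hypothetical helper, not a SambaFlow API, and the demo uses stand-in objects rather than the real samba module.

```python
import warnings
from types import SimpleNamespace

# Old name -> new name, as listed in the release notes above.
RENAMED = {
    "from_torch": "from_torch_tensor",
    "from_torch_": "from_torch_model_",
}


def resolve_renamed(module, old_name):
    """Return the renamed attribute if the module has it; otherwise
    fall back to the deprecated name and emit a DeprecationWarning."""
    new_name = RENAMED[old_name]
    if hasattr(module, new_name):
        return getattr(module, new_name)
    warnings.warn(
        f"{old_name} is deprecated; newer releases use {new_name}",
        DeprecationWarning,
        stacklevel=2,
    )
    return getattr(module, old_name)


# Stand-ins for an older and a newer release (assumptions for the demo).
older_release = SimpleNamespace(from_torch=lambda t: ("old", t))
newer_release = SimpleNamespace(
    from_torch=lambda t: ("old", t),
    from_torch_tensor=lambda t: ("new", t),
)

convert = resolve_renamed(newer_release, "from_torch")
print(convert("tensor"))
```

With the real package, the first argument would be the imported samba module, under the assumption that the renamed functions are exposed as attributes of that module.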

Release 1.13 (2022-11-03)

New features and other improvements

  • New features

    • Added option to sntilestat to skip idle tiles.

    • Enhanced multi-processing support for SambaNova Runtime APIs.

    • Enhanced host profiling information and detailed timeline view in SambaTune.

    • Enhanced snprof and added more robust fault reporting in snstat.

  • Performance improvements

    • Faster SambaFlow context creation.

    • More efficient CPU usage.

    • Better performance for scaleout operations.

  • Software

    • Updated PEF to version 2.5.0.

      Recompile all models with this release due to the PEF version change.

    • Version 2 of the SambaFlow compiler scheduler, specified with the --mac-v2 option, is now the default. The --mac-v1 option is still supported but must be specified explicitly.

Deprecated components

  • venv: The venv shared generic package is deprecated and has been replaced by model-specific venv packages. The generic package will be removed from future releases.

  • UnoSecInf: The UnoSecInf inference performance test, which is based on section-by-section mapping, is deprecated starting in Release 1.13. Starting in Release 1.14, this performance test will no longer be available.

    The model is not deprecated.

Release 1.12.7 (2022-07-30)

New features

  • Added SambaTune: a tool that supports profiling application performance.

  • Improved scale-out performance through parallel reduce.

  • Enhanced RDU reset support with VM.

Supported components and versions

Operating Systems

  • Red Hat Enterprise Linux 8.5

  • Ubuntu Linux 20.04 LTS


  • Updated PEF to version 2.0.0. Models must be recompiled to be used with this release due to the PEF version change.

  • Version 2 of the SambaFlow compiler scheduler, specified with the --mac-v2 option, is now the default. The --mac-v1 option will continue to be supported but must be specified explicitly.

Deprecated components

  • The global virtual environment under /opt/sambaflow/venv is deprecated and will be removed in version 1.13. It will be replaced by individual virtual environments for each model.