Architecture and workflows

The SambaFlow™ software stack runs on DataScale® hardware. You can run your own PyTorch model, or download a pretrained model from Hugging Face and fine-tune it for the tasks you want to perform. In both cases, some initial model conversion is necessary (see Model conversion overview).

This page describes the components of the software stack, the compile, run, and inference cycle (unique to SambaNova), and the command-line arguments.


The different components of the SambaNova hardware and software stack interact with each other. It’s useful to understand the architecture. For example, at times SambaFlow developers might find it useful to investigate what’s going on in the Runtime component.

SambaFlow in the software stack
  1. SambaNova Reconfigurable Dataflow Unit™ (RDU) is a processor that provides native dataflow processing. It has a tiled architecture that consists of a network of reconfigurable functional units. See the white paper SambaNova Accelerated Computing with a Reconfigurable Dataflow Architecture.

  2. SambaNova Systems DataScale is a complete rack-level computing system. Each DataScale system configuration consists of one or more DataScale nodes, integrated networking, and a management infrastructure in a standards-compliant data center rack.

  3. SambaNova Runtime. The SambaNova Runtime component allows system administrators to perform configuration, fault management, troubleshooting, etc. See the SambaNova Runtime documentation for details.

  4. SambaFlow Python SDK. Most developers use the SambaFlow Python SDK to compile and run their models on SambaNova hardware.

    1. The developer writes the model code using the SambaFlow Python SDK and compiles the model.

    2. The compiler returns a PEF file.

    3. The developer trains the model on the hardware by passing in the PEF file and the training dataset.

    4. Finally, the developer can use the trained model with a test dataset to verify the model, or run inference using the model.

  5. SambaFlow models.

    • Some models are included in your SambaNova environment at /opt/sambaflow/apps/. These models are primarily examples for exploration.

    • SambaFlow models in our sambanova/tutorials GitHub repo allow you to examine the Python code and then perform compilation, training, and inference runs.

    • This documentation includes detailed code discussions for all tutorials, including a conversion tutorial. See SambaFlow tutorials.


When you develop for RDU hardware, you start with data and a model .py file, and, after inference, you end with a results file.

  1. You pass the model and data to the compiler to generate a PEF file, which you then pass in during training and inference. See Compile.

  2. During a Test run (optional) you pass in the PEF and check with a small set of data that the compiled model runs correctly on your hardware and software combination.

  3. For the Training run, you use the full dataset with your model. You might fine-tune an existing model in this step, or train a new model from scratch. In most cases, it makes sense to save checkpoints and do the training run in stages.

  4. When the model has been trained sufficiently, you can perform an Inference run with new or unlabeled data.
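
The four stages above can be sketched as a sequence of command lines. This is a minimal sketch, not SambaNova tooling: the helper `build_workflow_commands`, the script name `model.py`, and the PEF path are hypothetical, and the `--pef` flag used here to pass in the PEF file is an assumption. Check `compile --help` and `run --help` for the arguments your model actually supports.

```python
from typing import Dict, List


def build_workflow_commands(model_script: str, pef_path: str) -> Dict[str, List[str]]:
    """Return one argument list per workflow stage, in execution order."""
    # Assumption: the PEF file is passed to `run` with a `--pef` flag.
    pef_args = ["--pef", pef_path]
    return {
        "compile": ["python", model_script, "compile"],
        "test": ["python", model_script, "run", "--test", *pef_args],
        "train": ["python", model_script, "run", *pef_args],
        "inference": ["python", model_script, "run", "--inference", *pef_args],
    }


if __name__ == "__main__":
    # Hypothetical script name and PEF location, for illustration only.
    commands = build_workflow_commands("model.py", "out/model/model.pef")
    for stage, cmd in commands.items():
        print(f"{stage:>9}: {' '.join(cmd)}")
```

Keeping the stages in one ordered mapping makes it easy to skip the optional test stage or to rerun only training after a checkpoint.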


When you compile a model with SambaFlow, it generates a dataflow graph, which is similar to a PyTorch computational graph. Compilation returns a PEF file, which encapsulates the dataflow graph.

You can submit the PEF file when you do training and inference runs. The following diagram shows the process:

[Diagram: the compile workflow, explained in the steps that follow]
  1. The compiler optionally runs within a Python virtual environment.

  2. For compilation, the compiler takes as input the model and model parameters. The input can come from GitHub or another location.

  3. As part of compilation, the compiler checks and adjusts its behavior:

    • Is compile called with --inference? Then the compiler goes through only the forward pass and generates a much smaller PEF file.

    • Is a non-default hardware version specified as the target? Then the compiler optimizes for that target hardware.

  4. When compilation completes for the specified number of epochs, the compiler sends a PEF file to the specified output location (the out directory by default).
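
The two checks in step 3 can be pictured as a small decision function. This is a toy model of the behavior described above, not the compiler's implementation: `plan_compile`, `CompilePlan`, and `DEFAULT_TARGET` are hypothetical names, and because this page names only the forward section, the `backward` and `optimizer` section names are assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

DEFAULT_TARGET = "default"  # placeholder, not a real hardware identifier


@dataclass(frozen=True)
class CompilePlan:
    sections: Tuple[str, ...]  # sections that end up in the PEF file
    target: str                # hardware version the compiler optimizes for


def plan_compile(inference: bool = False, target: Optional[str] = None) -> CompilePlan:
    """Decide what a compile produces, following the two checks in step 3."""
    if inference:
        # With --inference, only the forward pass is compiled, so the PEF is smaller.
        sections: Tuple[str, ...] = ("forward",)
    else:
        # Assumption: a training PEF also carries backward and optimizer sections.
        sections = ("forward", "backward", "optimizer")
    # A non-default target changes the hardware the compiler optimizes for.
    return CompilePlan(sections=sections, target=target or DEFAULT_TARGET)
```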

Test run (optional)

After compilation you can do a test run of the compiled model and check if it runs correctly on the hardware and software combination.

During this step:

  1. You pass in the PEF file that the compiler generated during the compile step. The command uploads the PEF file to the node.

  2. The system configures all necessary elements (PCU and PMU) and creates links between them using on-chip switches.

A test run is not mandatory, but it’s recommended to test the model before starting a long run with many epochs.

The top-level --help menu for many models shows a test command. In contrast to run --test, the test command is used primarily internally to compare results on CPU and results on RDU.

Training run

After you have completed a test run, you can start a training run. Data download and other operations are usually defined in the model’s code. If data preparation is necessary, you can use our data preparation scripts.

[Diagram: the training workflow, explained in the text that follows]

For the training run:

  • The input includes prepared data, the PEF from the registry, the model itself, and potentially parameters in a config file.

  • The output is one or more checkpoints or a fully trained model. You can restart training from a checkpoint if you want to train for more epochs.

Here are the details:

  • You start with a data source, which you potentially download to a local data volume. Optionally, you might have to perform data preparation.

  • The model code and the parameters to your training run might come from Git or might be stored elsewhere.

  • The compiled model (the output of the compile step above) is stored in a PEF registry (by default, just the /out/<modelname> folder).

  • Most of our tutorials include a Python venv to help you avoid potential issues with package versions. See Use Python virtual environments.

  • Training code is usually set up to generate checkpoint files (.pt) after a certain number of steps. Those files are stored in a checkpoint volume.

  • As part of the training run, information about loss and accuracy is sent to stdout. That information can help you determine how many epochs of training are necessary.
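
The checkpoint behavior described above (save after a certain number of steps, then restart from the latest checkpoint) can be sketched as follows. This is a toy loop under stated assumptions, not SambaFlow training code: `train` and `latest_checkpoint` are hypothetical helpers, the weight update is a stand-in for a real optimizer step, and JSON files stand in for the .pt files that a PyTorch model would typically write with `torch.save`.

```python
import json
from pathlib import Path
from typing import Optional


def latest_checkpoint(ckpt_dir: Path) -> Optional[Path]:
    """Return the checkpoint with the highest step number, or None."""
    ckpts = sorted(ckpt_dir.glob("step_*.json"),
                   key=lambda p: int(p.stem.split("_")[1]))
    return ckpts[-1] if ckpts else None


def train(ckpt_dir: Path, total_steps: int, save_every: int) -> dict:
    """Run a toy training loop that resumes from the newest checkpoint."""
    ckpt_dir.mkdir(parents=True, exist_ok=True)
    last = latest_checkpoint(ckpt_dir)
    # Resume from the saved state if one exists, otherwise start fresh.
    state = json.loads(last.read_text()) if last else {"step": 0, "weight": 0.0}
    while state["step"] < total_steps:
        state["step"] += 1
        state["weight"] += 0.5  # stand-in for a real optimizer update
        if state["step"] % save_every == 0:
            path = ckpt_dir / f"step_{state['step']}.json"
            path.write_text(json.dumps(state))
    return state
```

Calling `train` a second time with a larger `total_steps` picks up at the last saved step instead of repeating earlier work, which is the point of doing a long training run in stages.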

Inference run

The final step is running inference. It usually makes sense to run inference first on a small dataset without labels, and determine if the results make sense. If not, additional training might be necessary. Otherwise, the model is ready for use with a larger dataset.

[Diagram: the inference workflow, explained in the text that follows]

On the surface, the inference process is similar to training, but there are important differences.

  • You start with a data source, but in contrast to the training data, inference data sources are not labeled. Optionally, you might have to perform data preparation.

  • The model code and the parameters to your inference run might come from Git or be stored elsewhere.

  • You can pass in a PEF file. If you use a model that’s been compiled for inference, the file is significantly smaller because it includes only the Forward section. Smaller files result in more efficient usage of your SambaNova hardware. PEF files live in a PEF registry (by default, just the /out/<modelname> folder).

  • You also pass in a checkpoint or a trained model.

  • Most of our tutorials include a Python venv to help you avoid potential issues with package versions. See Use Python virtual environments.

  • For inference, you pass in a checkpoint file (.pt) that was generated by the training run after a certain number of steps.

  • The result of the inference run is stored in a results volume (by default, just in out/<modelname>/results).

See Run and verify inference for an example discussion.
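
To see the size difference between a training PEF and an inference PEF, you can list the files in the PEF registry folder. The following is a minimal sketch: `list_pefs` is a hypothetical helper, and it assumes that PEF files use a `.pef` extension.

```python
from pathlib import Path
from typing import List, Tuple


def list_pefs(registry: Path) -> List[Tuple[str, int]]:
    """Return (file name, size in bytes) for each PEF file, smallest first."""
    # Assumption: PEF files in the registry folder end in .pef.
    pefs = [(p.name, p.stat().st_size) for p in registry.glob("*.pef")]
    return sorted(pefs, key=lambda item: item[1])
```

For example, `list_pefs(Path("out") / "mymodel")` would list the PEF files for a hypothetical model named `mymodel`, smallest first.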

Command-line arguments

Each model supports different command-line arguments. To see all arguments, run the model with the task (compile or run) and --help, for example, compile --help.

For compilation:

  • All models support the shared arguments that are documented in Arguments to compile.

  • You can generate a PEF for inference by passing in --inference. When you compile for inference the file is smaller because it includes only the Forward section.

  • All models support an additional set of experimental shared arguments, usually used when working with SambaNova Support. To include these arguments in the help output, run compile --debug --help.

  • Each model has an additional set of model-specific arguments.

For training:

  • You call the model with run, which defaults to training. You must pass in a PEF file.

  • All models support a set of shared arguments.

  • Each model supports an additional set of model-specific arguments.

For inference:

  • You call the model with run --inference.

  • If you compiled your model with --inference, you pass in that PEF file.

  • Most arguments to run are supported when you run inference.
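
The task-plus-arguments pattern described above (a compile or run task, followed by shared and model-specific arguments) can be illustrated with Python's standard `argparse` module. This is a sketch of the pattern only, not the SambaFlow argument parser: `build_parser` and `run_mode` are hypothetical, the `--pef` flag name is an assumption, and real models define many more arguments.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with compile and run tasks, mirroring the pattern above."""
    parser = argparse.ArgumentParser(prog="model.py")
    tasks = parser.add_subparsers(dest="task", required=True)

    compile_p = tasks.add_parser("compile", help="compile the model into a PEF file")
    compile_p.add_argument("--inference", action="store_true",
                           help="compile only the forward section")

    run_p = tasks.add_parser("run", help="train (default), test, or run inference")
    # Assumption: the PEF file is passed with a --pef flag.
    run_p.add_argument("--pef", required=True, help="path to the compiled PEF file")
    run_p.add_argument("--inference", action="store_true", help="run inference")
    run_p.add_argument("--test", action="store_true", help="do a test run")
    return parser


def run_mode(args: argparse.Namespace) -> str:
    """Map parsed arguments to the workflow stage they select."""
    if args.task != "run":
        return "compile"
    if args.inference:
        return "inference"
    if args.test:
        return "test"
    return "training"  # run defaults to training
```

With this parser, `build_parser().parse_args(["compile", "--help"])` prints only the arguments of the compile task, which is the same behavior you get when you run a model with a task and --help.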