samba.session¶

class SambaSession(pef_path: str = '')¶

A class for compiling and running applications with SambaFlow. Running a SambaNova model requires these steps:

First, before compiling and running, you have to convert your PyTorch model to a SambaFlow model. See the SambaFlow Conversion Tutorial and SambaNova PyTorch operator support for some background.

Then you compile the model. The call to SambaSession.compile() traces the model to extract a computational graph and then compiles the graph into a PEF file. A SambaNova computational graph is an optimized internal representation of the model that uses only operators supported on the RDU; not all PyTorch operators are currently supported. The PEF is a binary executable file that contains the full details of the model and is used to run the model on SambaNova’s Reconfigurable Dataflow Unit (RDU).

Next, you run the model on the RDU by calling sambaflow.samba.utils.utils.trace_graph() and then SambaSession.run().

trace_graph reads the PEF and loads its information, traces the model again to ensure that the runtime computational graph matches the compile-time graph, and initializes the SambaNova Runtime backend.
run executes the computation and retrieves the output values from the RDU.

In the context of the SambaFlow frontend and the SambaSession class, the term “runtime” refers to the high-level frontend entry point into running an application with SambaFlow (after the compiler has generated a PEF). In contrast, the low-level SambaNova Runtime service communicates closely with the RDU hardware.

For details on the recommended workflow, see this link. For details on compiling a PEF file, see the Compilation overview.

Parameters:: pef_path – path to the PEF. If nothing is provided, no PEF is loaded when the SambaSession class is initialized. The PEF can be loaded later when you run an application via sambaflow.samba.utils.utils.trace_graph().

add_graph(graph: SambaGraph, overwrite: bool = False)¶

New in version 1.18.

Adds a SambaGraph to SambaSession’s collection of graph objects that will be compiled when you call compile_multigraph(). You can call add_graph() multiple times if there are multiple graphs to compile. SambaSession stores the graphs in the order that you add them. When the PEF is compiled with compile_multigraph(), the graphs are compiled in the order that they were stored with add_graph(). You cannot have multiple graphs with the same name. If overwrite is True and the incoming graph has the same name as a SambaGraph that was already added, the incoming graph replaces the old graph and is placed at the end of the graph ordering.

Parameters:

graph – the graph to add
overwrite – if True and an incoming SambaGraph has the same name as a SambaGraph that was already added, overwrite the older SambaGraph with the incoming SambaGraph. If False and an incoming SambaGraph has the same name as a previously added SambaGraph, raise an error.
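
The ordering and overwrite rules can be modeled with a plain dictionary. The following is a minimal sketch in pure Python, not SambaFlow code: GraphRegistry and its string payloads are hypothetical stand-ins, and the choice of ValueError is an assumption, because the reference says only that an error is raised.

```python
class GraphRegistry:
    """Hypothetical stand-in that mimics the documented add_graph() ordering rules."""

    def __init__(self):
        # dicts preserve insertion order, which models "stored in the order added"
        self._graphs = {}

    def add_graph(self, name, graph, overwrite=False):
        if name in self._graphs:
            if not overwrite:
                # The error type is an assumption; the reference only says "raise an error"
                raise ValueError(f"graph {name!r} was already added")
            # Remove first so that the replacement lands at the end of the ordering
            del self._graphs[name]
        self._graphs[name] = graph

    def compile_order(self):
        return list(self._graphs)


registry = GraphRegistry()
registry.add_graph("encoder", "graph-v1")
registry.add_graph("decoder", "graph-v1")
registry.add_graph("encoder", "graph-v2", overwrite=True)
print(registry.compile_order())  # ['decoder', 'encoder']
```

In this model, overwriting "encoder" moves it behind "decoder", which matches the statement that the replacement is placed at the end of the graph ordering.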

atexit_register(func: Callable, *args, **kwargs) → None¶

Registers an exit handler that is not bypassed if you run an app with use_abexit(). If not using use_abexit(), runs with the normal Python built-in atexit.register().

Parameters:

func – the function to be executed at termination
*args – positional args for func
**kwargs – keyword args for func

compile(model: torch.nn.Module, inputs: Tuple[SambaTensor, ...] | Dict[str, SambaTensor], optimizers: torch.optim.Optimizer | List[torch.optim.Optimizer] | None = None, name: str = '', loss_indices: List[Tuple[int, ...]] | None = None, io_host_memory: bool | None = None)¶

Traces and compiles a model and its optimizers into a PEF. Tracing is the process of traversing a model to extract the model’s computational graph in terms of operations and tensors, where the operations are the graph’s nodes and the tensors are the graph’s edges.

Tracing builds the computational graph with the operations that are supported on the RDU. After the model has been traced, the compiler optimizes the graph and maps it to the RDU. The compiler also generates a schedule of sections, which determines how the model will be run. Each section is a portion of the graph and has an ID and a type. When you run the PEF on the RDU, the schedule is read and executed.
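
To make the "operations are nodes, tensors are edges" description concrete, here is a small illustrative data structure in pure Python. It is only a sketch of the idea: OpNode, the tensor names, and the helper functions are hypothetical and do not reflect SambaFlow's internal graph representation.

```python
from dataclasses import dataclass


@dataclass
class OpNode:
    """One operation (graph node); tensors are referenced by name (graph edges)."""
    name: str
    inputs: list
    outputs: list


# Hypothetical result of tracing y = x @ W + b
traced_graph = [
    OpNode("matmul", inputs=["x", "W"], outputs=["xw"]),
    OpNode("add", inputs=["xw", "b"], outputs=["y"]),
]


def external_inputs(graph):
    """Tensors that no operation produces: model inputs and parameters."""
    produced = {t for op in graph for t in op.outputs}
    found = []
    for op in graph:
        for t in op.inputs:
            if t not in produced and t not in found:
                found.append(t)
    return found


def final_outputs(graph):
    """Tensors that are produced but never consumed by another operation."""
    consumed = {t for op in graph for t in op.inputs}
    return [t for op in graph for t in op.outputs if t not in consumed]


print(external_inputs(traced_graph))  # ['x', 'W', 'b']
print(final_outputs(traced_graph))    # ['y']
```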

Parameters:

model – The model that will be compiled into the PEF. The model can be a PyTorch model that has been modified to work on RDU.
inputs – The input(s) that are used to trace the provided model. These inputs should be exactly the tensors that the model expects in the forward() function. Inputs can also be provided in a dictionary if the model’s forward() function accepts kwargs. In that case, the key of the dictionary is the keyword argument name and the value of the dictionary is the value of the keyword argument.
optimizers – The optimizer(s) that are associated with the provided model. The model can have multiple optimizers or no optimizers. If the model has a single optimizer, specify the optimizer directly, for example, optimizers = opt0. If the model has multiple optimizers, specify the optimizers in a list, for example, optimizers = [opt0, opt1]. If no optimizer is needed, specify None. Defaults to None.
name – A name used internally within the compiler. NOTE: This is not the name of the output PEF that the compiler generates.
init_output_grads – Deprecated.
loss_indices – The indices of the model outputs that are loss tensors. For example, if outputs=[out0, out1, loss0, loss1], then specify loss_indices=[2, 3]. If not specified, the compiler assumes that all model output tensors are loss tensors and attempts backpropagation from all output tensors. Defaults to None.
graph_names – Deprecated.
graph_transform_hook – Deprecated.
io_host_memory – Specify that every traced tensor should be read directly from host memory. Host memory is faster than the other memory options but there is limited capacity. Alternatively, memory types for individual tensors can be set with sambatensor.mem_type = mem_type. Memory types include DDR (DRAM), HBM (high bandwidth memory), Host (the CPU host for the RDU), and None (lets the compiler decide). Defaults to None.

Example:

# example of compiling a model that accepts one input tensor and has an SGD optimizer
>>> import torch
>>> import sambaflow.samba as samba
>>> import sambaflow.samba.utils as utils
>>> model = torch.nn.Linear(10, 10)
>>> samba.from_torch_model_(model)
>>> torch_input = torch.randn(5, 10)
>>> ipt = samba.from_torch_tensor(torch_input, name='input')
>>> optim = samba.optim.SGD(model.parameters(), lr=0.1)
>>> samba.session.compile(model, (ipt, ), optim)
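
The tuple and dictionary forms of inputs, and the meaning of loss_indices, can be illustrated without SambaFlow. The sketch below uses plain Python lists in place of SambaTensors; forward(), its argument names, and the output names are hypothetical.

```python
def forward(input_ids, attention_mask=None):
    """Hypothetical forward() that accepts keyword arguments."""
    mask = attention_mask if attention_mask is not None else [1] * len(input_ids)
    return [tok * m for tok, m in zip(input_ids, mask)]


# Tuple form: positional, in the order that forward() declares its parameters
positional_inputs = ([3, 5, 7],)
# Dict form: each key is the name of a keyword argument of forward()
keyword_inputs = {"input_ids": [3, 5, 7], "attention_mask": [1, 0, 1]}

out_positional = forward(*positional_inputs)
out_keyword = forward(**keyword_inputs)
print(out_positional)  # [3, 5, 7]
print(out_keyword)     # [3, 0, 7]

# loss_indices picks the loss tensors out of the model outputs by position
outputs = ["out0", "out1", "loss0", "loss1"]
loss_indices = [2, 3]
loss_tensors = [outputs[i] for i in loss_indices]
print(loss_tensors)    # ['loss0', 'loss1']
```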

compile_multigraph(name: str = '', samba_only: bool = False, **kwarg) → str | None¶

New in version 1.18.

The multigraph compile API. Compiles graphs specified by add_graph() into a PEF.

Parameters:: name – name of the model

disable_graphamp()¶: Context manager for selectively disabling graphamp

static disable_lazy_param() → None¶: Disables lazy parameter mode and enables normal eager module initialization for the lifetime of this application. See enable_lazy_param().

disable_pinned_memory() → None¶: Turns off pinned memory. See enable_pinned_memory().

enable_lazy_param(seed: int | None = None, init_threads: int = 8, inverse_transform_sampling: bool = False, fp32_to_bf16_rounding: str = 'nearest_even') → None¶

Enables lazy parameter mode. Lazy parameter mode either skips parameter initialization or initializes only placeholders for parameters. That approach can be helpful for compiling and running large models when host memory is limited. At compile time, enabling lazy parameter mode does not initialize parameters because tracing requires only metadata such as tensor shapes and tensor dtypes. At runtime, enabling lazy parameter mode replaces the parameters with placeholders and factory functions, preserving the information required to produce the real tensor without actually initializing the tensor values. When the tensors are needed at runtime, internal factory functions materialize the tensor by populating it with numeric values.
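
The placeholder-plus-factory idea can be sketched in a few lines of pure Python. This is an illustration of the general pattern, not SambaFlow's implementation: LazyParam, normal_init, and the use of Python's random module are all assumptions made for the example.

```python
import random


class LazyParam:
    """Hypothetical placeholder: keeps shape and a factory, but no values yet."""

    def __init__(self, shape, factory, seed=None):
        self.shape = shape          # metadata that tracing would need
        self._factory = factory     # knows how to produce one value
        self._seed = seed
        self._values = None

    @property
    def materialized(self):
        return self._values is not None

    def materialize(self):
        # Values are generated only on first use, from a private seeded generator,
        # so the result does not depend on the program's global random state
        if self._values is None:
            rng = random.Random(self._seed)
            count = 1
            for dim in self.shape:
                count *= dim
            self._values = [self._factory(rng) for _ in range(count)]
        return self._values


def normal_init(rng):
    return rng.gauss(0.0, 0.02)


weight = LazyParam(shape=(4, 3), factory=normal_init, seed=1234)
print(weight.materialized)   # False: only metadata exists so far
values = weight.materialize()
print(len(values))           # 12
```

Because the generator is seeded per placeholder, a second LazyParam built with the same shape, factory, and seed materializes to the same values.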

Parameters:

seed – random seed used when materializing random values that were lazily instantiated. Guarantees reproducibility regardless of the current random state of the program. Defaults to None.
init_threads – optional number of threads for multithreaded random number generation for randomly initialized tensors. Defaults to 8.
inverse_transform_sampling – if true, uses inverse transform sampling for normal random number generation. Defaults to False.
fp32_to_bf16_rounding – how to cast randomly-generated values from float32 to bfloat16, either 'nearest_even' or 'truncation'. 'nearest_even': cast tensor to bfloat16 through PyTorch’s tensor.bfloat16. 'truncation': because NumPy does not natively support bfloat16, 'truncation' reinterprets the values as a uint16. The most common usage is 'nearest_even'. 'truncation' via NumPy scales better with multithreading. Defaults to 'nearest_even'.
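
The difference between the two rounding modes is easiest to see at the bit level: a bfloat16 value is the upper 16 bits of the corresponding float32. The sketch below uses only the Python standard library to show what truncation and round-to-nearest-even mean in general. It is not the PyTorch or NumPy code path that SambaFlow uses, and the helper names are hypothetical.

```python
import struct


def float32_bits(value):
    """Return the IEEE 754 float32 bit pattern of value as an integer."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def to_bf16_truncation(value):
    # Keep the upper 16 bits and drop the rest
    return float32_bits(value) >> 16


def to_bf16_nearest_even(value):
    bits = float32_bits(value)
    # Add just under half of the dropped range, plus 1 if the kept part is odd,
    # so that exact ties round to the even neighbor.
    # NaN handling is omitted for brevity.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bf16_to_float(bf16_bits):
    return struct.unpack("<f", struct.pack("<I", bf16_bits << 16))[0]


x = 1.01171875  # 1 + 2**-7 + 2**-8, exactly halfway between two bfloat16 values
print(bf16_to_float(to_bf16_truncation(x)))    # 1.0078125
print(bf16_to_float(to_bf16_nearest_even(x)))  # 1.015625
```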

enable_pinned_memory() → None¶

Enables pinned memory on the host.

Pinned memory can be used only with SambaLoaders and offers a performance boost in transferring data to and from the RDU.

By default, all user inputs and output gradients are eligible for pinned memory.

Pinned memory is a runtime optimization that reserves a chunk of nonpageable memory for tensors that are transferred to and from the RDU, which reduces the number of copies that take place on the host. Normally, tensors are placed in pageable memory. When those tensors are moved to the RDU, they need to be copied to a staging area of nonpageable memory before they can be transferred to the RDU. With pinned memory enabled, these tensors are allocated directly in the region of nonpageable pinned memory and they do not need to be copied from pageable memory to nonpageable memory on the host.

NOTE: You do not need to set the fast_access attribute for tensors that are fed into a SambaLoader because the SambaLoader assumes those tensors are all pinnable.

end_runtime_profile(log_file: str = 'cprofile_tmp.log') → None¶

Turns off SambaNova Runtime profiling and Python cProfile, and dumps the profile results to a file. See start_runtime_profile().

Parameters:: log_file – log file to dump the results of the profiling to
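
For readers unfamiliar with the Python cProfile half of this pair, the sketch below shows the standard-library pattern of enabling a profiler, disabling it, and rendering the results as text. It does not call SambaFlow or the SambaNova Runtime profiler; work() is a hypothetical stand-in for the profiled region, and the report is kept in memory rather than written to a log file.

```python
import cProfile
import io
import pstats


def work():
    """Hypothetical workload standing in for the profiled region."""
    return sum(i * i for i in range(10_000))


profiler = cProfile.Profile()
profiler.enable()    # comparable to the point where profiling starts
result = work()
profiler.disable()   # comparable to the point where profiling ends

# Render the collected statistics as text; a real application could write
# this string to a log file of its choosing
buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```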

end_samba_profile(filename: str | None = None) → None¶

Turns off the Samba profiler and dumps the profile results to a file. See samba_profile().

Parameters:: filename – the file to dump the profile results to

get_argin_names() → OrderedSet¶: Gets the names of the argins from the PEF. The argins are the hyperparameters that you provide to run() to control the RDU’s behavior at runtime.

get_dropout_rate_argin_names() → List[str]¶: Gets the names of the argins related to dropout rate from the PEF. These argins are the hyperparameters that you provide to run() to control the dropout rate of different layers in the model.

get_dropout_seed_argin_names() → List[str]¶: Gets the names of the argins related to the dropout seed from the PEF. These argins are the hyperparameters that can be provided to the RDU at runtime to set the dropout seeds and control dropout randomness.

get_samba_tensor_by_name(name: str) → SambaTensor¶

Gets a traced SambaTensor by name. This function does not retrieve tensor values from the RDU (see get_tensors_by_name() instead).

Parameters:: name – the name of the SambaTensor to get

get_section_types() → Set[str]¶: Gets the section types that were compiled in the PEF. See run() for a list of supported section types.

get_tensors(samba_tensors: List[SambaTensor]) → Tuple[SambaTensor, ...]¶

Gets SambaTensor values from RDU memory given the mirrored SambaTensors on the host.

Parameters:: samba_tensors – the mirrored SambaTensors on host

get_tensors_by_name(names: List[str]) → Tuple[SambaTensor, ...]¶

Gets SambaTensor values from RDU memory given the names of the tensors.

Parameters:: names – the names of the SambaTensors to get from RDU memory

get_weight_names() → List[str]¶

Gets the names of all weight symbols in the PEF.

Returns:: List of names of each weight

init_multigraph_runtime(self, pef: str, transfer_device: bool = True) → None¶

New in version 1.18.

Sets the PEF, initializes the SambaFlow runtime backend, and transfers the traced tensors to the device. Call this function before running a PEF with multigraph.

Parameters:

pef – path to the PEF
transfer_device – whether to transfer traced tensors to the device

static reset(pef_path: str = '') → None¶

Resets SambaSession. Tears down the current session and initializes a new SambaSession instance with the specified PEF path.

Parameters:: pef_path – the path to the PEF that will be used to initialize SambaSession. Defaults to the empty string ('').

reset_random_generator() → None¶: Resets the random number generators.

run(input_tensors: Tuple[SambaTensor, ...] | List[SambaTensor] = None, output_tensors: Tuple[SambaTensor, ...] | List[SambaTensor] = [], hyperparam_dict: Dict[str, int | float] = {}, data_parallel: bool = False, reduce_on_rdu: bool = False, section_ids: List[int | List[int]] = [], section_types: List[str] = [], run_until_call_id: int | None = None, run_from_call_id: int | None = None, data_parallel_mode: str = 'normal', train: bool = True)¶

Runs a PEF on the RDU. After the model has been compiled, you call this function to train the model or perform inference. The function retrieves the results from the RDU and returns them. Note that trace_graph() needs to be called just once before the call to run() or any other subsequent run() calls.

By default, runtime execution follows the PEF schedule that the compiler created and wrote to the PEF. You can use section_ids or section_types to control which sections to run. For example, to run only the forward pass, specify section_types=['FWD']. See the section_types parameter below.
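
The selection rules for section_ids and section_types can be modeled as a filter over the compiled schedule. The sketch below is a pure-Python illustration under stated assumptions: SCHEDULE is a made-up schedule, select_sections is a hypothetical helper, nested ID lists are not handled, and the error type is a guess, because the reference says only that the two parameters cannot be combined.

```python
# Hypothetical compiled schedule: (section_id, section_type) pairs in PEF order
SCHEDULE = [(0, "FWD"), (1, "FWD"), (2, "BCKWD"), (3, "BCKWD"), (4, "OPT")]


def select_sections(schedule, section_ids=(), section_types=()):
    """Return the section IDs that would run, in execution order."""
    if section_ids and section_types:
        raise ValueError("specify section_ids or section_types, not both")
    if section_ids:
        # Explicit IDs run in exactly the order given
        return list(section_ids)
    if section_types:
        # Each requested type expands to all sections of that type,
        # and the types run in the order requested
        selected = []
        for wanted in section_types:
            selected.extend(sid for sid, stype in schedule if stype == wanted)
        return selected
    # Nothing specified: run the full schedule
    return [sid for sid, _ in schedule]


print(select_sections(SCHEDULE))                                # [0, 1, 2, 3, 4]
print(select_sections(SCHEDULE, section_types=["FWD"]))         # [0, 1]
print(select_sections(SCHEDULE, section_types=["OPT", "FWD"]))  # [4, 0, 1]
```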

Parameters:

input_tensors – User-provided inputs to be transferred from host to RDU so that the RDU can pass the inputs through the model. Use this parameter to pass in input SambaTensors that have not been transferred to the RDU yet or have new values since the previous run() call. The SambaTensor.sn_name attribute of each input tensor should match the attribute of the corresponding tensor that was traced at compile time. Initially, input PyTorch tensors or SambaTensors in an application are instantiated in host memory and the data is not available on the RDU. Specifying the input tensors in this parameter triggers the transfer of input values to the RDU. If no tensors are specified, no tensor values are transferred to the RDU, so SambaFlow assumes that the input tensor values were transferred to the RDU in previous run() calls and performs computation using the values that the RDU has for the input tensors. Defaults to None.
output_tensors – Output tensors to evaluate that are retrieved from the RDU. These tensors can be just a subset of the output tensors of the model if certain output tensors are not needed on the host. Note that after calling trace_graph(), the model’s output tensors can be accessed via model.output_tensors. Defaults to an empty list ([]).
hyperparam_dict – Hyperparameters that control the RDU’s behavior at runtime. Hyperparameters that have already been provided in a previous run() call do not need to be re-specified because previously provided hyperparameters will be maintained. See the Hyperparameter reference for details on the supported hyperparameters. Defaults to {}.
data_parallel – parallelize training across multiple RDUs, including any necessary gradient synchronization. In data parallel (DP) mode, the model is replicated on multiple RDUs and the batch is split amongst the replicas. Defaults to False.
reduce_on_rdu – In data parallel mode, whether to reduce the gradients on the RDU or on the host. Defaults to False.
section_ids – Section IDs and order to run them in. If not specified, runs the full PEF schedule. Specify multiple section IDs to run them in sequence. The section IDs that are in the PEF can be found in the .pef.log file that was generated at compile time. You cannot specify both section_types and section_ids. Defaults to the empty list ([]).
section_types –
Section types and the order to run them in. If not specified, runs the full PEF schedule. Specifying a section type here runs all sections of that type. Specify multiple section types to run them in sequence. The section types can be found in the .pef.log file that was generated at compile time. Not all sections are present in every app. You cannot specify both section_types and section_ids. Section types are:
- 'ZEROGRAD': sets the model parameter gradients to 0. Note that an explicit ZEROGRAD section appears only in applications with gradient accumulation enabled. In applications without gradient accumulation, the ZEROGRAD section is not needed because the gradient computation overwrites the gradient region in memory.
- 'FWD': performs a forward pass over the model, computing the intermediate activations.
- 'BCKWD': performs a backward pass over the model, computing the gradients of the model parameters.
- 'OPT': performs the optimizer step and updates the model parameters.
- 'REDUCE': is used in data parallel scenarios to combine the gradients across workers.
- 'GRADNORM': normalizes and balances the gradients. Useful in multitask applications.
Defaults to the empty list ([]).
run_until_call_id – The call to run executes the schedule in order until reaching the ID you specify with this parameter.
run_from_call_id – The call to run executes the schedule beginning at the ID you specify with this parameter.
graph_name – In development.
data_parallel_mode –
Specifies which data parallel method to use. The methods are:
- 'normal': executes the PEF schedule in order.
- 'inorder': runs with gradient synchronization overlap, where gradient synchronization across replicas can occur in parallel. The device runs sections in the order of the schedule in the PEF.
- 'optimal': runs with gradient synchronization overlap, where gradient synchronization across replicas can occur in parallel. The device can perform optimizer updates and gradient normalization computation sections out of order for better performance.
Defaults to 'normal'.
train – specifies whether running in training mode or evaluation mode. Currently used only to control the rates of the dropout layers in the model. If in training mode, the dropout rates provided in hyperparam_dict are honored. If in evaluation mode, dropout rate values are automatically converted to 0 and dropout rates are restored when switching back to training mode. Defaults to True.
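
Two of the behaviors described in this parameter list interact: hyperparameters persist across run() calls, and train=False temporarily zeroes the dropout rates. The following pure-Python sketch models that interaction. HyperparamState and the names "lr" and "p_dropout" are hypothetical and are not taken from the Hyperparameter reference.

```python
class HyperparamState:
    """Hypothetical model of how hyperparameters carry over between run() calls."""

    def __init__(self, dropout_names):
        self._dropout_names = set(dropout_names)
        self._values = {}

    def effective(self, hyperparam_dict=None, train=True):
        # Newly supplied values are merged in; earlier values are maintained
        self._values.update(hyperparam_dict or {})
        if train:
            return dict(self._values)
        # Evaluation mode: dropout rates read as 0, but the stored rates are
        # untouched, so they come back when training resumes
        return {
            name: (0.0 if name in self._dropout_names else value)
            for name, value in self._values.items()
        }


state = HyperparamState(dropout_names=["p_dropout"])
first = state.effective({"lr": 0.1, "p_dropout": 0.2})
eval_view = state.effective(train=False)
resumed = state.effective({"lr": 0.05})
print(first)      # {'lr': 0.1, 'p_dropout': 0.2}
print(eval_view)  # {'lr': 0.1, 'p_dropout': 0.0}
print(resumed)    # {'lr': 0.05, 'p_dropout': 0.2}
```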

Example:

# example run of entire PEF schedule
>>> import torch.nn as nn
>>> import sambaflow.samba as samba
>>> import sambaflow.samba.utils as utils
>>> model = nn.Linear(10, 10)
>>> samba.from_torch_model_(model)
>>> optim = samba.optim.SGD(model.parameters(), lr=0.1)
>>> ipt = samba.randn(5, 10, name='input')
>>> # path_to_pef is the path to the PEF that was generated at compile time
>>> utils.trace_graph(model, (ipt, ), optim, init_output_grads=True, pef=path_to_pef)
>>> samba_outputs = samba.session.run((ipt, ), model.output_tensors)

# example run of only the forward pass
>>> samba_outputs = samba.session.run((ipt, ), model.output_tensors, section_types=["FWD"])

samba_profile(filename: str | None = None)¶

Context manager for using the profiler for the SambaFlow Python SDK frontend when running an application. The Samba profiler measures how long operations in the frontend take on the host. By comparison, the runtime profiler (see start_runtime_profile()) measures how long operations take on the RDU. Samba profiler events include:

SAMBA_SESSION_RUN_[SECTION_TYPES]: if calling run() with parameter section_types, how long it takes samba.session.run() to run.
SAMBA_SET_PEF: how long it takes to parse the PEF. The PEF is parsed when SambaSession is initialized, so the Samba profiler will need to be enabled before then. Note that typically the global SambaSession object is initialized when sambaflow.samba is imported.
SAMBA_INIT_RUNTIME: how long it takes to initialize the SambaNova Runtime backend.
PyRT_GET_TENSORS: how long it takes to perform a function call that retrieves tensors from the RDU.
PyRT_GET_TENSORS_GATHER: how long it takes to perform a function call that retrieves tensors from all RDUs. This event will only occur if running in distributed learning mode, which is enabled by calling sambaflow.samba.utils.trace_graph() with the distlearn_config argument during runtime.
PyRT_SET_TENSORS: how long it takes to perform a function call that sends tensors from the host to the RDU.
PyRT_SET_TENSORS_BROADCAST: in distributed learning mode, how long it takes to perform a function call that sends tensors from the root rank to all other RDUs.
PyRT_SET_PINNED_INPUT_TENSORS: how long it takes to copy unpinned tensors to pinned memory.
PyRT_RUN: how long it takes to run the PEF on the RDU (excluding the Python frontend overhead).

Parameters:: filename – the file to export the profile results to

Examples:

# enabling the Samba Profiler with the context manager
>>> import sambaflow.samba as samba

>>> with samba.session.samba_profile('dump.txt'):
...     samba.session.run(input_tensors, output_tensors)
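
As a rough picture of what a host-side event profiler of this kind does, the sketch below times named regions with a context manager and dumps them as text. It is a generic standard-library illustration, not the Samba profiler: EventProfiler and its output format are hypothetical, and the event names are borrowed from the list above purely as labels.

```python
import contextlib
import io
import time


class EventProfiler:
    """Hypothetical host-side profiler that records (name, seconds) events."""

    def __init__(self):
        self.events = []

    @contextlib.contextmanager
    def event(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the event even if the timed region raises
            self.events.append((name, time.perf_counter() - start))

    def dump(self, stream):
        for name, seconds in self.events:
            stream.write(f"{name}\t{seconds:.6f}\n")


profiler = EventProfiler()
with profiler.event("SAMBA_SET_PEF"):
    parsed = sorted(range(1000))            # stand-in for parsing work
with profiler.event("PyRT_RUN"):
    total = sum(x * x for x in range(1000))  # stand-in for a run

out = io.StringIO()
profiler.dump(out)
print(out.getvalue())
```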

setup(args: BaseModel)¶: Sets up the session given a pydantic BaseModel, either FlattenSambaArgsAllowExtra or HydraSambaArgs.

start_runtime_profile(timer_enabled: bool = True) → None¶

Turns on the profiler for SambaNova Runtime events and the cProfile Python profiler. The Runtime profiler profiles functions that are executed on the RDU. The profilers are enabled only if the environment variable ENABLE_RUNTIME_PERF is set to “SUMMARY” or “DETAILED”.

Parameters:: timer_enabled – whether to print the end-to-end wall time between the start and end of runtime profiling. Defaults to True.

start_samba_profile() → None¶: Turns on the profiler for the SambaFlow Python SDK frontend. See samba_profile().

classmethod sync_buffers_to_cpu(model: Module, inplace: bool = False) → None¶

Transfers model buffers from the RDU to the host.

Parameters:

model – the model
inplace – whether to reuse the CPU memory for the buffers when transferring data from the RDU. Defaults to False.

classmethod sync_buffers_to_rdu(model: Module) → None¶

Transfers model buffers from the host to the RDU.

Parameters:: model – the model

classmethod to_cpu(model: Module, optims: List[Optimizer] | None = None, inplace: bool = False) → None¶

Transfers model parameters, model buffers, and optimizer state tensors from the RDU to the host.

Parameters:

model – the model
optims – optimizers for the model. Defaults to None.
inplace – whether to reuse the CPU memory for the tensors when transferring from the RDU to the host. Defaults to False.

classmethod to_cpu_(model: Module, optims: List[Optimizer] | None = None) → None¶

In-place version of to_cpu().

Parameters:

model – the model
optims – optimizers for the model. Defaults to None.

to_device() → None¶: Similar to PyTorch’s torch.Tensor.to() method. Sends all the Torch tensors cached during tracing to the RDU.

classmethod to_rdu(model: Module, optims: List[Optimizer] | None = None) → None¶

Transfers model parameters, model buffers, and optimizer state tensors from the host to the RDU.

Parameters:

model – the model
optims – optimizers for the model. Defaults to None.

update_tensor(name: str, tensor: SambaTensor | None) → None¶

Updates a traced SambaTensor with new values. Can also remove a SambaTensor from the dictionary of traced SambaTensors if tensor is set to None.

Parameters:

name – the name of the SambaTensor to update
tensor – the values to set. If None, removes the SambaTensor from the dictionary of traced SambaTensors
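
The update-or-remove behavior amounts to a dictionary keyed by tensor name. The sketch below models it with a plain dict and lists in place of SambaTensors. The free function and the tensor names are hypothetical, and silently ignoring removal of an unknown name is an assumption.

```python
# Hypothetical dictionary of traced tensors, keyed by name
traced = {"input": [0.0, 0.0], "label": [1]}


def update_tensor(tensors, name, tensor):
    """Replace the entry for name, or drop it when tensor is None."""
    if tensor is None:
        # Assumption: removing a name that is not present is a no-op
        tensors.pop(name, None)
    else:
        tensors[name] = tensor


update_tensor(traced, "input", [0.5, 0.25])  # new values for an existing tensor
update_tensor(traced, "label", None)         # remove from the traced set
print(traced)  # {'input': [0.5, 0.25]}
```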

use_abexit(func: Callable, *args, **kwargs) → None¶

Runs the specified function with SambaFlow’s custom exit handling, which prevents hangs if a data parallel app encounters certain errors that would otherwise cause a deadlock. Running an app with use_abexit() bypasses exit handlers specified via the normal Python built-in atexit.register(). To prevent your exit handlers from being bypassed, use atexit_register() to register an exit handler that is run before the program exits.

Parameters:

func – the function to run the app. Often will be main().
*args – positional arguments for func
**kwargs – keyword arguments for func
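
The relationship between use_abexit() and atexit_register() can be pictured as a private handler list that the wrapper drains itself. The sketch below is a pure-Python model built on assumptions: ExitHandlers is hypothetical, handlers run in registration order by choice, and the hard process exit is left out so that the example stays runnable. In Python, a hard exit such as os._exit() skips handlers registered with atexit.register(); whether SambaFlow uses that mechanism is not stated in this reference.

```python
class ExitHandlers:
    """Hypothetical model of exit handling that survives a hard process exit."""

    def __init__(self):
        self._handlers = []

    def atexit_register(self, func, *args, **kwargs):
        # Stored privately instead of relying on the interpreter's atexit list
        self._handlers.append((func, args, kwargs))

    def use_abexit(self, func, *args, **kwargs):
        try:
            return func(*args, **kwargs)
        finally:
            # Handlers run here, before any hard exit would take place
            # (the exit itself is omitted in this sketch)
            for handler, h_args, h_kwargs in self._handlers:
                handler(*h_args, **h_kwargs)


log = []
handlers = ExitHandlers()
handlers.atexit_register(log.append, "flush checkpoints")
handlers.atexit_register(log.append, "close logger")


def main(steps):
    log.append(f"ran {steps} steps")
    return steps


result = handlers.use_abexit(main, 3)
print(log)  # ['ran 3 steps', 'flush checkpoints', 'close logger']
```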

static use_lazy_param() → bool¶: Whether or not SambaSession is in lazy parameter mode. See enable_lazy_param() for details.

static use_legacy_names(use: bool) → None¶

Sets whether to use legacy operation names. Must be disabled for models using BatchNorm.

Parameters:: use – whether to use legacy operation names

use_pinned_memory() → bool¶: Returns True if the pinned memory API is used. See enable_pinned_memory().

use_static_functional() → bool¶: If True, SambaSession uses SambaFlow operators in a static/lazy fashion and generates symbolic tensors. If False, PyTorch operators are used to execute computations on the CPU. Static_functional is used during tracing or during lazy initialization.

property profiler¶: Handle to the Samba Profiler object (see samba_profile()).

property tracing: bool¶: Whether SambaSession is currently in tracing mode or not.