samba.session
- class SambaSession(pef_path: str = '')
A class for compiling and running applications with SambaFlow. Running a SambaNova model requires these steps:
First, before compiling and running, you have to convert your PyTorch model to a SambaFlow model. See the SambaFlow Conversion Tutorial and SambaNova PyTorch operator support for some background.
Then you compile the model. The compiler traces the model and generates a PEF file. The PEF is a binary executable file that contains the full details of the model and is used to run the model on SambaNova’s Reconfigurable Dataflow Unit (RDU). Not all PyTorch operators are currently supported. At compile time, the call to SambaSession.compile() traces the model to extract a computational graph and then compiles the graph into a PEF. A SambaNova computational graph is an optimized internal representation of the model which uses only operators supported on the RDU.
Next, you run the model on the RDU by calling sambaflow.samba.utils.utils.trace_graph() and then SambaSession.run().
- trace_graph reads the PEF and loads its information, traces the model again to ensure that the runtime computational graph matches the compile-time graph, and initializes the SambaNova Runtime backend.
- run executes the computation and retrieves the output values from the RDU.
In the context of the SambaFlow frontend and the SambaSession class, the term “runtime” refers to the high-level frontend entry point into running an application with SambaFlow (after the compiler has generated a PEF). In contrast, the low-level SambaNova Runtime service communicates closely with the RDU hardware.
For details on the recommended workflow, see this link. For details on compiling a PEF file, see the Compilation overview.
- Parameters:
pef_path – path to the PEF. If nothing is provided, no PEF is loaded when the SambaSession class is initialized. The PEF can be loaded later when you run an application via sambaflow.samba.utils.utils.trace_graph().
- add_graph(graph: SambaGraph, overwrite: bool = False)
New in version 1.18.
Adds a SambaGraph to SambaSession’s collection of graph objects that will be compiled when you call compile_multigraph(). You can call add_graph() multiple times if there are multiple graphs to compile. SambaSession stores the graphs in the order that you add them. When the PEF is compiled with compile_multigraph(), the graphs are compiled in the order that they were stored with add_graph(). You cannot have multiple graphs with the same name. If overwrite is True and the incoming graph has the same name as a SambaGraph that was already added, the incoming graph replaces the old graph and is placed at the end of the graph ordering.
- Parameters:
graph – the graph to add
overwrite – if True and an incoming SambaGraph has the same name as a SambaGraph that was already added, overwrite the older SambaGraph with the incoming SambaGraph. If False and an incoming SambaGraph has the same name as a previously added SambaGraph, raise an error.
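The ordering and overwrite rules can be modeled with a plain Python dictionary. The sketch below is only an illustration of the documented behavior: GraphRegistry and its methods are hypothetical stand-ins, not part of SambaFlow, and the use of ValueError is an assumption because the text only says that an error is raised.

```python
class GraphRegistry:
    """Hypothetical stand-in that mimics the documented add_graph() ordering rules."""

    def __init__(self):
        # dicts preserve insertion order, which models "stored in the order added"
        self._graphs = {}

    def add_graph(self, name, graph, overwrite=False):
        if name in self._graphs:
            if not overwrite:
                # Assumption: the real error type is not documented
                raise ValueError(f"graph {name!r} was already added")
            # Drop the old entry so the replacement lands at the end of the ordering
            del self._graphs[name]
        self._graphs[name] = graph

    def compile_order(self):
        return list(self._graphs)


registry = GraphRegistry()
registry.add_graph("encoder", object())
registry.add_graph("decoder", object())
registry.add_graph("encoder", object(), overwrite=True)
print(registry.compile_order())  # ['decoder', 'encoder']
```

Re-adding "encoder" with overwrite=True moves it behind "decoder", which matches the statement that the replacement is placed at the end of the graph ordering.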
- atexit_register(func: Callable, *args, **kwargs) → None
Registers an exit handler that is not bypassed if you run an app with use_abexit(). If you do not use use_abexit(), the handler is registered through the normal Python built-in atexit.register().
- Parameters:
func – the function to be executed at termination
*args – positional args for func
**kwargs – keyword args for func
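As a reminder of how the built-in atexit.register() forwards arguments, here is a minimal standard-library sketch. It does not use SambaFlow; flush_logs and its arguments are hypothetical names chosen only for illustration.

```python
import atexit


def flush_logs(path, verbose=False):
    # Hypothetical cleanup function; a real app might close files here
    if verbose:
        print(f"flushing logs to {path}")


# atexit.register(func, *args, **kwargs) stores the arguments and calls
# func(*args, **kwargs) at normal interpreter termination. It returns func.
registered = atexit.register(flush_logs, "run.log", verbose=True)
```

According to its signature, atexit_register() accepts the same (func, *args, **kwargs) shape.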
- compile(model: torch.nn.Module, inputs: Tuple[SambaTensor, ...] | Dict[str, SambaTensor], optimizers: torch.optim.Optimizer | List[torch.optim.Optimizer] | None = None, name: str = '', loss_indices: List[Tuple[int, ...]] | None = None, io_host_memory: bool | None = None)
Traces and compiles a model and its optimizers into a PEF. Tracing is the process of traversing a model to extract the model’s computational graph in terms of operations and tensors, where the operations are the graph’s nodes and the tensors are the graph’s edges.
Tracing builds the computational graph with the operations that are supported on the RDU. After the model has been traced, the compiler optimizes the graph and maps it to the RDU. The compiler also generates a schedule of sections, which determines how the model will be run. Each section is a portion of the graph and has an ID and a type. When you run the PEF on the RDU, the schedule is read and executed.
- Parameters:
model – The model that will be compiled into the PEF. The model can be a PyTorch model that has been modified to work on the RDU.
inputs – The input(s) that are used to trace the provided model. These inputs should be exactly the tensors that the model expects in the forward() function. Inputs can also be provided in a dictionary if the model’s forward() function accepts kwargs. In that case, the key of the dictionary is the keyword argument name and the value of the dictionary is the value of the keyword argument.
optimizers – The optimizer(s) that are associated with the provided model. The model can have multiple optimizers or no optimizers. If the model has a single optimizer, specify the optimizer directly, for example, optimizers = opt0. If the model has multiple optimizers, specify the optimizers in a list, for example, optimizers = [opt0, opt1]. If no optimizer is needed, specify None. Defaults to None.
name – A name used internally within the compiler. NOTE: This is not the name of the output PEF that the compiler generates.
init_output_grads – Deprecated.
loss_indices – The indices of the model outputs that are loss tensors. For example, if outputs=[out0, out1, loss0, loss1], then specify loss_indices=[2, 3]. If not specified, the compiler assumes that all model output tensors are loss tensors and attempts backpropagation from all output tensors. Defaults to None.
graph_names – Deprecated.
graph_transform_hook – Deprecated.
io_host_memory – Specifies that every traced tensor should be read directly from host memory. Host memory is faster than the other memory options but has limited capacity. Alternatively, memory types for individual tensors can be set with sambatensor.mem_type = mem_type. Memory types include DDR (DRAM), HBM (high bandwidth memory), Host (the CPU host for the RDU), and None (lets the compiler decide). Defaults to None.
Example:
# example of compiling a model that accepts one input tensor and has an SGD optimizer
>>> import torch
>>> import sambaflow.samba as samba
>>> import sambaflow.samba.utils as utils
>>> model = torch.nn.Linear(10, 10)
>>> samba.from_torch_model_(model)
>>> torch_input = torch.randn(5, 10)
>>> ipt = samba.from_torch_tensor(torch_input, name='input')
>>> optim = samba.optim.SGD(model.parameters(), lr=0.1)
>>> samba.session.compile(model, (ipt, ), optim)
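To make the idea of tracing more concrete, the following toy sketch records a graph whose nodes are operations and whose edges are tensors, using only shape metadata and no numeric values. It is not how SambaFlow traces a model; SymTensor, Graph, and the two recorded operations are hypothetical names invented for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    # Each node is (op_name, input_tensor_names, output_tensor_name)
    nodes: list = field(default_factory=list)


@dataclass
class SymTensor:
    """Symbolic tensor: carries a name and a shape but no numeric values."""
    name: str
    shape: tuple
    graph: Graph

    def _record(self, op, other, shape):
        # Create the output edge and append the operation node to the graph
        out = SymTensor(f"t{len(self.graph.nodes)}", shape, self.graph)
        self.graph.nodes.append((op, (self.name, other.name), out.name))
        return out

    def __matmul__(self, other):
        assert self.shape[1] == other.shape[0], "inner dimensions must match"
        return self._record("matmul", other, (self.shape[0], other.shape[1]))

    def __add__(self, other):
        assert self.shape == other.shape, "shapes must match"
        return self._record("add", other, self.shape)


g = Graph()
x = SymTensor("input", (5, 10), g)
w = SymTensor("weight", (10, 10), g)
b = SymTensor("bias", (5, 10), g)
y = x @ w + b  # running the "model" on symbolic tensors records the graph

for node in g.nodes:
    print(node)
```

Because only names and shapes flow through the computation, the recorded graph can be built without allocating any tensor data.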
- compile_multigraph(name: str = '', samba_only: bool = False, **kwarg) → str | None
New in version 1.18.
The multigraph compile API. Compiles graphs specified by add_graph() into a PEF.
- Parameters:
name – name of the model
- disable_graphamp()
Context manager for selectively disabling graphamp.
- static disable_lazy_param() → None
Disables lazy parameter mode and enables normal eager module initialization for the lifetime of this application. See enable_lazy_param().
- disable_pinned_memory() → None
Turns off pinned memory. See enable_pinned_memory().
- enable_lazy_param(seed: int | None = None, init_threads: int = 8, inverse_transform_sampling: bool = False, fp32_to_bf16_rounding: str = 'nearest_even') → None
Enables lazy parameter mode. Lazy parameter mode either skips parameter initialization or initializes only placeholders for parameters. That approach can be helpful for compiling and running large models when host memory is limited. At compile time, enabling lazy parameter mode does not initialize parameters because tracing requires only metadata such as tensor shapes and tensor dtypes. At runtime, enabling lazy parameter mode replaces the parameters with placeholders and factory functions, preserving the information required to produce the real tensor without actually initializing the tensor values. When the tensors are needed at runtime, internal factory functions materialize the tensor by populating it with numeric values.
- Parameters:
seed – random seed used when materializing random values that were lazily instantiated. Guarantees reproducibility regardless of the current random state of the program. Defaults to None.
init_threads – optional number of threads for multithreaded random number generation for randomly initialized tensors. Defaults to 8.
inverse_transform_sampling – if true, uses inverse transform sampling for normal random number generation. Defaults to False.
fp32_to_bf16_rounding – how to cast randomly generated values from float32 to bfloat16, either 'nearest_even' or 'truncation'. 'nearest_even': cast tensor to bfloat16 through PyTorch’s tensor.bfloat16. 'truncation': because NumPy does not natively support bfloat16, 'truncation' reinterprets the values as a uint16. The most common usage is 'nearest_even'. 'truncation' via NumPy scales better with multithreading. Defaults to 'nearest_even'.
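The numerical difference between the two rounding modes can be shown with the standard library alone. The sketch below is an illustration, not SambaFlow's implementation (which, per the description above, goes through PyTorch or NumPy). It relies on the fact that bfloat16 keeps the upper 16 bits of a float32. The function names are hypothetical, and NaN and overflow handling are left out.

```python
import struct


def fp32_bits(value):
    # Reinterpret a Python float as the 32 bits of an IEEE 754 float32
    return struct.unpack("<I", struct.pack("<f", value))[0]


def to_bf16_truncation(value):
    # bfloat16 keeps the top 16 bits of float32; truncation drops the rest
    return fp32_bits(value) >> 16


def to_bf16_nearest_even(value):
    # Round to nearest, ties to even: add 0x7FFF plus the lowest kept bit,
    # then shift. NaN and overflow handling are omitted in this sketch.
    bits = fp32_bits(value)
    lsb = (bits >> 16) & 1
    return (bits + 0x7FFF + lsb) >> 16


x = 1.0 + 2**-7 + 2**-8  # exactly halfway between two bfloat16 values
print(hex(to_bf16_truncation(x)))    # 0x3f81
print(hex(to_bf16_nearest_even(x)))  # 0x3f82
```

Truncation always rounds toward zero, while nearest-even resolves the tie toward the bit pattern whose last kept bit is 0.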
- enable_pinned_memory() → None
Enables pinned memory on the host.
Pinned memory can be used only with SambaLoaders and offers a performance boost in transferring data to and from the RDU.
By default, all user inputs and output gradients are eligible for pinned memory.
Pinned memory is a runtime optimization that reserves a chunk of nonpageable memory for tensors that are transferred to and from the RDU, which reduces the number of copies that take place on the host. Normally, tensors are placed in pageable memory. When those tensors are moved to the RDU, they need to be copied to a staging area of nonpageable memory before they can be transferred to the RDU. With pinned memory enabled, these tensors are allocated directly in the region of nonpageable pinned memory and they do not need to be copied from pageable memory to nonpageable memory on the host.
NOTE: You do not need to set the fast_access attribute for tensors that are fed into a SambaLoader because the SambaLoader assumes those tensors are all pinnable.
- end_runtime_profile(log_file: str = 'cprofile_tmp.log') → None
Turns off SambaNova Runtime profiling and Python cProfile, and dumps the profile results to a file. See start_runtime_profile().
- Parameters:
log_file – log file to dump the results of the profiling to
- end_samba_profile(filename: str | None = None) → None
Turns off the Samba profiler and dumps the profile results to a file. See samba_profile().
- Parameters:
filename – the file to dump the profile results to
- get_argin_names() → OrderedSet
Gets the names of the argins from the PEF. The argins are the hyperparameters that you provide to run() to control the RDU’s behavior at runtime.
- get_dropout_rate_argin_names() → List[str]
Gets the names of the argins related to dropout rate from the PEF. These argins are the hyperparameters that you provide to run() to control the dropout rate of different layers in the model.
- get_dropout_seed_argin_names() → List[str]
Gets the names of the argins related to the dropout seed from the PEF. These argins are the hyperparameters that can be provided to the RDU at runtime to set the dropout seeds and control dropout randomness.
- get_samba_tensor_by_name(name: str) → SambaTensor
Gets a traced SambaTensor by name. This function does not retrieve tensor values from the RDU (see get_tensors_by_name() instead).
- Parameters:
name – the name of the SambaTensor to get
- get_section_types() → Set[str]
Gets the section types that were compiled in the PEF. See run() for a list of supported section types.
- get_tensors(samba_tensors: List[SambaTensor]) → Tuple[SambaTensor, ...]
Gets SambaTensor values from RDU memory given the mirrored SambaTensors on the host.
- Parameters:
samba_tensors – the mirrored SambaTensors on the host
- get_tensors_by_name(names: List[str]) → Tuple[SambaTensor, ...]
Gets SambaTensor values from RDU memory given the names of the tensors.
- Parameters:
names – the names of the SambaTensors to get from RDU memory
- get_weight_names() → List[str]
Gets the names of all weight symbols in the PEF.
- Returns:
List of names of each weight
- init_multigraph_runtime(pef: str, transfer_device: bool = True) → None
New in version 1.18.
Sets the PEF, initializes the SambaFlow runtime backend, and transfers the traced tensors to the device. Call this function before running a PEF with multigraph.
- Parameters:
pef – path to the PEF
transfer_device – whether to transfer traced tensors to the device
- static reset(pef_path: str = '') → None
Destructor for SambaSession. Initializes a new SambaSession instance with the specified PEF path.
- Parameters:
pef_path – the path to the PEF that will be used to initialize SambaSession. Defaults to the empty string ('').
- reset_random_generator() → None
Resets the random number generators.
- run(input_tensors: Tuple[SambaTensor, ...] | List[SambaTensor] = None, output_tensors: Tuple[SambaTensor, ...] | List[SambaTensor] = [], hyperparam_dict: Dict[str, int | float] = {}, data_parallel: bool = False, reduce_on_rdu: bool = False, section_ids: List[int | List[int]] = [], section_types: List[str] = [], run_until_call_id: int | None = None, run_from_call_id: int | None = None, data_parallel_mode: str = 'normal', train: bool = True)
Runs a PEF on the RDU. After the model has been compiled, you call this function to train the model or perform inference. The function retrieves the results from the RDU and returns them. Note that trace_graph() needs to be called only once, before the first call to run(). Subsequent run() calls do not require another trace_graph() call.
By default, runtime execution follows the PEF schedule that the compiler created and wrote to the PEF. You can use section_ids or section_types to control which sections to run. For example, to run only the forward pass, specify section_types=['FWD']. See the section_types parameter below.
- Parameters:
input_tensors – User-provided inputs to be transferred from host to RDU so that the RDU can pass the inputs through the model. Use this parameter to pass in input SambaTensors that have not been transferred to the RDU yet or have new values since the previous run() call. The SambaTensor.sn_name attribute of each input tensor should match the attribute of the corresponding tensor that was traced at compile time. Initially, input PyTorch tensors or SambaTensors in an application are instantiated in host memory and the data is not available on the RDU. Specifying the input tensors in this parameter triggers the transfer of input values to the RDU. If no tensors are specified, no tensor values are transferred to the RDU, so SambaFlow assumes that the input tensor values were transferred to the RDU in previous run() calls and performs computation using the values that the RDU has for the input tensors. Defaults to None.
output_tensors – Output tensors to evaluate that are retrieved from the RDU. These tensors can be just a subset of the output tensors of the model if certain output tensors are not needed on the host. Note that after calling trace_graph(), the model’s output tensors can be accessed via model.output_tensors. Defaults to an empty list ([]).
hyperparam_dict – Hyperparameters that control the RDU’s behavior at runtime. Hyperparameters that have already been provided in a previous run() call do not need to be re-specified because previously provided hyperparameters will be maintained. See the Hyperparameter reference for details on the supported hyperparameters. Defaults to {}.
data_parallel – Parallelizes training across multiple RDUs, including any necessary gradient synchronization. In data parallel (DP) mode, the model is replicated on multiple RDUs and the batch is split amongst the replicas. Defaults to False.
reduce_on_rdu – In data parallel mode, whether to reduce the gradients on the RDU or on the host. Defaults to False.
section_ids – Section IDs and the order to run them in. If not specified, runs the full PEF schedule. Specify multiple section IDs to run them in sequence. The section IDs that are in the PEF can be found in the .pef.log file that was generated at compile time. You cannot specify both section_types and section_ids. Defaults to the empty list ([]).
section_types – Section types and the order to run them in. If not specified, runs the full PEF schedule. Specifying a section type here runs all sections of that type. Specify multiple section types to run them in sequence. The section types can be found in the .pef.log file that was generated at compile time. Not all sections are present in every app. You cannot specify both section_types and section_ids. Section types are:
'ZEROGRAD': sets the model parameter gradients to 0. Note that an explicit ZEROGRAD section appears only in applications with gradient accumulation enabled. In applications without gradient accumulation, the ZEROGRAD section is not needed because the gradient computation overwrites the gradient region in memory.
'FWD': performs a forward pass over the model, computing the intermediate activations.
'BCKWD': performs a backward pass over the model, computing the gradients of the model parameters.
'OPT': performs the optimizer step and updates the model parameters.
'REDUCE': used in data parallel scenarios to combine the gradients across workers.
'GRADNORM': normalizes and balances the gradients. Useful in multitask applications.
Defaults to the empty list ([]).
run_until_call_id – The call to run executes the schedule in order until reaching the ID you specify with this parameter.
run_from_call_id – The call to run executes the schedule beginning at the ID you specify with this parameter.
graph_name – In development.
data_parallel_mode – Specifies which data parallel method to use. The methods are:
'normal': executes the PEF schedule in order.
'inorder': runs with gradient synchronization overlap, where gradient synchronization across replicas can occur in parallel. The device runs sections in the order of the schedule in the PEF.
'optimal': runs with gradient synchronization overlap, where gradient synchronization across replicas can occur in parallel. The device can perform optimizer updates and gradient normalization computation sections out of order for better performance.
Defaults to 'normal'.
train – Specifies whether running in training mode or evaluation mode. Currently used only to control the rates of the dropout layers in the model. If in training mode, the dropout rates provided in hyperparam_dict are honored. If in evaluation mode, dropout rate values are automatically converted to 0 and dropout rates are restored when switching back to training mode. Defaults to True.
Example:
# example run of entire PEF schedule
>>> import torch.nn as nn
>>> import sambaflow.samba as samba
>>> import sambaflow.samba.utils as utils
>>> model = nn.Linear(10, 10)
>>> samba.from_torch_model_(model)
>>> optim = samba.optim.SGD(model.parameters(), lr=0.1)
>>> ipt = samba.randn(5, 10, name='input')
>>> # path_to_pef is the path to the PEF that was generated at compile time
>>> utils.trace_graph(model, (ipt, ), optim, init_output_grads=True, pef=path_to_pef)
>>> samba_outputs = samba.session.run((ipt, ), model.output_tensors)
# example run of only the forward pass
>>> samba_outputs = samba.session.run((ipt, ), model.output_tensors, section_types=["FWD"])
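The selection rules for section_ids and section_types can be summarized in a few lines of plain Python. This is a sketch under stated assumptions, not SambaFlow code: select_sections and the (section_id, section_type) schedule format are hypothetical, nested ID lists are not handled, and ValueError is an assumed error type.

```python
def select_sections(schedule, section_ids=(), section_types=()):
    """Return the section IDs to run, following the rules described for run().

    schedule is a hypothetical list of (section_id, section_type) pairs in
    the order the compiler scheduled them.
    """
    if section_ids and section_types:
        # Assumption: the real error type is not documented
        raise ValueError("specify section_ids or section_types, not both")
    if section_ids:
        # Explicit IDs run in exactly the order given
        return list(section_ids)
    if section_types:
        # Each requested type runs all of its sections; types run in the order given
        return [sid for wanted in section_types
                for sid, stype in schedule if stype == wanted]
    # Neither given: run the full schedule
    return [sid for sid, _ in schedule]


schedule = [(0, "FWD"), (1, "FWD"), (2, "BCKWD"), (3, "OPT")]
print(select_sections(schedule))                         # [0, 1, 2, 3]
print(select_sections(schedule, section_types=["FWD"]))  # [0, 1]
print(select_sections(schedule, section_ids=[3, 0]))     # [3, 0]
```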
- samba_profile(filename: str | None = None)
Context manager for using the profiler for the SambaFlow Python SDK frontend when running an application. The Samba profiler measures how long operations in the frontend take on the host. By comparison, the runtime profiler (see start_runtime_profile()) measures how long operations take on the RDU. Samba profiler events include:
SAMBA_SESSION_RUN_[SECTION_TYPES]: if calling run() with parameter section_types, how long it takes samba.session.run() to run.
SAMBA_SET_PEF: how long it takes to parse the PEF. The PEF is parsed when SambaSession is initialized, so the Samba profiler will need to be enabled before then. Note that typically the global SambaSession object is initialized when sambaflow.samba is imported.
SAMBA_INIT_RUNTIME: how long it takes to initialize the SambaNova Runtime backend.
PyRT_GET_TENSORS: how long it takes to perform a function call that retrieves tensors from the RDU.
PyRT_GET_TENSORS_GATHER: how long it takes to perform a function call that retrieves tensors from all RDUs. This event occurs only if running in distributed learning mode, which is enabled by calling sambaflow.samba.utils.trace_graph() with the distlearn_config argument during runtime.
PyRT_SET_TENSORS: how long it takes to perform a function call that sends tensors from the host to the RDU.
PyRT_SET_TENSORS_BROADCAST: in distributed learning mode, how long it takes to perform a function call that sends tensors from the root rank to all other RDUs.
PyRT_SET_PINNED_INPUT_TENSORS: how long it takes to copy unpinned tensors to pinned memory.
PyRT_RUN: how long it takes to run the PEF on the RDU (excluding the Python frontend overhead).
- Parameters:
filename – the file to export the profile results to
Examples:
# enabling the Samba Profiler with the context manager
>>> import sambaflow.samba as samba
>>> with samba.session.samba_profile('dump.txt'):
...     samba.session.run(input_tensors, output_tensors)
- setup(args: BaseModel)
Sets up the session given a pydantic BaseModel, either FlattenSambaArgsAllowExtra or HydraSambaArgs.
- start_runtime_profile(timer_enabled: bool = True) → None
Turns on the profiler for SambaNova Runtime events and the cProfile Python profiler. The Runtime profiler profiles functions that are executed on the RDU. The profilers are enabled only if the environment variable ENABLE_RUNTIME_PERF is set to “SUMMARY” or “DETAILED”.
- Parameters:
timer_enabled – whether to print the end-to-end wall time between the start and end of runtime profiling. Defaults to True.
- start_samba_profile() → None
Turns on the profiler for the SambaFlow Python SDK frontend. See samba_profile().
- classmethod sync_buffers_to_cpu(model: Module, inplace: bool = False) → None
Transfers model buffers from the RDU to the host.
- Parameters:
model – the model
inplace – whether to reuse the CPU memory for the buffers when transferring data from the RDU. Defaults to False.
- classmethod sync_buffers_to_rdu(model: Module) → None
Transfers model buffers from the host to the RDU.
- Parameters:
model – the model
- classmethod to_cpu(model: Module, optims: List[Optimizer] | None = None, inplace: bool = False) → None
Transfers model parameters, model buffers, and optimizer state tensors from the RDU to the host.
- Parameters:
model – the model
optims – optimizers for the model. Defaults to None.
inplace – whether to reuse the CPU memory for the tensors when transferring from the RDU to the host. Defaults to False.
- classmethod to_cpu_(model: Module, optims: List[Optimizer] | None = None) → None
In-place version of to_cpu().
- Parameters:
model – the model
optims – optimizers for the model. Defaults to None.
- to_device() → None
Similar to PyTorch’s torch.Tensor.to() method. Sends all the Torch tensors cached during tracing to the RDU.
- classmethod to_rdu(model: Module, optims: List[Optimizer] | None = None) → None
Transfers model parameters, model buffers, and optimizer state tensors from the host to the RDU.
- Parameters:
model – the model
optims – optimizers for the model. Defaults to None.
- update_tensor(name: str, tensor: SambaTensor | None) → None
Updates a traced SambaTensor with new values. Can also remove a SambaTensor from the dictionary of traced SambaTensors if tensor is set to None.
- Parameters:
name – the name of the SambaTensor to update
tensor – the values to set. If None, removes the SambaTensor from the dictionary of traced SambaTensors
- use_abexit(func: Callable, *args, **kwargs) → None
Runs the specified function with SambaFlow’s custom exit handling, which prevents hangs if a data parallel app encounters certain errors that would otherwise cause a deadlock. Running an app with use_abexit() bypasses exit handlers specified via the normal Python built-in atexit.register(). To prevent your exit handlers from being bypassed, use atexit_register() to register an exit handler that is run before the program exits.
- Parameters:
func – the function to run the app. Often will be main().
*args – positional arguments for func
**kwargs – keyword arguments for func
- static use_lazy_param() → bool
Returns whether SambaSession is in lazy parameter mode. See enable_lazy_param() for details.
- static use_legacy_names(use: bool) → None
Sets whether to use legacy names for operations. Must be disabled for models using BatchNorm.
- Parameters:
use – whether to use legacy operation names
- use_pinned_memory() → bool
Returns True if the pinned memory API is used. See enable_pinned_memory().
- use_static_functional() → bool
If True, SambaSession uses SambaFlow operators in a static/lazy fashion and generates symbolic tensors. If False, PyTorch operators are used to execute computations on the CPU. Static_functional is used during tracing or during lazy initialization.
- property profiler
Handle to the Samba Profiler object (see samba_profile()).
- property tracing: bool
Whether SambaSession is currently in tracing mode.