samba.utils

init_output_grads

init_output_grads(output_tensors: List[str], unspecified_tensor_grad: List[str] = [], customized_tensors_dict: Dict[str, Tensor | SambaTensor] = {}, is_variable_batch_size=False) → None

Initializes output tensor gradients to values provided by the user. Gradients whose values are not specified are initialized to tensors of ones.

Parameters:
  • output_tensors – Output SambaTensors, usually all of the model's output tensors.

  • unspecified_tensor_grad – List of names of tensors whose gradient values are not specified but still need to be initialized.

  • customized_tensors_dict – Dictionary that maps tensor names to the user-provided gradients to initialize them with.

  • is_variable_batch_size – Whether the gradients use a variable batch size. Defaults to False.
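The selection rule described above can be modeled in plain Python. This is a minimal sketch, not SambaFlow's implementation: the helper name `resolve_output_grads`, its `sizes` argument (a hypothetical map from tensor name to element count), and the use of flat Python lists in place of tensors are all assumptions made for illustration. User-supplied gradients are used as given; names listed only in `unspecified_tensor_grad` fall back to ones.

```python
from typing import Dict, List, Optional, Sequence


def resolve_output_grads(
    sizes: Dict[str, int],
    unspecified_tensor_grad: Sequence[str] = (),
    customized_tensors_dict: Optional[Dict[str, List[float]]] = None,
) -> Dict[str, List[float]]:
    """Hypothetical model of init_output_grads' initialization rule.

    `sizes` maps each output tensor name to its number of elements;
    flat lists stand in for real (multi-dimensional) tensors.
    """
    customized = customized_tensors_dict or {}
    grads: Dict[str, List[float]] = {}
    # User-provided gradients take priority and are used as given.
    for name, value in customized.items():
        if len(value) != sizes[name]:
            raise ValueError(
                f"gradient for {name!r} has {len(value)} elements, expected {sizes[name]}"
            )
        grads[name] = list(value)
    # Gradients that were not specified default to a tensor of ones.
    for name in unspecified_tensor_grad:
        if name not in grads:
            grads[name] = [1.0] * sizes[name]
    return grads


if __name__ == "__main__":
    print(resolve_output_grads(
        {"loss": 1, "logits": 4},
        unspecified_tensor_grad=["logits"],
        customized_tensors_dict={"loss": [0.5]},
    ))  # {'loss': [0.5], 'logits': [1.0, 1.0, 1.0, 1.0]}
```

The sketch uses an empty tuple and None as defaults rather than the mutable `[]` and `{}` defaults shown in the real signature, which avoids state leaking between calls in plain Python.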

utils

trace_graph(model: torch.nn.Module | onnx.onnx_ml_pb2.ModelProto, inputs: Tuple[SambaTensor, ...] | List[SambaTensor], optim: torch.optim.Optimizer | List[torch.optim.Optimizer] | None = None, init_output_grads: bool = True, loss_indices: List[Tuple[int]] | None = None, pef: str = '', mapping: str = 'section', data_parallel_mode: str = 'normal', transfer_device: bool = True)

Traces a graph when running an app to initialize the model weights and input/output tensors on the device.

Parameters:
  • model – the model to be traced

  • inputs – a tuple or list of input SambaTensors to the model

  • optim – if set, initializes the state of all optimizers on the device

  • init_output_grads – if True, initializes output gradients on device

  • loss_indices – The indices of the model outputs that are loss tensors. For example, if outputs=[out0, out1, loss0, loss1], then specify loss_indices=[2, 3]. If not specified, the compiler assumes that all model output tensors are loss tensors and attempts backpropagation from all output tensors. Defaults to None.

  • pef – path to the compiled PEF file

  • mapping

    the graph mapping method to use. The methods are:

    • 'section': the normal mapping method, in which each section is mapped onto the RDU individually so that large models that cannot fit onto one RDU can run. This mapping mode is used for most apps.

    • 'spatial': an extreme case of model parallelism in which the forward and backward graphs are both mapped onto the RDU at the same time. Section swapping is not required, so the app can exploit the high bandwidth of on-chip SRAM. However, this mapping mode is feasible only for small models.

    Defaults to 'section'.

  • data_parallel_mode

    the data parallel method to use. This should match the value passed to SambaSession.run(). Supported methods are:

    • 'normal': executes the schedule specified in the PEF in order.

    • 'inorder': runs with gradient synchronization overlap, where gradient synchronization across replicas can occur in parallel. The device runs sections in the order of the schedule in the PEF.

    • 'optimal': runs with gradient synchronization overlap, where gradient synchronization across replicas can occur in parallel. The device can run optimizer-update and gradient-normalization sections out of order for better performance.

    Defaults to 'normal'.

  • transfer_device – if True, transfers initialized tensors from host memory to the device. Otherwise, tensors are not automatically transferred to the device, so the user has more flexibility in controlling which tensors are transferred and when they are transferred. Defaults to True.
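The loss_indices rule can be illustrated with a small pure-Python helper. This sketch only mirrors the documented selection logic: `select_loss_outputs` is a hypothetical name that does not exist in SambaFlow, plain strings stand in for output tensors, and it follows the parameter description's example (a flat list of integers) rather than the `List[Tuple[int]]` annotation. Rejecting negative indices is likewise an assumption of this sketch.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def select_loss_outputs(
    outputs: Sequence[T], loss_indices: Optional[Sequence[int]] = None
) -> List[T]:
    """Return the outputs that backpropagation would start from.

    Mirrors the documented rule: when loss_indices is None, every
    model output is treated as a loss tensor.
    """
    if loss_indices is None:
        return list(outputs)
    for i in loss_indices:
        if not 0 <= i < len(outputs):
            raise IndexError(f"loss index {i} out of range for {len(outputs)} outputs")
    return [outputs[i] for i in loss_indices]


if __name__ == "__main__":
    outputs = ["out0", "out1", "loss0", "loss1"]
    print(select_loss_outputs(outputs, [2, 3]))  # ['loss0', 'loss1']
    print(select_loss_outputs(outputs))          # all four outputs
```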

Returns:

The traced output tensors, which can also be accessed via model.output_tensors.
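Because mapping and data_parallel_mode accept only a few documented string values, and data_parallel_mode should match what is later passed to SambaSession.run(), an app might validate them up front. The sketch below is hypothetical: `check_trace_options` and its `run_data_parallel_mode` argument are illustrative names, not part of samba.utils, and the check covers only the values documented for trace_graph() here.

```python
from typing import Optional

# String values documented for trace_graph().
MAPPING_MODES = ("section", "spatial")
DATA_PARALLEL_MODES = ("normal", "inorder", "optimal")


def check_trace_options(
    mapping: str = "section",
    data_parallel_mode: str = "normal",
    run_data_parallel_mode: Optional[str] = None,
) -> None:
    """Raise ValueError if the options fall outside the documented values.

    run_data_parallel_mode is the mode the app intends to pass to
    SambaSession.run(); when given, it must equal data_parallel_mode.
    """
    if mapping not in MAPPING_MODES:
        raise ValueError(f"unknown mapping {mapping!r}; expected one of {MAPPING_MODES}")
    if data_parallel_mode not in DATA_PARALLEL_MODES:
        raise ValueError(
            f"unknown data_parallel_mode {data_parallel_mode!r}; "
            f"expected one of {DATA_PARALLEL_MODES}"
        )
    # The same data parallel mode should be used for tracing and for running.
    if run_data_parallel_mode is not None and run_data_parallel_mode != data_parallel_mode:
        raise ValueError(
            f"trace_graph() uses {data_parallel_mode!r} but SambaSession.run() "
            f"would use {run_data_parallel_mode!r}"
        )
```

Failing early on a mismatched mode surfaces the problem before any tensors are transferred to the device.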

trace_multigraph(graph: Module, inputs: SambaTensor | List[SambaTensor], optimizers: List[Optimizer] | None = None, init_output_grads: bool | None = None, mapping: str = '', trace_prefix: str | None = None) → Tuple[SambaTensor, ...]

New in version 1.18.

Traces a graph for the multigraph feature. When running an app, it initializes the graph’s weights and input/output tensors on the device. Call trace_multigraph() only once for each graph. Returns a handle to the graph’s output tensors.
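Since trace_multigraph() should be called only once per graph, an app that builds several graphs may want to guard against accidental re-tracing. The following is a hypothetical sketch: `TraceRegistry` and the `trace_fn` callable it wraps are illustrative assumptions. In a real app, `trace_fn` would be whatever performs the actual trace; in the test below it is a stand-in function.

```python
from typing import Any, Callable, Dict, Tuple


class TraceRegistry:
    """Hypothetical guard ensuring each graph is traced at most once."""

    def __init__(self, trace_fn: Callable[..., Any]) -> None:
        self._trace_fn = trace_fn
        # Maps id(graph) -> (graph, outputs). Holding a reference to the
        # graph keeps it alive, so its id() cannot be reused by another object.
        self._traced: Dict[int, Tuple[Any, Any]] = {}

    def trace(self, graph: Any, *args: Any, **kwargs: Any) -> Any:
        """Trace `graph` once and remember its output handle."""
        if id(graph) in self._traced:
            raise RuntimeError("graph already traced; reuse its stored outputs instead")
        outputs = self._trace_fn(graph, *args, **kwargs)
        self._traced[id(graph)] = (graph, outputs)
        return outputs

    def outputs(self, graph: Any) -> Any:
        """Return the output handle recorded when the graph was traced."""
        return self._traced[id(graph)][1]
```

Graphs are keyed by identity rather than equality, so two distinct graph objects are always tracked separately.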

Parameters:
  • graph – the graph to trace.

  • inputs – the graph’s input tensors.

  • optimizers – the graph’s optimizers. Defaults to None.

  • init_output_grads – whether to initialize output gradients. Defaults to None.

  • mapping – the graph mapping method to use. See trace_graph() for details. Defaults to “”.

  • trace_prefix – a prefix to add to SambaFlow operation names. This is useful for distinguishing which SambaGraph a particular operator belongs to. Defaults to None.
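The effect of trace_prefix can be pictured as namespacing operator names per graph. The sketch below is an assumption about the naming scheme, not SambaFlow's documented behavior: the `__` separator and the helper names `prefixed_op_names` and `owning_graph` are hypothetical.

```python
from typing import Iterable, List, Optional

SEP = "__"  # Assumed separator; SambaFlow's actual naming scheme may differ.


def prefixed_op_names(op_names: Iterable[str], trace_prefix: Optional[str] = None) -> List[str]:
    """Prepend trace_prefix to each operator name; no-op when the prefix is None or empty."""
    if not trace_prefix:
        return list(op_names)
    return [f"{trace_prefix}{SEP}{name}" for name in op_names]


def owning_graph(op_name: str, prefixes: Iterable[str]) -> Optional[str]:
    """Return the prefix (graph) an operator name belongs to, or None."""
    # Check longer prefixes first so "enc__a" wins over "enc" when both match.
    for prefix in sorted(prefixes, key=len, reverse=True):
        if op_name.startswith(prefix + SEP):
            return prefix
    return None


if __name__ == "__main__":
    enc = prefixed_op_names(["matmul", "relu"], "encoder")
    print(enc)  # ['encoder__matmul', 'encoder__relu']
    print(owning_graph(enc[1], ["encoder", "decoder"]))  # encoder
```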