
SambaTensor Class

class SambaTensor(torch_tensor: Tensor | None = None, shape: Iterable[int] | None = None, dtype: dtype | None = None, name: str | None = None, batch_dim: int | None = None, named_dims: Iterable[str | None] | None = None, sized_dims: Iterable[str | None] | None = None, materializer: Callable[[SambaTensor, MultithreadedRNG], Tensor] | None = None, is_complex: bool | None = None, region_name: str | None = None)

The SambaTensor is the base tensor data structure for SambaFlow. It wraps torch.Tensor and adds custom data members and methods to support graph tracing and interfacing with the device (the RDU).

Any application that runs on the RDU must use SambaTensor, which supports:

  • Static tracing, which saves memory and compute resources

  • Getting and setting device memory

A SambaTensor can be constructed from a torch.Tensor:

  • using samba.from_torch_tensor(torch_tensor) (similar to constructing a torch.Tensor from a numpy.ndarray with torch.from_numpy())

  • directly with the SambaTensor constructor, that is, SambaTensor(torch_tensor).

With either construction method, the new SambaTensor and the original PyTorch tensor share the same memory, so any change to the original PyTorch tensor is reflected in the new SambaTensor. This differs from torch.Tensor(np.ndarray), which copies the data.
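The difference between sharing and copying memory can be shown without SambaFlow. The standard-library sketch below is only an analogy, not SambaFlow code: a memoryview plays the role of a wrapper that shares its buffer (as SambaTensor(torch_tensor) does), and a second array plays the role of a copying constructor (as torch.Tensor(np.ndarray) does).

```python
from array import array

# Host buffer standing in for the original PyTorch tensor's storage.
original = array("f", [1.0, 2.0])

# A view shares the same memory, like SambaTensor(torch_tensor).
shared_view = memoryview(original)

# A copy owns separate memory, like torch.Tensor(np.ndarray).
independent_copy = array("f", original)

# Modify the original buffer in place.
original[0] = 10.0

print(shared_view[0])       # 10.0: the change is visible through the view
print(independent_copy[0])  # 1.0: the copy is unaffected
```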

A SambaTensor can be empty to use less memory. You can construct an empty SambaTensor with an empty PyTorch tensor using the methods listed above or with the shape and dtype parameters. An empty SambaTensor is especially helpful for graph tracing where you only need tensor shapes and dtypes. An entire model can be instantiated with empty SambaTensors by using lazy parameters with samba.lazy_param. When an empty SambaTensor is used on the RDU, the SambaTensor’s materializer is used to initialize the data.

Use SambaTensor.torch() to retrieve the original PyTorch tensor, similar to how torch.Tensor.numpy() returns the numpy.ndarray that shares the tensor’s memory.

SambaTensors can be used on the host CPU just like PyTorch tensors, though the supported methods are limited to the functions in samba.functional.

Accessing device-side weight and gradient data

The SambaTensor provides APIs to directly access tensor data in RDU device memory. For example:

# samba_tensor.sn_data and samba_tensor.sn_grad copy data from
# device memory into new host-side PyTorch tensors
print('data on device memory:', samba_tensor.sn_data)
print('gradient data on device memory, if it exists:', samba_tensor.sn_grad)

# A copy happens every time sn_data or sn_grad is accessed,
# so keep the result if you need it more than once.
# Here, weight is a SambaTensor parameter of a compiled model.
sn_weight = weight.sn_data
sn_grad = weight.sn_grad

Modifying weight and gradient data in device memory

The SambaTensor provides APIs to modify tensor data on RDU device memory.

# Transfer host data to the device. tensor can be anything for which
# isinstance(tensor, torch.Tensor) holds, such as a torch.Tensor or a SambaTensor.
samba_tensor.sn_data = tensor

You can assign a PyTorch tensor or SambaTensor to sn_data, which will copy the data to the tensor on the device. Similarly, you can assign a PyTorch tensor or SambaTensor to sn_grad, which will copy the data to the tensor’s gradient on the device.

# Weight and its gradient are updated on host and then copied to the device
weight.sn_data = sn_weight / torch.norm(sn_weight)  # weight normalization on host
weight.sn_grad = sn_grad / torch.norm(sn_grad)  # grad normalization on host

Alternatively, you can use rdu() and cpu() to synchronize a SambaTensor’s data between host memory (CPU) and device memory (RDU). For example:

# Modify the weights and gradients on the host only; the weights
# and gradients on the device remain unchanged
weight.data = weight / torch.norm(weight)
weight.grad = weight.grad / torch.norm(weight.grad)

# Print device-side weights before synchronizing host-device memory
print(weight.sn_data)

# Copy host memory to device memory
# Note: both data and grad are synchronized
weight.rdu()

# Print device-side weights after the host-to-device copy
print(weight.sn_data)

# Modify the weight gradient on the device directly from the host
weight.sn_grad = torch.zeros_like(sn_grad)

# Print host-side weight gradients before synchronizing host-device memory
print(weight.grad)

# Copy device memory to host memory
weight.cpu()

# Print host-side weight gradients after the device-to-host copy
print(weight.grad)

The sn_data and sn_grad members of the SambaTensor class are Python data descriptors with custom getter and setter methods. Reading sn_data or sn_grad returns a new torch.Tensor that holds a copy of the data in device memory, so modifying the returned torch.Tensor does not change RDU memory. To write to the device, assign a tensor to sn_data or sn_grad.
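The copy-on-read, copy-on-write behavior of a data descriptor can be sketched in plain Python. This is a hypothetical model, not SambaFlow’s implementation: DeviceCopyDescriptor and FakeTensor are illustrative names, and a Python list stands in for device memory.

```python
import copy


class DeviceCopyDescriptor:
    """Hypothetical data descriptor modeling sn_data: reads and writes copy."""

    def __set_name__(self, owner, name):
        self.slot = "_device_" + name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        # Reading returns a fresh copy of the "device" buffer.
        return copy.deepcopy(getattr(obj, self.slot))

    def __set__(self, obj, value):
        # Writing copies the host value into the "device" buffer.
        setattr(obj, self.slot, copy.deepcopy(value))


class FakeTensor:
    """Hypothetical stand-in for a SambaTensor; a list models device memory."""

    sn_data = DeviceCopyDescriptor()

    def __init__(self, values):
        self.sn_data = values


t = FakeTensor([1.0, 2.0])
host_copy = t.sn_data
host_copy[0] = 99.0    # modifies only the host-side copy
print(t.sn_data)       # [1.0, 2.0]: "device" memory unchanged
t.sn_data = host_copy  # explicit assignment writes through
print(t.sn_data)       # [99.0, 2.0]
```

Because every read returns a new object, in-place edits to the result are silently lost; only assignment reaches the stored buffer.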


Tensor manipulation on the host is expensive because the computations are performed by the CPU and data synchronization between host and device is bandwidth-heavy. Do not use these four SambaTensor APIs (sn_data, sn_grad, rdu() and cpu()) unless necessary, e.g. when checkpointing models.

SambaTensor has methods and attributes similar to those of torch.Tensor. In addition, SambaTensor has methods and members that are specific to the RDU dataflow architecture.

When an operation involves input SambaTensors of different data types, SambaFlow follows the dtype promotion rules that PyTorch uses to do the computation (see Type Promotion in the PyTorch documentation for details). For example, when samba.add is called with one bfloat16 input and one float32 input, the bfloat16 SambaTensor is promoted to float32.
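For floating-point inputs, the promotion rule can be sketched as a small lookup. This is a simplified, hypothetical helper (promote_float is not a SambaFlow or PyTorch function): it covers only tensor-tensor promotion among floating dtypes and ignores integer, bool, complex, and the special rules for zero-dim tensors and Python scalars. Individual pairs can be checked against torch.promote_types.

```python
# Bit widths of the floating-point dtypes this simplified model covers.
FLOAT_BITS = {"bfloat16": 16, "float16": 16, "float32": 32, "float64": 64}


def promote_float(a: str, b: str) -> str:
    """Hypothetical helper: result dtype for two floating-point tensor inputs."""
    if a == b:
        return a
    wider = max(FLOAT_BITS[a], FLOAT_BITS[b])
    if wider == 16:
        # The only mixed 16-bit pair is bfloat16/float16; neither can represent
        # all values of the other, so PyTorch promotes the pair to float32.
        return "float32"
    return "float%d" % wider


print(promote_float("bfloat16", "float32"))  # float32, as in the samba.add example
print(promote_float("float16", "bfloat16"))  # float32
print(promote_float("float32", "float64"))   # float64
```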

  • torch_tensor – A torch.Tensor object used to construct the SambaTensor.

  • shape – Shape of the tensor, used to implement tracing. Cannot be specified with torch_tensor.

  • dtype – Data type of the tensor. Should be a torch.dtype object. Cannot be specified with torch_tensor.

  • name – User-provided name for the SambaTensor, similar to the name of a tf.placeholder.

  • batch_dim – Deprecated.

  • named_dims – Deprecated.

  • sized_dims – Experimental. This argument is for a feature in development.

  • materializer – Function to initialize this tensor with values when transferring this tensor to the RDU. Only applicable if this tensor was lazily initialized (see samba.session.enable_lazy_param). The function should accept parameters shape, dtype, and requires_grad and return a torch.Tensor. The materializer does not accept a torch.Tensor.

  • is_complex – Experimental. Whether this tensor represents a complex tensor or not.

  • region_name – Name for tensor’s location in memory. See sn_region_name.


>>> import torch
>>> import sambaflow.samba as samba
>>> # Initialize SambaTensor with constructor
>>> torch_tensor = torch.Tensor([1, 2])
>>> samba_tensor0 = samba.SambaTensor(torch_tensor)
>>> # Initialize SambaTensor with samba.from_torch_tensor
>>> samba_tensor1 = samba.from_torch_tensor(torch_tensor, name="samba_tensor1")
>>> # 3 ways to initialize empty SambaTensor with shape (2,3)
>>> empty_samba_tensor0 = samba.SambaTensor(torch.empty(2, 3), name="empty_samba_tensor0")
>>> empty_samba_tensor1 = samba.SambaTensor(shape=(2,3), dtype=torch.bfloat16, name="empty_samba_tensor1")
>>> empty_samba_tensor2 = samba.from_torch_tensor(torch.empty(2, 3), name="empty_samba_tensor2")

__getitem__(x: int | slice | None | SambaTensor | Tuple[int | slice | None | SambaTensor]) SambaTensor

Indexes this SambaTensor. This function can be called with the [] operator.

Currently supported index types are:

  • an integer, to retrieve a single element along that dimension.

  • a slice, to retrieve some subset of elements along that dimension.

  • None, to indicate that the tensor should be unsqueezed at that index.

  • a SambaTensor, to gather the elements at the indices held in the tensor. SambaTensor currently does not support indexing with multidimensional SambaTensors or multiple SambaTensors.

  • a list, to retrieve some elements by indices along that dimension.


x – the index object
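To make the index semantics above concrete, the following pure-Python sketch models two of them on a 2-D nested list: gathering columns with an index list (as samba_tensor[:, index_tensor] does) and adding a leading dimension with None. The helpers take_columns and unsqueeze0 are hypothetical and only mirror the values in the example that follows.

```python
def take_columns(rows, indices):
    """Model of t[:, idx] on a 2-D nested list: gather the listed columns."""
    return [[row[i] for i in indices] for row in rows]


def unsqueeze0(rows):
    """Model of a leading None index: add a new outer dimension of size 1."""
    return [rows]


t = [[0.7102, -0.8594, -0.5047],
     [0.8140, -0.4194, 1.5488]]
print(unsqueeze0(take_columns(t, [0, 2])))
# [[[0.7102, -0.5047], [0.814, 1.5488]]]
```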


>>> samba_tensor = samba.randn(2, 3)
>>> samba_tensor.data
tensor([[ 0.7102, -0.8594, -0.5047],
        [ 0.8140, -0.4194,  1.5488]])
>>> samba_tensor[:, 2].data
tensor([-0.5047,  1.5488])

>>> index_tensor = samba.SambaTensor(torch.Tensor([0, 2]))
>>> samba_tensor[None, :, index_tensor].data
tensor([[[ 0.7102, -0.5047],
        [ 0.8140,  1.5488]]])

__setitem__(x: int | slice | None | Tuple[int | slice | None], update: int | float | SambaTensor)

Indexes this SambaTensor and sets the data. This function can be called with the [] operator.

Currently supported index types are:

  • an integer, to select a single element along that dimension.

  • a slice, to select some subset of elements along that dimension.

  • None, to indicate that the tensor should be unsqueezed at that index.

  • a SambaTensor, to select the elements at the indices held in the tensor. SambaTensor currently does not support indexing with multidimensional SambaTensors or multiple SambaTensors.


x – the index object


>>> samba_tensor = samba.zeros(2,3)
>>> samba_tensor.data
tensor([[0., 0., 0.],
        [0., 0., 0.]])
>>> samba_tensor[:, 2] = samba.ones(2)
>>> samba_tensor.data
tensor([[0., 0., 1.],
        [0., 0., 1.]])

>>> index_tensor = samba.SambaTensor(torch.Tensor([0, 2]))
>>> samba_tensor[None, :, index_tensor] = 2 * samba.ones(1, 2, 2)
>>> samba_tensor.data
tensor([[2., 0., 2.],
        [2., 0., 2.]])

backward(gradient: SambaTensor | Tensor | None = None, retain_graph: bool | None = None) None

Calls torch.Tensor.backward() on the underlying PyTorch tensor and computes the gradient of the PyTorch tensor with respect to the graph leaves.

The graph is differentiated using the chain rule. If the tensor is non-scalar (i.e. its data has more than one element) and requires gradient, backward() also requires specifying the gradient. gradient should be a tensor of the same type and location as self that contains the gradient of the differentiated function w.r.t. self.

This function accumulates gradients in the leaves - you might need to zero .grad attributes or set them to None before calling it. See Default gradient layouts for details on the memory layout of accumulated gradients.

  • gradient – Gradient with respect to the tensor. If gradient is a tensor, it is automatically converted to a tensor that does not require a gradient. None values can be specified if self is a scalar tensor or a tensor that doesn’t require a gradient. If a None value is acceptable, then this argument is optional. Defaults to None.

  • retain_graph – If False, the graph used to compute the grads is freed. In nearly all cases setting this option to True is not needed and often can be worked around in a much more efficient way. Defaults to None.
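Two points above, the role of the gradient argument for non-scalar outputs and the accumulation of gradients in the leaves, can be sketched in pure Python. This hypothetical model (backward_square is an illustrative name) hand-codes the chain rule for y = x ** 2 elementwise, where gradient supplies dL/dy.

```python
def backward_square(x, grad_x, gradient):
    """Accumulate dL/dx into grad_x for y = x ** 2, using dL/dx = dL/dy * 2x."""
    if len(gradient) != len(x):
        raise ValueError("gradient must have the same shape as y")
    for i, (xi, gi) in enumerate(zip(x, gradient)):
        grad_x[i] += gi * 2.0 * xi  # += models accumulation in the leaf
    return grad_x


x = [1.0, 2.0]
grad_x = [0.0, 0.0]
backward_square(x, grad_x, gradient=[1.0, 1.0])
print(grad_x)  # [2.0, 4.0]
backward_square(x, grad_x, gradient=[1.0, 1.0])
print(grad_x)  # [4.0, 8.0]: accumulated, so zero the gradients between steps
```

Because y has more than one element, there is no single scalar to differentiate; the gradient argument supplies the weights dL/dy that the chain rule needs.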

bfloat16() SambaTensor

self.bfloat16() is equivalent to self.type(torch.bfloat16). See type().

bool() SambaTensor

self.bool() is equivalent to self.type(torch.bool). See type().

clear_data() None

Clear the tensor data on the host.

cpu(inplace: bool = False) None

Copies the data from device memory to the host. Avoid calling cpu() unless necessary, because this operation is bandwidth-intensive.


inplace – whether to modify the underlying host memory in-place. Defaults to False.

cpu_() None

Same as cpu() except the underlying host memory is modified in-place.

data_ptr() int

Returns the address of the first element of the associated PyTorch tensor.

dim() Size | int

Returns the number of dimensions of self tensor.

element_size() int

Returns the size in bytes of an individual element.
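The storage a tensor needs is numel() multiplied by element_size(). The sketch below uses a hypothetical lookup table of per-element sizes for common torch dtypes (tensor_nbytes is an illustrative name, not a SambaFlow API) and shows why bfloat16 halves memory relative to float32.

```python
# Hypothetical lookup: bytes per element for common torch dtypes.
ELEMENT_SIZE = {
    "bool": 1,
    "int16": 2,
    "int32": 4,
    "int64": 8,
    "bfloat16": 2,
    "float16": 2,
    "float32": 4,
    "float64": 8,
}


def tensor_nbytes(shape, dtype):
    """Total storage for a contiguous tensor: numel() * element_size()."""
    numel = 1
    for dim in shape:
        numel *= dim
    return numel * ELEMENT_SIZE[dtype]


print(tensor_nbytes((2, 3), "bfloat16"))  # 12
print(tensor_nbytes((2, 3), "float32"))   # 24
```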

float() SambaTensor

self.float() is equivalent to self.type(torch.float). See type().

int() SambaTensor

self.int() is equivalent to self.type(torch.int). See type().

static is_fast_access(name: str) bool

Returns True if the SambaTensor with sn_name name is a fast access tensor. Returns False otherwise. See fast_access for details.

is_floating_point() bool

Returns True if self is a floating-point tensor, otherwise returns False.

item() float | int

Returns the value of this tensor as a standard Python number. This only works for tensors with one element. This operation is not differentiable.


>>> x = samba.SambaTensor(torch.tensor([1.0]))
>>> x.item()
1.0

long() SambaTensor

self.long() is equivalent to self.type(torch.long). See type().

materialize_() None

If the tensor does not have data, materializes the tensor. Otherwise, does nothing.

ndimension() Size | int

Alias for dim().

nelement() int

Alias for numel().

new_empty(size: int | Iterable[int], dtype: dtype | None = None, requires_grad: bool = False) SambaTensor

Returns a SambaTensor of size size filled with uninitialized data. By default, the returned SambaTensor has the same torch.dtype as this tensor.

  • size – a list, tuple, or torch.Size of integers defining the shape of the output tensor.

  • dtype – the desired type of the returned tensor. If None, same torch.dtype as this SambaTensor.

  • requires_grad – if autograd should record operations on the returned tensor. Defaults to False.


>>> sambatensor = samba.ones((), dtype=torch.float32)
>>> sambatensor.new_empty((2, 3)).data
tensor([[0.0000e+00, 1.4405e-41, 0.0000e+00],
        [0.0000e+00, 0.0000e+00, 0.0000e+00]])

See also

See torch.Tensor.new_empty().

new_full(size: Tuple[int], fill_value: int | float, dtype: dtype | None = None, device: device | None = None, requires_grad: bool | None = False) SambaTensor

Returns a SambaTensor of size size filled with fill_value. By default, the returned SambaTensor has the same torch.dtype as this tensor.

  • size – a list, tuple, or torch.Size of integers defining the shape of the output tensor.

  • fill_value – the number to fill the output tensor with.

  • dtype – the desired type of the returned tensor. If None, same torch.dtype as this SambaTensor.

  • device – the desired device of the returned tensor. If None, same torch.device as this tensor.

  • requires_grad – if autograd should record operations on the returned tensor. Defaults to False.


The optional PyTorch keyword argument device is not supported on RDU and has no effect.


>>> sambatensor = samba.ones((), dtype=torch.float32)
>>> sambatensor.new_full((2, 3), 5.0).data
tensor([[5., 5., 5.],
        [5., 5., 5.]])
>>> # new_full with explicit data type
>>> sambatensor = samba.ones((), dtype=torch.float32)
>>> sambatensor.new_full((2, 3), 5.0, dtype=torch.bfloat16).data
tensor([[5., 5., 5.],
        [5., 5., 5.]], dtype=torch.bfloat16)

See also

See torch.Tensor.new_full().

new_ones(size: int | Iterable[int], dtype: dtype | None = None, requires_grad: bool = False) SambaTensor

Returns a SambaTensor of size size filled with 1. By default, the returned SambaTensor has the same torch.dtype as this tensor.

  • size – a list, tuple, or torch.Size of integers defining the shape of the output tensor.

  • dtype – the desired type of the returned tensor. If None, same torch.dtype as this SambaTensor.

  • requires_grad – if autograd should record operations on the returned tensor. Defaults to False.


>>> sambatensor = samba.randn((), dtype=torch.bfloat16)
>>> sambatensor.new_ones((2, 3)).data
tensor([[1., 1., 1.],
        [1., 1., 1.]], dtype=torch.bfloat16)

See also

See torch.Tensor.new_ones().

new_zeros(size: int | Iterable[int], dtype: dtype | None = None, requires_grad: bool = False) SambaTensor

Returns a SambaTensor of size size filled with 0. By default, the returned SambaTensor has the same torch.dtype as this tensor.

  • size – a list, tuple, or torch.Size of integers defining the shape of the output tensor.

  • dtype – the desired type of the returned tensor. If None, same torch.dtype as this SambaTensor.

  • requires_grad – if autograd should record operations on the returned tensor. Defaults to False.


>>> sambatensor = samba.randn((), dtype=torch.bfloat16)
>>> sambatensor.new_zeros((2, 3)).data
tensor([[0., 0., 0.],
        [0., 0., 0.]], dtype=torch.bfloat16)

See also

See torch.Tensor.new_zeros().

numel() int

Returns the number of elements.


See permute.

rdu() None

Synchronizes the host memory of the tensor (and its gradient, if it exists) to its device memory. This is similar to an in-place version of torch.Tensor.cuda(). Avoid calling rdu() unless necessary, because this operation is bandwidth-intensive.
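The direction of each synchronization call can be sketched with a toy model that keeps separate host and device buffers. SyncedTensor is a hypothetical class, and plain dictionaries of lists stand in for host and device memory. The point it illustrates is that rdu() copies host to device, cpu() copies device to host, and each call copies whole buffers, which is why both are expensive.

```python
class SyncedTensor:
    """Hypothetical model of a tensor with separate host and device copies."""

    def __init__(self, data, grad=None):
        self.host = {"data": list(data), "grad": None if grad is None else list(grad)}
        self.device = {"data": list(data), "grad": None if grad is None else list(grad)}

    def rdu(self):
        # Host -> device: copy data and, if it exists, the gradient.
        for key, value in self.host.items():
            if value is not None:
                self.device[key] = list(value)

    def cpu(self):
        # Device -> host: copy data and, if it exists, the gradient.
        for key, value in self.device.items():
            if value is not None:
                self.host[key] = list(value)


w = SyncedTensor([3.0, 4.0], grad=[1.0, 1.0])
w.host["data"] = [0.6, 0.8]    # host-only change
print(w.device["data"])        # [3.0, 4.0]: device not yet updated
w.rdu()
print(w.device["data"])        # [0.6, 0.8]
w.device["grad"] = [0.0, 0.0]  # device-only change
w.cpu()
print(w.host["grad"])          # [0.0, 0.0]
```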

requires_grad_(requires_grad: bool = True) None

Changes whether autograd should record operations on this tensor by setting this tensor’s requires_grad attribute in-place. Returns this tensor.

requires_grad_()’s main use case is to tell autograd to begin recording operations on a SambaTensor (tensor). If tensor has requires_grad=False (because it was obtained through a DataLoader, or required preprocessing or initialization), tensor.requires_grad_() causes autograd to record operations on tensor.


requires_grad – If autograd should record operations on this tensor. Default: True.


See reshape.

reusable() bool

Returns True if the tensor memory can be reused for host-to-device data transfers.


reusable() assumes that the host PyTorch tensor’s NumPy array is contiguous.

short() SambaTensor

self.short() is equivalent to self.type(torch.short). See type().

size(dim: int | None = None) Size | int

Returns the size of the self tensor. If dim is not specified, the returned value is a torch.Size, a subclass of tuple. If dim is specified, returns an int holding the size of that dimension.


dim – the dimension for which to retrieve the size. Defaults to None.


>>> t = samba.empty(3, 4, 5)
>>> t.size()
torch.Size([3, 4, 5])
>>> t.size(dim=1)
4

stride(dim: int | None = None) int | Tuple[int]

Returns the stride of the tensor. Stride is the jump necessary to go from one element to the next one in the specified dimension dim. A tuple of all strides is returned when no argument is passed in. Otherwise, an integer value is returned as the stride in the particular dimension dim.

See also

See torch.Tensor.stride().
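For a contiguous (row-major) tensor, each stride is the product of the sizes of all later dimensions, and an element’s position in flat storage is the dot product of its index with the strides. The helpers below (contiguous_strides and flat_offset are illustrative names) sketch this for the shape (3, 4, 5) used in the size() example; a non-contiguous view, such as a transpose, would have different strides.

```python
def contiguous_strides(shape):
    """Row-major (C-contiguous) strides, in elements, for a given shape."""
    strides = []
    step = 1
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


def flat_offset(index, strides):
    """Position of a multi-dimensional index in the underlying flat storage."""
    return sum(i * s for i, s in zip(index, strides))


strides = contiguous_strides((3, 4, 5))
print(strides)                          # (20, 5, 1)
print(flat_offset((1, 2, 3), strides))  # 1*20 + 2*5 + 3*1 = 33
```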

to(*args, **kwargs) SambaTensor

Performs Tensor dtype conversion. A dtype is inferred from the arguments of self.to(*args, **kwargs).

New in version 1.18.

Here are the ways to call to:

to(dtype) SambaTensor

Returns a SambaTensor with the specified dtype.

to(other) SambaTensor

Returns a SambaTensor with the same torch.dtype as the SambaTensor other.


>>> samba.set_seed(1)
>>> sambatensor = samba.randn(2, 2)  # Initially dtype=float32
>>> sambatensor.to(torch.bfloat16).data
tensor([[0.6602, 0.2676],
        [0.0618, 0.6211]], dtype=torch.bfloat16)
>>> other_torch = torch.randn((), dtype=torch.float64)
>>> sambatensor.to(other_torch).data
tensor([[0.6614, 0.2669],
        [0.0617, 0.6213]], dtype=torch.float64)

See also

See torch.Tensor.to().

torch() Tensor

Returns the SambaTensor’s underlying torch.Tensor. This method is analogous to torch.Tensor.numpy().

torch_tensor() Tensor

Returns the underlying PyTorch tensor if it has data. Otherwise, materializes the tensor and returns the materialized tensor. If the tensor was lazily created and randomly initialized, then successive calls to torch_tensor() may produce different results.
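The difference between materialize_() and torch_tensor() on a lazy tensor can be sketched with a toy model. This reflects one reading of the descriptions above, in which torch_tensor() does not store what it materializes (which is why successive calls may differ). LazyTensor and random_init are hypothetical, and the materializer signature here, materializer(shape), is simplified for illustration and is not SambaFlow’s.

```python
import random


class LazyTensor:
    """Hypothetical model of a lazily initialized tensor (not SambaFlow's API)."""

    def __init__(self, shape, materializer):
        self.shape = shape
        # Simplified signature for illustration: materializer(shape) -> list.
        self.materializer = materializer
        self.data = None  # empty: only shape information, no data yet

    def materialize_(self):
        # Materialize and store data only if the tensor has none.
        if self.data is None:
            self.data = self.materializer(self.shape)

    def torch_tensor(self):
        # Return stored data if present; otherwise return freshly materialized
        # data without storing it, so a random materializer can differ per call.
        if self.data is not None:
            return self.data
        return self.materializer(self.shape)


def random_init(shape):
    return [random.random() for _ in range(shape[0])]


t = LazyTensor((3,), random_init)
first = t.torch_tensor()
second = t.torch_tensor()          # may differ from first
t.materialize_()
assert t.torch_tensor() is t.data  # stable once materialized
```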

type(dtype: str | dtype | None = None, non_blocking: bool = False, **kwargs) str | SambaTensor

Returns the type if dtype is not provided, else casts this object to the specified type.

If self is already of the correct type, no copy is performed and the original object is returned.

  • dtype – The desired dtype.

  • non_blocking – If True, and the source is in pinned memory and destination is on the GPU or the source is on the GPU and the destination is in pinned memory, then the copy is performed asynchronously with respect to the host. Otherwise, the argument has no effect.

  • kwargs – For compatibility, may contain the key async in place of the non_blocking argument. The async arg is deprecated.


The optional PyTorch keyword arguments non_blocking (bool, optional) and **kwargs are not supported on RDU and throw an exception.

See also

For details, see torch.Tensor.type().

type_as(tensor: SambaTensor) SambaTensor

Returns this tensor cast to the type of the given tensor.

New in version 1.18.

This is a no-op if the tensor is already of the correct type, and is equivalent to self.type(tensor.type()).


tensor – the tensor which has the desired type
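The casting rules of type() and type_as() described above (report the type when no dtype is given, return the original object when no cast is needed, otherwise return a converted copy) can be sketched with a toy class. TypedTensor is hypothetical and simplified: its type() returns a dtype string rather than a PyTorch type string.

```python
class TypedTensor:
    """Hypothetical tensor model illustrating type() and type_as() semantics."""

    def __init__(self, values, dtype):
        self.values = list(values)
        self.dtype = dtype

    def type(self, dtype=None):
        if dtype is None:
            return self.dtype  # no argument: report the current type
        if dtype == self.dtype:
            return self  # already the requested type: no copy
        cast = int if dtype.startswith("int") else float
        return TypedTensor([cast(v) for v in self.values], dtype)

    def type_as(self, other):
        # Equivalent to self.type(other.type()).
        return self.type(other.type())


f = TypedTensor([1.5, -2.5], "float32")
print(f.type("float32") is f)  # True: no copy is made
i = f.type("int32")
print(i.values)                # [1, -2]: int() truncates toward zero
print(i.type_as(f).dtype)      # float32
```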

view_as(other: SambaTensor) SambaTensor

Returns a new SambaTensor with the same data as self but with other’s shape.


other – the SambaTensor whose shape is used for the new SambaTensor.

property T: SambaTensor

Alias for samba.t().

property data: Tensor

Handle to the data of the underlying PyTorch tensor.


Gets the data of the underlying PyTorch tensor.


Sets the data of the underlying PyTorch tensor.

property device: device

The torch.device where the host tensor is.

property dtype: dtype

Returns the dtype of the SambaTensor.

property fast_access: bool

Fast access SambaTensors use the pinned_memory API. By default, SambaFlow automatically marks all input tensors as fast access after tracing. See samba.session.enable_pinned_memory for details.


Returns True if self is a fast access tensor, otherwise returns False.


Sets the fast_access property of self. Can set the fast_access property for an output gradient tensor even if it does not have a SambaTensor.

property grad: Tensor

Handle to the underlying PyTorch tensor’s gradient


Gets the underlying PyTorch tensor’s gradient


Sets the underlying PyTorch tensor’s gradient

property materializer: Callable[[SambaTensor, MultithreadedRNG], Tensor]

The SambaTensor’s materializer. The materializer is used to initialize a tensor with values when the tensor was lazily initialized.


Gets the SambaTensor’s materializer


Sets the SambaTensor’s materializer

property materializer_provided: bool

Returns True if a materializer is specified, otherwise returns False.

property ndim: Size | int

Alias for dim().

property requires_grad: bool

True if gradients need to be computed for this SambaTensor, False otherwise.


Returns True if gradients need to be computed for this SambaTensor, otherwise returns False.


Sets whether gradients need to be computed for this SambaTensor.

See also

See torch.Tensor.requires_grad.

property shape: Size | int

Alias for size().

property sn_data: Tensor


Handle to the RDU device memory of a SambaTensor.


When accessed, returns a new torch.Tensor with a copy of its device memory.


When set, copies the data from the given tensor to its device memory.


property sn_grad: Tensor

Similar to sn_data, a handle to the RDU device memory of the SambaTensor’s gradient tensor. self must have been compiled with requires_grad=True.


When accessed, copies self’s gradient from device memory to the host as a new torch.Tensor.


When set, copies the data from the given tensor to self’s gradient in device memory.

property sn_grad_name: str

Name of the SambaTensor’s gradient tensor.

property sn_name: str

Unique string identifier of each tensor that is initialized on the RDU device memory. If not initialized, it is the empty string ('').


Gets the SambaTensor’s sn_name.


Sets the SambaTensor’s sn_name.

property sn_region_name: str

Handle to self’s region name, used to denote a tensor’s location in memory. If the tensor was created without a region name, the sn_name is set as the region name. If two SambaTensors share the same sn_region_name, then they share the same location in device memory.


Gets the SambaTensor’s sn_region_name.


Sets the SambaTensor’s sn_region_name.


# if region_name is unspecified, sn_region_name will default to the sn_name, so sn_region_name will be "t0"
t0 = samba.SambaTensor(torch.Tensor([1, 2]), name="t0")

# sn_region_name will be "t1_other"
t1 = samba.SambaTensor(torch.Tensor([3, 4]), name="t1", region_name="t1_other")

# sn_region_name will be "t0", so SambaTensors t0 and t2 will share the same memory
t2 = samba.SambaTensor(torch.Tensor([1, 2]), name="t2", region_name="t0")
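The region-name rule in this example (an unspecified region name defaults to the sn_name, and equal region names mean shared device memory) can be modeled in a few lines. region_of is a hypothetical helper, and the tensor names mirror the example above.

```python
from collections import defaultdict


def region_of(name, region_name=None):
    """Hypothetical model: an unspecified region name defaults to the sn_name."""
    return region_name if region_name is not None else name


tensors = {
    "t0": region_of("t0"),
    "t1": region_of("t1", region_name="t1_other"),
    "t2": region_of("t2", region_name="t0"),
}

# Group tensor names by region; names in the same group share device memory.
regions = defaultdict(list)
for name, region in tensors.items():
    regions[region].append(name)

print(dict(regions))  # {'t0': ['t0', 't2'], 't1_other': ['t1']}
```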

SambaTensor Utility Functions

from_torch_tensor(tensor: torch.Tensor, name: str | None = None, batch_dim: int | None = None, named_dims: Iterable[str | None] | None = None, region_name: str | None = None) SambaTensor

Converts a PyTorch tensor to a SambaTensor. If tensor is a SambaTensor, from_torch_tensor does nothing.

  • tensor – the torch.Tensor or SambaTensor to convert to a SambaTensor.

  • name – user-provided name of the source tensor.

  • batch_dim – Deprecated.

  • named_dims – Deprecated.

  • region_name – name for tensor’s location in memory.


Returns a SambaTensor, or None if tensor is None.

to_torch(obj: SambaTensor | Tensor | None) Tensor | None

Converts a SambaTensor to a PyTorch tensor. If obj is a PyTorch tensor, to_torch does nothing.


obj – The tensor to convert to a PyTorch tensor.


Returns a PyTorch tensor if obj is a SambaTensor or a PyTorch tensor. If obj is None, returns None.
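The pass-through behavior of the two utility functions can be sketched with stand-in classes. All names here (HostTensor, WrappedTensor, from_host, to_host) are hypothetical, and the sketch assumes that "does nothing" means the input is returned unchanged. Converting in either direction is idempotent, None passes through, and the wrapper shares rather than copies its tensor.

```python
class HostTensor:
    """Hypothetical stand-in for torch.Tensor."""

    def __init__(self, values):
        self.values = values


class WrappedTensor:
    """Hypothetical stand-in for SambaTensor; shares, not copies, its tensor."""

    def __init__(self, tensor):
        self._tensor = tensor

    def torch(self):
        return self._tensor


def from_host(tensor):
    # None passes through, and an already-wrapped tensor is returned unchanged.
    if tensor is None or isinstance(tensor, WrappedTensor):
        return tensor
    return WrappedTensor(tensor)


def to_host(obj):
    # Unwrap wrapped tensors; host tensors and None pass through unchanged.
    if isinstance(obj, WrappedTensor):
        return obj.torch()
    return obj


h = HostTensor([1, 2])
w = from_host(h)
print(from_host(w) is w, to_host(w) is h, to_host(None))  # True True None
```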