samba.sambatensor
SambaTensor Class
- class SambaTensor(torch_tensor: Tensor | None = None, shape: Iterable[int] | None = None, dtype: dtype | None = None, name: str | None = None, batch_dim: int | None = None, named_dims: Iterable[str | None] | None = None, sized_dims: Iterable[str | None] | None = None, materializer: Callable[[SambaTensor, MultithreadedRNG], Tensor] | None = None, is_complex: bool | None = None, region_name: str | None = None)
The SambaTensor is the base tensor data structure for SambaFlow. It wraps torch.Tensor and adds custom data members and methods to support graph tracing and interfacing with the device. Any application that runs a model on the RDU (the device) must use SambaTensor, which supports:
  - Static tracing, which saves memory and compute resources
  - Getting and setting device memory
A SambaTensor can be constructed from a torch.Tensor in two ways:
  - using samba.from_torch_tensor(torch_tensor) (similar to constructing a torch.Tensor from a numpy.ndarray with torch.from_numpy())
  - directly with the SambaTensor constructor, that is, SambaTensor(torch_tensor).
With either construction method, the new SambaTensor and the original PyTorch tensor share the same memory, so any change to the original PyTorch tensor is reflected in the new SambaTensor. This is different from torch.Tensor(np.ndarray), which copies the data.
A SambaTensor can be empty to use less memory. You can construct an empty SambaTensor from an empty PyTorch tensor using either of the methods listed above, or with the shape and dtype parameters. An empty SambaTensor is especially helpful for graph tracing, where you only need tensor shapes and dtypes. An entire model can be instantiated with empty SambaTensors by using lazy parameters with samba.lazy_param. When an empty SambaTensor is used on the RDU, the SambaTensor’s materializer is used to initialize the data.
Use SambaTensor.torch() to retrieve the original PyTorch tensor, similar to torch.Tensor.numpy(), which returns the original numpy.ndarray.
SambaTensors can be used on the host CPU just like PyTorch tensors, though the supported methods are limited to the functions in samba.functional.
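The memory sharing described above mirrors the torch.from_numpy() analogy. The sketch below uses only plain NumPy and PyTorch (no SambaFlow APIs) and assumes both are installed. It shows that torch.from_numpy() aliases the array's memory, while torch.tensor() always copies and therefore does not see later changes.

```python
import numpy as np
import torch

array = np.array([1.0, 2.0, 3.0], dtype=np.float32)

shared = torch.from_numpy(array)  # wraps the array's memory, no copy
copied = torch.tensor(array)      # torch.tensor() always copies the data

array[0] = 10.0                   # mutate the original NumPy buffer

print(shared[0].item())           # 10.0: the change is visible through the wrapper
print(copied[0].item())           # 1.0: the copy is unaffected

# On a CPU tensor, .numpy() returns an ndarray over the same memory
print(np.shares_memory(shared.numpy(), array))  # True
```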
Accessing device-side weight and gradient data
The SambaTensor provides APIs to directly access tensor data on RDU device memory. For example:
    # weight.sn_data and weight.sn_grad copy data
    # from device memory to host memory as PyTorch tensors
    print('data on device memory:', weight.sn_data)
    print('gradient data on device memory if it exists:', weight.sn_grad)
    # A data copy happens every time sn_data or sn_grad is accessed
    sn_weight = weight.sn_data
    sn_grad = weight.sn_grad
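Every read of sn_data returns a fresh copy. Editing the returned tensor therefore does not touch device memory; only an explicit assignment writes back. The following pure-Python sketch models that contract with a data descriptor. CopyingDescriptor and FakeTensor are hypothetical names, a Python list stands in for device memory, and this is not the SambaFlow implementation.

```python
class CopyingDescriptor:
    """Hypothetical data descriptor: every get and set copies, like sn_data."""

    def __set_name__(self, owner, name):
        self.attr = "_device_" + name  # where the simulated device memory lives

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        # Getter: return a copy of the device buffer, never the buffer itself.
        return list(getattr(obj, self.attr))

    def __set__(self, obj, value):
        # Setter: copy the given values into the device buffer.
        setattr(obj, self.attr, list(value))


class FakeTensor:
    sn_data = CopyingDescriptor()  # hypothetical stand-in for SambaTensor.sn_data

    def __init__(self, values):
        self.sn_data = values  # routed through __set__, so the input is copied


weight = FakeTensor([1.0, 2.0, 3.0])
host_copy = weight.sn_data  # one copy "from the device"
host_copy[0] = 100.0        # modifies only the host copy
print(weight.sn_data)       # device buffer unchanged: [1.0, 2.0, 3.0]

weight.sn_data = host_copy  # explicit assignment writes back
print(weight.sn_data)       # [100.0, 2.0, 3.0]
```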
Modifying device memory weight and grad data
The SambaTensor provides APIs to modify tensor data on RDU device memory:
    # tensor can be any object for which isinstance(tensor, torch.Tensor) is True,
    # such as a torch.Tensor or a SambaTensor
    samba_tensor.sn_data = tensor
or
    # transfer the host data to the device
    samba_tensor.rdu()
You can assign a PyTorch tensor or SambaTensor to sn_data, which copies the data to the tensor on the device. Similarly, you can assign a PyTorch tensor or SambaTensor to sn_grad, which copies the data to the tensor’s gradient on the device.
    # Weight and its gradient are updated on the host and then copied to the device
    weight.sn_data = sn_weight / torch.norm(sn_weight)  # weight normalization on host
    weight.sn_grad = sn_grad / torch.norm(sn_grad)  # grad normalization on host
Alternatively, you can use rdu() and cpu() to synchronize the data between host memory (on CPU) and device memory (on RDU) of a SambaTensor, for example:
    # Modify weights on the host only; weights on the device
    # remain unchanged
    weight.data = weight / torch.norm(weight)
    weight.grad = weight.grad / torch.norm(weight.grad)
    # Print device-side weights before synchronizing host and device memory
    print(weight.sn_data)
    # Copy host memory to device memory
    # Note: both data and grad are synchronized
    weight.rdu()
    # Print device-side weights after the host-to-device copy
    print(weight.sn_data)
    # Modify the weight gradient on the device directly from the host
    weight.sn_grad = torch.zeros_like(sn_grad)
    # Print host-side weight gradients before synchronizing host and device memory
    print(weight.grad)
    # Copy device memory to host memory
    weight.cpu()
    # Print host-side weight gradients after the device-to-host copy
    print(weight.grad)
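You can trace the ordering of host and device updates in the example above with a toy model. It keeps separate host and device buffers and synchronizes them only on explicit rdu() and cpu() calls. ToyTensor is hypothetical, uses Python lists instead of tensors, and only approximates the documented behavior (both data and gradient are synchronized). It is not SambaFlow code.

```python
class ToyTensor:
    """Hypothetical model of a tensor with separate host and device copies."""

    def __init__(self, data, grad):
        self.data, self.grad = list(data), list(grad)            # host memory
        self._dev_data, self._dev_grad = list(data), list(grad)  # simulated device memory

    @property
    def sn_data(self):
        return list(self._dev_data)  # copy device -> host on every access

    @property
    def sn_grad(self):
        return list(self._dev_grad)

    @sn_grad.setter
    def sn_grad(self, value):
        self._dev_grad = list(value)  # copy host -> device

    def rdu(self):
        # Host -> device: both data and grad are synchronized.
        self._dev_data, self._dev_grad = list(self.data), list(self.grad)

    def cpu(self):
        # Device -> host: both data and grad are synchronized.
        self.data, self.grad = list(self._dev_data), list(self._dev_grad)


w = ToyTensor([3.0, 4.0], [1.0, 1.0])
w.data = [x / 5.0 for x in w.data]  # host-only change: divide by the L2 norm (5)
print(w.sn_data)                    # device still holds [3.0, 4.0]
w.rdu()
print(w.sn_data)                    # device now holds [0.6, 0.8]
w.sn_grad = [0.0, 0.0]              # write the gradient on the device only
print(w.grad)                       # host still holds [1.0, 1.0]
w.cpu()
print(w.grad)                       # host now holds [0.0, 0.0]
```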
The sn_data and sn_grad members of the SambaTensor class are Python data descriptors with custom getter and setter methods. When you access sn_data or sn_grad on a SambaTensor, it returns a new torch.Tensor holding a copy of the data in device memory. Any modification to this returned torch.Tensor is not reflected in RDU memory.
Note
Tensor manipulation on the host is expensive because the computations are performed by the CPU, and data synchronization between host and device is bandwidth-heavy. Do not use these four SambaTensor APIs (sn_data, sn_grad, rdu(), and cpu()) unless necessary, for example, when checkpointing models.
SambaTensor has methods and attributes similar to those of torch.Tensor. In addition, SambaTensor has methods and members that are specific to the RDU dataflow architecture.
When an operation involves input SambaTensors of different data types, SambaFlow follows the dtype promotion rules that PyTorch uses to do the computation (see the information on promotion at https://pytorch.org/docs/stable/tensor_attributes.html#torch.dtype for details). For example, when calling samba.add with one input of dtype bfloat16 and the other of dtype float32, the bfloat16 SambaTensor is promoted to float32.
- Parameters:
torch_tensor – A torch.Tensor object used to construct the SambaTensor.
shape – Shape of the tensor, used to implement tracing. Cannot be specified with torch_tensor.
dtype – Data type of the tensor. Should be a torch.dtype object. Cannot be specified with torch_tensor.
name – User-provided name for the SambaTensor, similar to tf.placeholder.
batch_dim – Deprecated.
named_dims – Deprecated.
sized_dims – Experimental. This argument is for a feature in development.
materializer – Function to initialize this tensor with values when transferring this tensor to the RDU. Only applicable if this tensor was lazily initialized (see samba.session.enable_lazy_param). The function should accept parameters shape, dtype, and requires_grad and return a torch.Tensor. The materializer does not accept a torch.Tensor.
is_complex – Experimental. Whether this tensor represents a complex tensor or not.
region_name – Name for the tensor’s location in memory. See sn_region_name.
Example
>>> import torch
>>> import sambaflow.samba as samba

>>> # Initialize SambaTensor with constructor
>>> torch_tensor = torch.Tensor([1, 2])
>>> samba_tensor0 = samba.SambaTensor(torch_tensor)

>>> # Initialize SambaTensor with samba.from_torch_tensor
>>> samba_tensor1 = samba.from_torch_tensor(torch_tensor, name="samba_tensor1")

>>> # 3 ways to initialize empty SambaTensor with shape (2,3)
>>> empty_samba_tensor0 = samba.SambaTensor(torch.empty(2, 3), name="empty_samba_tensor0")
>>> empty_samba_tensor1 = samba.SambaTensor(shape=(2,3), dtype=torch.bfloat16, name="empty_samba_tensor1")
>>> empty_samba_tensor2 = samba.from_torch_tensor(torch.empty(2, 3), name="empty_samba_tensor2")
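SambaFlow is documented to follow PyTorch's promotion rules, so plain PyTorch can show which dtype a mixed-dtype operation will produce. The sketch below uses only standard PyTorch calls (torch.promote_types and torch.result_type). It assumes, per the class description above, that samba.add promotes its inputs the same way torch.add does.

```python
import torch

a = torch.ones(2, dtype=torch.bfloat16)
b = torch.ones(2, dtype=torch.float32)

# Dtype that PyTorch's promotion rules pick for an operation on a and b
print(torch.promote_types(a.dtype, b.dtype))  # torch.float32
print(torch.result_type(a, b))                # torch.float32
print(torch.add(a, b).dtype)                  # torch.float32

# float16 and bfloat16 have no common 16-bit type, so both promote to float32
print(torch.promote_types(torch.float16, torch.bfloat16))  # torch.float32

# A Python float of the same category does not upcast the tensor
print((a + 1.5).dtype)                        # torch.bfloat16
```

The last case matters in practice: mixing a bfloat16 tensor with a Python scalar keeps bfloat16, whereas mixing it with a float32 tensor does not.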
- __getitem__(x: int | slice | None | SambaTensor | Tuple[int | slice | None | SambaTensor]) → SambaTensor
Indexes this SambaTensor. This function can be called with the [] operator. Currently supported index types are:
  - an integer, to retrieve a single element along that dimension.
  - a slice, to retrieve some subset of elements along that dimension.
  - None, to indicate that the tensor should be unsqueezed at that index.
  - a SambaTensor, to gather indices indicated by the tensor. SambaTensor currently does not support indexing with multidimensional SambaTensors or multiple SambaTensors.
  - a list, to retrieve some elements by indices along that dimension.
- Parameters:
x – the index object
Example:
>>> samba_tensor = samba.randn(2, 3)
>>> samba_tensor.data
tensor([[ 0.7102, -0.8594, -0.5047],
        [ 0.8140, -0.4194,  1.5488]])
>>> samba_tensor[:, 2].data
tensor([-0.5047,  1.5488])
>>> index_tensor = samba.SambaTensor(torch.Tensor([0, 2]))
>>> samba_tensor[None, :, index_tensor].data
tensor([[[ 0.7102, -0.5047],
         [ 0.8140,  1.5488]]])
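For comparison, here is how the index forms listed above behave on a plain torch.Tensor built from the same values. This sketch uses only PyTorch and assumes SambaTensor indexing follows the same shape rules. Plain PyTorch requires integer index tensors, so idx is created with torch.tensor([0, 2]) (dtype int64) rather than the float torch.Tensor([0, 2]) used in the SambaTensor example.

```python
import torch

t = torch.tensor([[0.7102, -0.8594, -0.5047],
                  [0.8140, -0.4194, 1.5488]])

col = t[:, 2]                  # integer index: that dimension is dropped
print(col.shape)               # torch.Size([2])

sub = t[:, 0:2]                # slice: the dimension is kept
print(sub.shape)               # torch.Size([2, 2])

idx = torch.tensor([0, 2])     # integer index tensor
picked = t[None, :, idx]       # None unsqueezes; idx gathers columns 0 and 2
print(picked.shape)            # torch.Size([1, 2, 2])

listed = t[:, [0, 2]]          # a Python list gathers the same columns
print(torch.equal(listed, picked[0]))  # True
```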
- __setitem__(x: int | slice | None | Tuple[int | slice | None], update: int | float | SambaTensor)
Indexes this SambaTensor and sets the data at the indexed positions. This function can be called with the [] operator. Currently supported index types are:
  - an integer, to select a single element along that dimension.
  - a slice, to select some subset of elements along that dimension.
  - None, to indicate that the tensor should be unsqueezed at that index.
  - a SambaTensor, to select the indices indicated by the tensor. SambaTensor currently does not support indexing with multidimensional SambaTensors or multiple SambaTensors.
- Parameters:
x – the index object
update – the value to write at the indexed positions: an int, a float, or a SambaTensor.
Example:
>>> samba_tensor = samba.zeros(2, 3)
>>> samba_tensor.data
tensor([[0., 0., 0.],
        [0., 0., 0.]])
>>> samba_tensor[:, 2] = samba.ones(2)
>>> samba_tensor.data
tensor([[0., 0., 1.],
        [0., 0., 1.]])
>>> index_tensor = samba.SambaTensor(torch.Tensor([0, 2]))
>>> samba_tensor[None, :, index_tensor] = 2 * samba.ones(1, 2, 2)
>>> samba_tensor.data
tensor([[2., 0., 2.],
        [2., 0., 2.]])
- backward(gradient: SambaTensor | Tensor | None = None, retain_graph: bool | None = None) → None
Calls torch.Tensor.backward() on the underlying PyTorch tensor and computes the gradient of the PyTorch tensor with respect to the graph leaves.
The graph is differentiated using the chain rule. If the tensor is non-scalar (i.e. its data has more than one element) and requires gradient, backward() also requires specifying the gradient. gradient should be a tensor of the same type and location as self that contains the gradient of the differentiated function w.r.t. self.
This function accumulates gradients in the leaves, so you might need to zero the .grad attributes or set them to None before calling it. See Default gradient layouts for details on the memory layout of accumulated gradients.
- Parameters:
gradient – Gradient with respect to the tensor. If gradient is a tensor, it is automatically converted to a tensor that does not require a gradient. None values can be specified if self is a scalar tensor or a tensor that doesn’t require a gradient. If a None value is acceptable, then this argument is optional. Defaults to None.
retain_graph – If False, the graph used to compute the grads is freed. In nearly all cases setting this option to True is not needed and often can be worked around in a much more efficient way. Defaults to None.
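The accumulation behavior described above is standard PyTorch autograd, so it can be reproduced on a plain torch.Tensor. This sketch uses only PyTorch and assumes SambaTensor.backward() behaves the same way, since it delegates to torch.Tensor.backward().

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)  # a graph leaf

# Scalar output: backward() needs no explicit gradient argument.
(x * 2).sum().backward()
print(x.grad)  # tensor([2., 2., 2.])

# A second backward pass accumulates into x.grad instead of overwriting it.
(x * 2).sum().backward()
print(x.grad)  # tensor([4., 4., 4.])

# Reset the leaf's gradient before the next pass.
x.grad = None

# Non-scalar output: an explicit gradient of the same shape is required.
y = x * 3
y.backward(gradient=torch.ones_like(y))
print(x.grad)  # tensor([3., 3., 3.])
```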
- bfloat16() → SambaTensor
self.bfloat16() is equivalent to self.type(torch.bfloat16). See type().
- bool() → SambaTensor
self.bool() is equivalent to self.type(torch.bool). See type().
- clear_data() → None
Clear the tensor data on the host.
- cpu(inplace: bool = False) → None
Copies the data from device memory to the host. Avoid using cpu() unless necessary, because this operation is bandwidth-intensive.
- Parameters:
inplace – whether to modify the underlying host memory in-place. Defaults to False.
- data_ptr() → int
Returns the address of the first element of the associated PyTorch tensor.
- dim() → Size | int
Returns the number of dimensions of the self tensor.
- element_size() → int
Returns the element size in bytes.
- float() → SambaTensor
self.float() is equivalent to self.type(torch.float). See type().
- int() → SambaTensor
self.int() is equivalent to self.type(torch.int). See type().
- static is_fast_access(name: str) → bool
Returns True if the SambaTensor whose sn_name is name is a fast access tensor. Returns False otherwise. See fast_access for details.
- is_floating_point() → bool
Returns True if self is a floating-point tensor, otherwise returns False.
- item() → float | int
Returns the value of this tensor as a standard Python number. This only works for tensors with one element. This operation is not differentiable.
Example
>>> x = samba.SambaTensor(torch.tensor([1.0]))
>>> x.item()
1.0
- long() → SambaTensor
self.long() is equivalent to self.type(torch.long). See type().
- materialize_() → None
If the tensor does not have data, materializes the tensor. Otherwise, does nothing.
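The lazy-initialization contract of materialize_() and torch_tensor() can be sketched in plain Python: the object stores only a shape and a materializer callable until data is first needed. LazyTensor, uniform_materializer, and the (shape, rng) materializer signature are hypothetical simplifications, with random.Random standing in for MultithreadedRNG. The caching behavior of torch_tensor() is an assumption inferred from the note that successive calls may differ.

```python
import math
import random


class LazyTensor:
    """Hypothetical lazily initialized tensor holding a flat list of floats."""

    def __init__(self, shape, materializer):
        self.shape = tuple(shape)
        self.materializer = materializer  # called only when data is needed
        self.data = None                  # empty: no host data yet

    def materialize_(self):
        # If the tensor does not have data, materialize it; otherwise do nothing.
        if self.data is None:
            self.data = self.materializer(self.shape, random.Random())

    def torch_tensor(self):
        # Assumption: a freshly materialized result is returned but not cached,
        # so a random materializer can give different values on each call.
        if self.data is not None:
            return self.data
        return self.materializer(self.shape, random.Random())


def uniform_materializer(shape, rng):
    # Hypothetical initializer: one value between -0.1 and 0.1 per element.
    return [rng.uniform(-0.1, 0.1) for _ in range(math.prod(shape))]


t = LazyTensor((2, 3), uniform_materializer)
print(t.data is None)         # True: no data allocated yet
t.materialize_()
first = t.data
t.materialize_()              # no-op: the tensor already has data
print(t.data is first)        # True
print(len(t.torch_tensor()))  # 6
```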
- new_empty(size: int | Iterable[int], dtype: dtype | None = None, requires_grad: bool = False) → SambaTensor
Returns a SambaTensor of size size filled with uninitialized data. By default, the returned SambaTensor has the same torch.dtype as this tensor.
- Parameters:
size – a list, tuple, or torch.Size of integers defining the shape of the output tensor.
dtype – the desired type of the returned tensor. If None, same torch.dtype as this SambaTensor.
requires_grad – if autograd should record operations on the returned tensor. Defaults to False.
Example:
>>> sambatensor = samba.ones((), dtype=torch.float32)
>>> sambatensor.new_empty((2, 3)).data
tensor([[0.0000e+00, 1.4405e-41, 0.0000e+00],
        [0.0000e+00, 0.0000e+00, 0.0000e+00]])
- new_full(size: Tuple[int], fill_value: int | float, dtype: dtype | None = None, device: device | None = None, requires_grad: bool | None = False) → SambaTensor
Returns a SambaTensor of size size filled with fill_value. By default, the returned SambaTensor has the same torch.dtype as this tensor.
- Parameters:
size – a list, tuple, or torch.Size of integers defining the shape of the output tensor.
fill_value – the number to fill the output tensor with.
dtype – the desired type of the returned tensor. If None, same torch.dtype as this SambaTensor.
device – the desired device of the returned tensor. If None, same torch.device as this tensor.
requires_grad – if autograd should record operations on the returned tensor. Defaults to False.
Note
The optional PyTorch keyword argument device is not supported on RDU and has no effect.
Example
>>> sambatensor = samba.ones((), dtype=torch.float32)
>>> sambatensor.new_full((2, 3), 5.0).data
tensor([[5., 5., 5.],
        [5., 5., 5.]])

>>> # new_full with explicit data type
>>> sambatensor = samba.ones((), dtype=torch.float32)
>>> sambatensor.new_full((2, 3), 5.0, dtype=torch.bfloat16).data
tensor([[5., 5., 5.],
        [5., 5., 5.]], dtype=torch.bfloat16)
- new_ones(size: int | Iterable[int], dtype: dtype | None = None, requires_grad: bool = False) → SambaTensor
Returns a SambaTensor of size size filled with 1. By default, the returned SambaTensor has the same torch.dtype as this tensor.
- Parameters:
size – a list, tuple, or torch.Size of integers defining the shape of the output tensor.
dtype – the desired type of the returned tensor. If None, same torch.dtype as this SambaTensor.
requires_grad – if autograd should record operations on the returned tensor. Defaults to False.
Example
>>> sambatensor = samba.randn((), dtype=torch.bfloat16)
>>> sambatensor.new_ones((2, 3)).data
tensor([[1., 1., 1.],
        [1., 1., 1.]], dtype=torch.bfloat16)
- new_zeros(size: int | Iterable[int], dtype: dtype | None = None, requires_grad: bool = False) → SambaTensor
Returns a SambaTensor of size size filled with 0. By default, the returned SambaTensor has the same torch.dtype as this tensor.
- Parameters:
size – a list, tuple, or torch.Size of integers defining the shape of the output tensor.
dtype – the desired type of the returned tensor. If None, same torch.dtype as this SambaTensor.
requires_grad – if autograd should record operations on the returned tensor. Defaults to False.
Example
>>> sambatensor = samba.randn((), dtype=torch.bfloat16)
>>> sambatensor.new_zeros((2, 3)).data
tensor([[0., 0., 0.],
        [0., 0., 0.]], dtype=torch.bfloat16)
- numel() → int
Returns the number of elements.
- rdu() → None
Synchronizes the host memory of the tensor (and its gradient, if it exists) to its device memory. Similar to an in-place version of torch.Tensor.cuda(). Avoid using rdu() unless necessary, because this operation is bandwidth-intensive.
- requires_grad_(requires_grad: bool = True) → None
Changes whether autograd should record operations on this tensor by setting this tensor’s requires_grad attribute in-place. Returns this tensor.
requires_grad_()’s main use case is to tell autograd to begin recording operations on a SambaTensor tensor. If tensor has requires_grad=False (because it was obtained through a DataLoader, or required preprocessing or initialization), tensor.requires_grad_() causes autograd to record operations on tensor.
- Parameters:
requires_grad – If autograd should record operations on this tensor. Default: True.
- reusable() → bool
Returns True if the tensor memory can be reused for host-to-device data transfers.
Note
reusable() assumes that the host PyTorch tensor’s NumPy array is contiguous.
- short() → SambaTensor
self.short() is equivalent to self.type(torch.short). See type().
- size(dim: int | None = None) → Size | int
Returns the size of the self tensor. If dim is not specified, the returned value is a torch.Size, a subclass of tuple. If dim is specified, returns an int holding the size of that dimension.
- Parameters:
dim – the dimension for which to retrieve the size. Defaults to None.
Example
>>> t = samba.empty(3, 4, 5)
>>> t.size()
torch.Size([3, 4, 5])
>>> t.size(dim=1)
4
- stride(dim: int | None = None) → int | Tuple[int]
Returns the stride of the tensor. Stride is the jump necessary to go from one element to the next one in the specified dimension dim. A tuple of all strides is returned when no argument is passed in. Otherwise, an integer value is returned as the stride in the particular dimension dim.
See also
See torch.Tensor.stride().
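For a contiguous (row-major) tensor, the strides follow directly from the shape. The last dimension has stride 1, and each earlier dimension's stride is the product of all later sizes. The helpers below are hypothetical illustrations of that rule, not SambaFlow functions. For a contiguous torch.Tensor of shape (3, 4, 5), torch.Tensor.stride() returns the same (20, 5, 1).

```python
def contiguous_strides(shape):
    """Return row-major (C-contiguous) strides, measured in elements."""
    strides = []
    step = 1
    for size in reversed(shape):
        strides.append(step)
        step *= size
    return tuple(reversed(strides))


def flat_offset(index, strides):
    """Element offset of a multi-dimensional index: sum of index * stride."""
    return sum(i * s for i, s in zip(index, strides))


strides = contiguous_strides((3, 4, 5))
print(strides)                          # (20, 5, 1)
print(flat_offset((1, 2, 3), strides))  # 1*20 + 2*5 + 3*1 = 33
```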
- to(*args, **kwargs) → SambaTensor
Performs Tensor dtype conversion. A dtype is inferred from the arguments of self.to(*args, **kwargs).
New in version 1.18.
Here are the ways to call to:
  - to(dtype) → SambaTensor
    Returns a SambaTensor with the specified dtype.
  - to(other) → SambaTensor
    Returns a SambaTensor with the same torch.dtype as the SambaTensor other.
Example
>>> samba.set_seed(1)
>>> sambatensor = samba.randn(2, 2)  # Initially dtype=float32
>>> sambatensor.to(torch.bfloat16).data
tensor([[0.6602, 0.2676],
        [0.0618, 0.6211]], dtype=torch.bfloat16)

>>> other_torch = torch.randn((), dtype=torch.float64)
>>> sambatensor.to(other_torch).data
tensor([[0.6614, 0.2669],
        [0.0617, 0.6213]], dtype=torch.float64)
- torch() → Tensor
Returns the SambaTensor’s underlying torch.Tensor. This method is analogous to torch.Tensor.numpy().
- torch_tensor() → Tensor
Returns the underlying PyTorch tensor if it has data. Otherwise, materializes the tensor and returns the materialized tensor. If the tensor was lazily created and randomly initialized, then successive calls to torch_tensor() may produce different results.
- type(dtype: str | dtype | None = None, non_blocking: bool = False, **kwargs) → str | SambaTensor
Returns the type if dtype is not provided, else casts this object to the specified type.
If self is already of the correct type, no copy is performed and the original object is returned.
- Parameters:
dtype – The desired dtype.
non_blocking – If True, and the source is in pinned memory and destination is on the GPU or the source is on the GPU and the destination is in pinned memory, then the copy is performed asynchronously with respect to the host. Otherwise, the argument has no effect.
kwargs – For compatibility, may contain the key async in place of the non_blocking argument. The async arg is deprecated.
Note
The optional PyTorch keyword arguments non_blocking (bool) and **kwargs are not supported on RDU and raise an exception.
See also
For details, see torch.Tensor.type().
- type_as(tensor: SambaTensor) → SambaTensor
Returns this tensor cast to the type of the given tensor.
New in version 1.18.
This is a no-op if the tensor is already of the correct type. This is equivalent to self.type(tensor.type()).
- Parameters:
tensor – the tensor which has the desired type
- view_as(other: SambaTensor) → SambaTensor
Returns a new SambaTensor with the same data as self but with other’s shape.
- Parameters:
other – the SambaTensor whose shape is used for the new SambaTensor.
- property T: SambaTensor
Alias for samba.t().
- property data: Tensor
Handle to the data of the underlying PyTorch tensor.
- Getter:
Gets the data of the underlying PyTorch tensor.
- Setter:
Sets the data of the underlying PyTorch tensor.
- property device: device
The torch.device on which the host tensor resides.
- property dtype: dtype
Returns the type of the SambaTensor.
- property fast_access: bool
Fast access SambaTensors use the pinned_memory API. By default, SambaFlow automatically marks all input tensors as fast access after tracing. See samba.session.enable_pinned_memory for details.
- Getter:
Returns True if self is a fast access tensor, otherwise returns False.
- Setter:
Sets the fast_access property of self. Can set the fast_access property for an output gradient tensor even if it does not have a SambaTensor.
- property grad: Tensor
Handle to the underlying PyTorch tensor’s gradient.
- Getter:
Gets the underlying PyTorch tensor’s gradient.
- Setter:
Sets the underlying PyTorch tensor’s gradient.
- property materializer: Callable[[SambaTensor, MultithreadedRNG], Tensor]
The SambaTensor’s materializer. The materializer is used to initialize a tensor with values when the tensor was lazily initialized.
- Getter:
Gets the SambaTensor’s materializer.
- Setter:
Sets the SambaTensor’s materializer.
- property materializer_provided: bool
Returns True if a materializer is specified, otherwise returns False.
- property requires_grad: bool
True if gradients need to be computed for this SambaTensor, False otherwise.
- Getter:
Returns True if gradients need to be computed for this SambaTensor, otherwise returns False.
- Setter:
Sets whether gradients need to be computed for this SambaTensor.
See also
See torch.Tensor.requires_grad.
- sn_data
Handle to the RDU device memory of a SambaTensor.
- Getter:
When accessed, returns a new torch.Tensor with a copy of its device memory.
- Setter:
When set, copies the data from the given tensor to its device memory.
- sn_grad
Similar to sn_data, handle to the RDU device memory of its gradient tensor. self must have been compiled with requires_grad = True.
- Getter:
When accessed, copies self’s gradient from device memory to the host as a new torch.Tensor.
- Setter:
When set, copies the data from the given tensor to self’s gradient in device memory.
- property sn_grad_name: str
Name of the SambaTensor’s gradient tensor.
- property sn_name: str
Unique string identifier of each tensor that is initialized on the RDU device memory. If the tensor is not initialized, it is the empty string ('').
- Getter:
Gets the SambaTensor’s sn_name.
- Setter:
Sets the SambaTensor’s sn_name.
- property sn_region_name: str
Handle to self’s region name, used to denote a tensor’s location in memory. If the tensor was created without a region name, its sn_name is used as the region name. If two SambaTensors share the same sn_region_name, then they share the same location in device memory.
- Getter:
Gets the SambaTensor’s sn_region_name.
- Setter:
Sets the SambaTensor’s sn_region_name.
Example:
    # If region_name is unspecified, sn_region_name defaults to the sn_name,
    # so sn_region_name will be "t0"
    t0 = samba.SambaTensor(torch.Tensor([1, 2]), name="t0")
    # sn_region_name will be "t1_other"
    t1 = samba.SambaTensor(torch.Tensor([3, 4]), name="t1", region_name="t1_other")
    # sn_region_name will be "t0", so SambaTensors t0 and t2 will share the same memory
    t2 = samba.SambaTensor(torch.Tensor([1, 2]), name="t2", region_name="t0")
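The defaulting and sharing rules for sn_region_name can be modeled as a registry that maps region names to buffers. RegionRegistry and ToyRegionTensor are hypothetical names used only for illustration. The rule that the first tensor to use a region allocates it is also an assumption. The sketch mirrors the three tensors in the example above but says nothing about how SambaFlow actually allocates device memory.

```python
class RegionRegistry:
    """Hypothetical map from region name to a shared, simulated device buffer."""

    def __init__(self):
        self._regions = {}

    def buffer_for(self, region_name, size):
        # Assumption for illustration: the first tensor to use a region
        # allocates it, and later tensors with the same name reuse it.
        if region_name not in self._regions:
            self._regions[region_name] = [0.0] * size
        return self._regions[region_name]


class ToyRegionTensor:
    def __init__(self, registry, name, size, region_name=None):
        self.sn_name = name
        # If region_name is unspecified, it defaults to the tensor's name.
        self.sn_region_name = region_name if region_name is not None else name
        self.device_buffer = registry.buffer_for(self.sn_region_name, size)


registry = RegionRegistry()
t0 = ToyRegionTensor(registry, "t0", 2)
t1 = ToyRegionTensor(registry, "t1", 2, region_name="t1_other")
t2 = ToyRegionTensor(registry, "t2", 2, region_name="t0")

print(t0.sn_region_name, t1.sn_region_name, t2.sn_region_name)  # t0 t1_other t0
t0.device_buffer[0] = 5.0
print(t2.device_buffer[0])  # 5.0: t0 and t2 share one region
print(t1.device_buffer[0])  # 0.0: t1 has its own region
```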
SambaTensor Utility Functions
- from_torch_tensor(tensor: torch.Tensor, name: str | None = None, batch_dim: int | None = None, named_dims: Iterable[str | None] | None = None, region_name: str | None = None) → SambaTensor
Converts a PyTorch tensor to a SambaTensor. If tensor is already a SambaTensor, from_torch_tensor does nothing.
- Parameters:
tensor – the torch.Tensor or SambaTensor to convert to a SambaTensor.
name – user-provided name of the source tensor.
batch_dim – Deprecated.
named_dims – Deprecated.
region_name – name for the tensor’s location in memory.
- Returns:
SambaTensor, or None if tensor is None.
- to_torch(obj: SambaTensor | Tensor | None) → Tensor | None
Converts a SambaTensor to a PyTorch tensor. If obj is already a PyTorch tensor, to_torch does nothing.
- Parameters:
obj – The tensor to convert to a PyTorch tensor.
- Returns:
A PyTorch tensor if obj is a SambaTensor or a PyTorch tensor. If obj is None, returns None.