# samba.functional

## Arithmetic

abs(input: SambaTensor, *, out: SambaTensor | None = None)

New in version 1.18.

Computes the absolute value of each element in input.

$\text{out}_{i} = |\text{input}_{i}|$
Parameters:
• input – the input tensor.

• out – the output tensor. Defaults to None.

Note

The PyTorch keyword arg out is not supported on RDU and will throw an exception.

Example:

>>> x = samba.SambaTensor(torch.tensor([-1, -2, -3]))
>>> samba.abs(x).data
tensor([ 1,  2,  3])


For more details see torch.abs().

add(input: SambaTensor, other: SambaTensor, alpha: int | float = 1, out: SambaTensor | None = None)

Computes the element-wise sum of the given input and other tensors.

$\text{out}_i = \text{input}_i + \text{other}_i$
Parameters:
• input – the input tensor.

• other – the tensor or number to add to input.

• alpha – the multiplier for other. Defaults to 1.

• out – the output tensor. Defaults to None.

Note

The PyTorch API optional keyword args

• alpha (Number)

• out (Tensor, optional)

are supported only on CPU. They are not supported on RDU.

Example:

>>> a = samba.SambaTensor(torch.tensor([ 0.0202,  1.0985,  1.3506, -0.6056]))
>>> samba.add(a, 20).data
tensor([20.0202, 21.0985, 21.3506, 19.3944])

>>> b = samba.SambaTensor(torch.tensor([-0.9732, -0.3497,  0.6245,  0.4022]))
>>> c = samba.SambaTensor(torch.tensor([[ 0.3743],
...                                     [-1.7724],
...                                     [-0.5811],
...                                     [-0.8017]]))
>>> (b + c).data
tensor([[-0.5989,  0.0246,  0.9988,  0.7765],
        [-2.7456, -2.1221, -1.1479, -1.3702],
        [-1.5543, -0.9308,  0.0434, -0.1789],
        [-1.7749, -1.1514, -0.1772, -0.3995]])


For details see torch.add().
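The second example relies on broadcasting: b has shape (4,) and c has shape (4, 1), so the sum has shape (4, 4). The following is a minimal pure-Python sketch of that rule, with no samba or torch dependency. The helper name broadcast_add_row_col is hypothetical and is not part of the samba API.

```python
def broadcast_add_row_col(row, col):
    """Add a length-n row vector to an (m, 1) column, giving an m x n result."""
    # Each column entry c[0] is paired with every element of the row.
    return [[r + c[0] for r in row] for c in col]


b = [-0.9732, -0.3497, 0.6245, 0.4022]
c = [[0.3743], [-1.7724], [-0.5811], [-0.8017]]
result = broadcast_add_row_col(b, c)

for line in result:
    print([round(v, 4) for v in line])
```

Rounded to four decimals, the rows agree with the tensor printed in the example.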

div(input: SambaTensor, other: SambaTensor | int | float, rounding_mode: str | None = None, out: SambaTensor | None = None)

Divides each element of the input by the corresponding element of other.

$\text{out}_i = \frac{\text{input}_i}{\text{other}_i}$
Parameters:
• input – the dividend.

• other – the divisor.

• rounding_mode – type of rounding mode applied to the result. Either None, "trunc", or "floor".

• out – the output tensor. Defaults to None.

Note

The PyTorch API optional keyword args

• rounding_mode (str, optional)

• out (Tensor, optional)

are not supported on RDU and will throw an exception.

Examples:

>>> x = samba.SambaTensor(torch.tensor([ 0.3810,  1.2774, -0.2972, -0.3719,  0.4637]))
>>> samba.div(x, 0.5).data
tensor([ 0.7620,  2.5548, -0.5944, -0.7438,  0.9274])

>>> a = samba.SambaTensor(torch.tensor([[-0.3711, -1.9353, -0.4605, -0.2917],
...                   [ 0.1815, -1.0111,  0.9805, -1.5923],
...                   [ 0.1062,  1.4581,  0.7759, -1.2344],
...                   [-0.1830, -0.0313,  1.1908, -1.4757]]))
>>> b = samba.SambaTensor(torch.tensor([ 0.8032,  0.2930, -0.8113, -0.2308]))
>>> samba.div(a, b).data
tensor([[-0.4620, -6.6051,  0.5676,  1.2639],
        [ 0.2260, -3.4509, -1.2086,  6.8990],
        [ 0.1322,  4.9764, -0.9564,  5.3484],
        [-0.2278, -0.1068, -1.4678,  6.3938]])


For details, see torch.div().

fmod(input: SambaTensor, other: SambaTensor | float | int, *, out: SambaTensor | None = None)

New in version 1.19.

Computes the element-wise modulus of the given input and other tensors.

The result has the same sign as the dividend input and its absolute value is less than that of other.

It’s equivalent to:

>>> input - input.div(other, rounding_mode="trunc") * other

Parameters:
• input – the dividend

• other – the divisor

• out – the output tensor. Defaults to None.

Supported data types:

• input: torch.bfloat16, torch.float32, torch.int16, torch.int32

• other: torch.bfloat16, torch.float32, torch.int16, torch.int32, int, float

Note

The PyTorch API optional keyword arg out is supported only on CPU. It is not supported on RDU and will throw an error.

Example:

>>> a = samba.SambaTensor(torch.tensor([-3., -2, -1, 1, 2, 3]))
>>> samba.fmod(a, 2).data
tensor([-1., -0., -1.,  1.,  0.,  1.])


For more details see torch.fmod().
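The truncated-division identity for fmod can be checked on plain Python floats. This is only an illustrative sketch: fmod_trunc is a hypothetical helper, and Python's math.fmod stands in for the tensor operation.

```python
import math


def fmod_trunc(x, y):
    """x - trunc(x / y) * y, the truncated-division form of fmod."""
    return x - math.trunc(x / y) * y


values = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]
results = [fmod_trunc(v, 2) for v in values]

# Every result carries the sign of the dividend (or is zero) and agrees
# numerically with the standard library's fmod.
for v, r in zip(values, results):
    assert r == math.fmod(v, 2)
    assert r == 0 or (r < 0) == (v < 0)
```

The scalar formula yields 0.0 where the tensor example prints -0.; the two compare equal.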

gelu(input: SambaTensor, approximate: str = 'none')

Applies the Gaussian Error Linear Units function:

$\text{GELU}(x) = x * \Phi(x)$

where $$\Phi(x)$$ is the cumulative distribution function of the standard Gaussian distribution.

When the approximate argument is 'tanh', GELU is estimated with:

$\text{GELU}(x) = 0.5 * x * (1 + \text{Tanh}(\sqrt{2 / \pi} * (x + 0.044715 * x^3)))$
Parameters:
• input – the input tensor.

• approximate – the gelu approximation algorithm to use: 'none' | 'tanh'. Default: 'none'.

Supported data types:

• input: torch.bfloat16, torch.float32

For details see torch.nn.functional.gelu().
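To get a feel for how close the 'tanh' estimate is to the exact definition, both formulas can be evaluated on scalars. This is a sketch in plain Python using math.erf for $$\Phi(x)$$; the function names gelu_exact and gelu_tanh are hypothetical and independent of samba.

```python
import math


def gelu_exact(x):
    # Phi(x), the standard normal CDF, expressed through the error function.
    phi = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return x * phi


def gelu_tanh(x):
    # The tanh-based estimate used when approximate='tanh'.
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


# Largest absolute difference on a grid over [-5, 5] in steps of 0.1.
max_gap = max(abs(gelu_exact(x / 10) - gelu_tanh(x / 10)) for x in range(-50, 51))
print(max_gap)
```

On this grid the two versions differ by less than 1e-3.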

mul(input: SambaTensor, other: SambaTensor | int | float, *, out: SambaTensor | None = None)

New in version 1.18.

Computes the element-wise multiplication of the given input tensor with other. other can be either a scalar or a tensor.

$\text{out}_i = \text{input}_i \times \text{other}_i$
Parameters:
• input – the input tensor.

• other – the second input tensor or number.

• out – the output tensor. Defaults to None.

Note

The PyTorch keyword arg out is not supported on RDU and will throw an exception.

For more details see torch.mul().

neg(input: SambaTensor, *, out: SambaTensor | None = None)

Returns a new tensor with the negative of the elements of input tensor.

$\text{out} = -1 \times \text{input}$
Parameters:
• input – the input tensor.

• out – the output tensor. Defaults to None.

Supported data types:

• input: torch.bfloat16, torch.float32

Note

The PyTorch keyword arg out is not supported on RDU and will throw an exception.

For more details see torch.neg().

pow(input: SambaTensor, exponent: SambaTensor | int | float, *, out: SambaTensor | None = None)

Takes the power of each element in input with exponent and returns a tensor with the result.

$\text{out}_i = \text{input}_i ^ {\text{exponent}_i}$
Parameters:
• input – the input tensor.

• exponent – the exponent value.

• out – the output tensor. Defaults to None.

Note

The PyTorch keyword arg out is not supported on RDU and will throw an exception.

For details, see torch.pow().

relu(input: SambaTensor, inplace: bool = False)

New in version 1.18.

Applies the rectified linear unit function element-wise.

$\text{ReLU}(x) = (x)^+ = \max(0, x)$
Parameters:
• input – the input tensor.

• inplace – If set to True, will do this operation in-place.

Note

The PyTorch API optional keyword arg inplace is supported only on CPU. It is not supported on RDU and will log a warning.

For details see torch.nn.functional.relu().

remainder(input: SambaTensor, other: SambaTensor | float | int, *, out: SambaTensor | None = None)

New in version 1.19.

Computes the element-wise modulus of the given input and other tensors.

The result has the same sign as the divisor other and its absolute value is less than that of other.

It’s equivalent to:

>>> input - input.div(other, rounding_mode="floor") * other

Parameters:
• input – the dividend

• other – the divisor

• out – the output tensor. Defaults to None.

Supported data types:

• input: torch.int16, torch.int32

• other: torch.int16, torch.int32, int

For floating point modulus operation see samba.fmod()

Note

The PyTorch API optional keyword arg out is supported only on CPU. It is not supported on RDU and will throw an error.

Example:

>>> a = samba.SambaTensor(torch.tensor([-3., -2, -1, 1, 2, 3]))
>>> samba.remainder(a, 2).data
tensor([ 1., -0.,  1.,  1.,  0.,  1.])


For more details see torch.remainder().
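The floored-division identity for remainder can be checked on plain Python integers. This is a sketch only: remainder_floor is a hypothetical helper, and Python's % operator, which also follows the sign of the divisor, serves as the reference.

```python
import math


def remainder_floor(x, y):
    """x - floor(x / y) * y, the floored-division form of remainder."""
    return x - math.floor(x / y) * y


values = [-3, -2, -1, 1, 2, 3]
results = [remainder_floor(v, 2) for v in values]

# With a negative divisor the result becomes non-positive instead.
negative_divisor = [remainder_floor(v, -2) for v in values]

assert results == [v % 2 for v in values]
assert negative_divisor == [v % -2 for v in values]
```

In contrast to fmod, the sign of the result follows the divisor, so a positive divisor never produces a negative remainder.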

rsqrt(input: SambaTensor, *, out: SambaTensor | None = None)

New in version 1.18.

Returns a new tensor with the reciprocal of the square-root of each of the elements of input.

$\text{out}_{i} = \frac{1}{\sqrt{\text{input}_{i}}}$
Parameters:
• input – the input tensor.

• out – the output tensor. Defaults to None.

Note

The PyTorch API optional keyword arg out is supported only on CPU. It is not supported on RDU and will throw an error.

Example:

>>> a = samba.randn(4)
>>> a.data
tensor([-0.0370,  0.2970,  1.5420, -0.9105])
>>> samba.rsqrt(a).data
tensor([nan, 1.8351, 0.8053, nan])


For details see torch.rsqrt().

rsub(input: SambaTensor, other: SambaTensor | int | float, alpha: int | float = 1)

Performs reverse subtraction, where the operands are swapped.

$\text{{out}}_i = \text{{other}}_i - \text{{alpha}} \times \text{{input}}_i$
Parameters:
• input – subtrahend tensor.

• other – minuend tensor.

• alpha – the multiplier for input.

Note

The PyTorch API optional keyword arg alpha (number) is not supported on RDU and will throw an exception.
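Because the operands are swapped, it is easy to misread which argument is scaled by alpha. A small list-based sketch of the formula follows. The function here is a pure-Python stand-in that shares only the name and argument order with the signature, not the samba implementation.

```python
def rsub(input_values, other_values, alpha=1):
    """Element-wise other - alpha * input (operands swapped relative to sub)."""
    return [o - alpha * i for i, o in zip(input_values, other_values)]


out_default = rsub([1, 2, 3], [10, 10, 10])        # 10 - 1*1, 10 - 1*2, 10 - 1*3
out_scaled = rsub([1, 2, 3], [10, 10, 10], alpha=2)  # 10 - 2*1, 10 - 2*2, 10 - 2*3
print(out_default, out_scaled)
```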

scale(input: SambaTensor, value: float | SambaTensor)

New in version 1.18.

Multiplies each element of input by value.

$\text{out}_{i} = \text{value} * \text{input}_{i}$
Parameters:
• input – the input tensor.

• value – the value to multiply by.

Example

>>> samba.set_seed(1)
>>> x = samba.randn(3,4)
>>> x.data
tensor([[ 0.6614,  0.2669,  0.0617,  0.6213],
        [-0.4519, -0.1661, -1.5228,  0.3817],
        [-1.0276, -0.5631, -0.8923, -0.0583]])
>>> samba.scale(x, -1).data
tensor([[-0.6614, -0.2669, -0.0617, -0.6213],
        [ 0.4519,  0.1661,  1.5228, -0.3817],
        [ 1.0276,  0.5631,  0.8923,  0.0583]])

sigmoid(input: SambaTensor, *, out: SambaTensor | None = None)

New in version 1.18.

Computes the expit (also known as the logistic sigmoid function) of the elements of input.

$\text{out}_{i} = \frac{1}{1 + e^{-\text{input}_{i}}}$
Parameters:
• input – the input tensor.

• out – the output tensor. Defaults to None.

Note

The PyTorch API optional keyword arg out is not supported on RDU and will throw an error.

Example:

>>> a = samba.SambaTensor(torch.randn(4))
>>> a.data
tensor([ 0.9213,  1.0887, -0.8858, -1.7683])
>>> samba.sigmoid(a).data
tensor([ 0.7153,  0.7481,  0.2920,  0.1458])


For more details see torch.nn.functional.sigmoid().

silu(input: SambaTensor, inplace: bool = False)

New in version 1.18.

Applies the Sigmoid Linear Unit (SiLU) function, element-wise. The SiLU function is also known as the swish function.

$\text{silu}(x) = x * \sigma(x), \text{where } \sigma(x) \text{ is the logistic sigmoid.}$
Parameters:
• input – tensor to perform the operation.

• inplace – If set to True, will do this operation in-place.

Note

The PyTorch API optional keyword arg inplace is not supported on RDU and will throw an error.

For more details see torch.nn.functional.silu().
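The SiLU definition composes directly with the logistic sigmoid. The scalar sketch below uses only the standard library; the helper names are hypothetical, and the naive sigmoid is adequate only for inputs of moderate magnitude.

```python
import math


def sigmoid(x):
    """Logistic sigmoid 1 / (1 + exp(-x)) for moderate |x|."""
    return 1.0 / (1.0 + math.exp(-x))


def silu(x):
    """SiLU (swish): the input gated by its own sigmoid."""
    return x * sigmoid(x)


samples = [silu(x) for x in (-2.0, 0.0, 1.0)]
print(samples)
```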

softmax(input: SambaTensor, dim: int = None, _stacklevel: int = 3, dtype: torch.dtype | None = None)

Applies a softmax function.

Softmax is defined as:

$\text{Softmax}(x_{i}) = \frac{\exp(x_i)}{\sum_j \exp(x_j)}$

It is applied to all slices along dim, and will re-scale them so that the elements lie in the range [0, 1] and sum to 1.

Parameters:
• input – the input tensor.

• dim – a dimension along which softmax will be computed.

• dtype – the desired data type of returned tensor. If specified, the input tensor is cast to dtype before the operation is performed. This is useful for preventing data type overflows. Default: None.

For more details see torch.nn.functional.softmax().
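The definition can be illustrated on nested Python lists, applying softmax to each row (the analogue of dim=1 for a 2D input). This is a sketch, not the samba implementation: the max-subtraction step is a common numerical-stability trick that leaves the result unchanged, and its use here is an assumption about good practice rather than something stated in this reference.

```python
import math


def softmax(row):
    """Softmax over one slice; subtracting the max avoids overflow in exp."""
    shift = max(row)
    exps = [math.exp(v - shift) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


matrix = [[1.0, 2.0, 3.0], [1000.0, 1000.0, 1000.0]]
probs = [softmax(row) for row in matrix]  # one softmax per row

for row in probs:
    print(row, sum(row))
```

Each row sums to 1, and the second row stays finite even though exp(1000.0) would overflow without the shift.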

sqrt(input: SambaTensor, out: SambaTensor | None = None)

Returns a new tensor with the square-root of the elements of input.

$\text{out}_{i} = \sqrt{\text{input}_{i}}$
Parameters:
• input – the input tensor.

• out – the output tensor. Defaults to None.

Note

The PyTorch API optional keyword arg out is not supported on RDU and will throw an error.

Example:

>>> a = samba.SambaTensor(torch.tensor([-2.0755,  1.0226,  0.0831,  0.4806]))
>>> samba.sqrt(a).data
tensor([nan,  1.0112,  0.2883,  0.6933])


For more details see torch.sqrt().

sub(input: SambaTensor, other: SambaTensor | float | int, *, alpha: float | int = 1, out: SambaTensor | None = None)

Subtracts other, scaled by alpha, from input.

$\text{{out}}_i = \text{{input}}_i - \text{{alpha}} \times \text{{other}}_i$
Parameters:
• input – the input tensor.

• other – the tensor or scalar to subtract from input

• alpha – the scalar multiplier for other.

• out – the output tensor. Defaults to None.

Note

The PyTorch API optional keyword args

• out (Tensor, optional)

• alpha (Scalar)

are supported only on CPU. They are not supported on RDU.

Example:

>>> a = samba.SambaTensor(torch.tensor((1, 2)))
>>> b = samba.SambaTensor(torch.tensor((0, 1)))
>>> samba.sub(a, b).data
tensor([1, 1])


For more details see torch.sub().

tanh(input: SambaTensor, *, out: SambaTensor | None = None)

Returns a new tensor with the hyperbolic tangent of the elements of input.

$\text{out}_{i} = \tanh(\text{input}_{i})$
Parameters:
• input – the input tensor.

• out – the output tensor. Defaults to None.

Note

The PyTorch API optional keyword arg out (Tensor, optional) is supported only on CPU. It is not supported on RDU and will throw an exception.

Example:

>>> a = samba.SambaTensor(torch.tensor([0.8986, -0.7279,  1.1745,  0.2611], dtype=torch.bfloat16))
>>> samba.tanh(a).data
tensor([ 0.7148, -0.6211,  0.8242,  0.2559], dtype=torch.bfloat16)


For more details see torch.tanh().

## Generator

Fills elements of input tensor with value where mask is True. The shape of mask must be broadcastable with the shape of input.

Parameters:
• input – the input tensor

• value – the value to fill in with

Note

Only 2-dimensional inputs and masks are currently supported.

For more details see torch.Tensor.masked_fill().

Fills elements of the self tensor with value where mask is True. The shape of mask must be broadcastable with the shape of the underlying tensor. The operation is done in-place.

Parameters:
• input – the input SambaTensor

• value – the value to fill in with

For more details see torch.Tensor.masked_fill_().
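The masked-fill behaviour described in this section can be sketched on 2D Python lists. The helper below is hypothetical and covers only the out-of-place case with a mask of the same shape as the data; it does not attempt broadcasting.

```python
def masked_fill(rows, mask, value):
    """Return a new 2D list with value written wherever mask is True."""
    return [
        [value if m else x for x, m in zip(row, mask_row)]
        for row, mask_row in zip(rows, mask)
    ]


data = [[1.0, 2.0], [3.0, 4.0]]
mask = [[True, False], [False, True]]
filled = masked_fill(data, mask, -1.0)

print(filled)  # positions where mask is True now hold -1.0
print(data)    # the original list is left untouched
```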

triu_fill(input: , value: int | float)

Out-of-place version of triu_fill_().

triu_fill_(input: SambaTensor, value: int | float)

Fills the upper triangle of the last two dimensions of input with value in-place. The diagonal itself is not filled. The two innermost dimensions of input must form a square matrix.

Parameters:
• input – the SambaTensor to fill

• value – the value to fill with

Example:

>>> x = samba.randn(3,3)
>>> x.data
tensor([[ 1.3290, -0.9150, -0.1482],
        [ 0.4660, -0.9847, -0.7689],
        [-1.1259, -0.9790, -0.3892]])
>>> samba.triu_fill_(x, 14)
>>> x.data
tensor([[ 1.3290, 14.0000, 14.0000],
        [ 0.4660, -0.9847, 14.0000],
        [-1.1259, -0.9790, -0.3892]])
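The index rule behind the example (fill every entry whose column index is strictly greater than its row index) can be written out on a nested list. This is a pure-Python sketch: the function reuses the name triu_fill_ for readability but is not the samba implementation.

```python
def triu_fill_(matrix, value):
    """Overwrite entries strictly above the diagonal of a square 2D list."""
    n = len(matrix)
    if any(len(row) != n for row in matrix):
        raise ValueError("matrix must be square")
    for i in range(n):
        for j in range(i + 1, n):  # j > i: strictly above the diagonal
            matrix[i][j] = value
    return matrix


x = [[1.3290, -0.9150, -0.1482],
     [0.4660, -0.9847, -0.7689],
     [-1.1259, -0.9790, -0.3892]]
triu_fill_(x, 14)
print(x)
```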


## Logical

bitwise_not(input: SambaTensor, *, out: SambaTensor | None = None)

Computes the bitwise NOT of the given input tensor. The input tensor must be an int or bool type. For bool tensors, it computes the logical NOT.

Parameters:
• input – the input tensor.

• out – the output tensor. Defaults to None.

Note

The PyTorch API optional keyword arg out (Tensor, optional) is not supported on RDU and will throw an exception.

Note

bitwise_not on RDU only supports bool dtype for input.

Example

>>> samba.bitwise_not(samba.SambaTensor(torch.tensor([True, True, False], dtype=torch.bool))).data
tensor([False, False,  True])


For more details see torch.bitwise_not().

logical_or(input: SambaTensor, other: SambaTensor, *, out: SambaTensor | None = None) → Tensor

Computes the element-wise logical OR of the given input tensors. Zeros are treated as False and nonzeros are treated as True.

Parameters:
• input – the input tensor.

• other – the tensor to compute OR with

• out – the output tensor. Defaults to None.

Note

The PyTorch API optional keyword arg out (Tensor, optional) is not supported on RDU and will throw an exception.

Example:

>>> samba.logical_or(samba.SambaTensor(torch.tensor([True, False, True])),
...                  samba.SambaTensor(torch.tensor([True, False, False]))).data
tensor([ True, False,  True])
>>> a = samba.SambaTensor(torch.tensor([0, 1, 10, 0], dtype=torch.int8))
>>> b = samba.SambaTensor(torch.tensor([4, 0, 1, 0], dtype=torch.int8))
>>> samba.logical_or(a, b).data
tensor([ True,  True,  True, False])
>>> samba.logical_or(a.double(), b.double()).data
tensor([ True,  True,  True, False])
>>> samba.logical_or(a.double(), b).data
tensor([ True,  True,  True, False])


For more details see torch.logical_or().

## Loss

cross_entropy(input: SambaTensor, target: SambaTensor, weight: SambaTensor | None = None, size_average: bool | None = None, ignore_index: int = -100, reduce: bool | None = None, reduction: str = 'mean', label_smoothing: float = 0.0)

This criterion computes the cross entropy loss between input and target.

See CrossEntropyLoss for details.

Parameters:
• input – $$(N, C)$$ where C = number of classes, or $$(N, C, H, W)$$ in case of 2D Loss, or $$(N, C, d_1, d_2, ..., d_K)$$ where $$K \geq 1$$ in the case of K-dimensional loss. input is expected to contain unnormalized scores (often referred to as logits).

• target – If containing class indices, shape $$(N)$$ where each value is $$0 \leq \text{targets}[i] \leq C-1$$, or $$(N, d_1, d_2, ..., d_K)$$ with $$K \geq 1$$ in the case of K-dimensional loss. If containing class probabilities, same shape as the input.

• weight – a manual rescaling weight given to each class. If given, has to be a tensor of size C.

• size_average – Deprecated (see reduction).

• ignore_index – Specifies a target value that is ignored and does not contribute to the input gradient. When size_average is True, the loss is averaged over non-ignored targets. Note that ignore_index is only applicable when the target contains class indices. Default: -100

• reduce – Deprecated (see reduction).

• reduction – Specifies the reduction to apply to the output: none | mean | sum. none: no reduction will be applied, mean: the sum of the output will be divided by the number of elements in the output, sum: the output will be summed. Note: size_average and reduce are being deprecated. Currently, specifying either of those args overrides reduction. Default: mean

• label_smoothing – A float in [0.0, 1.0]. Specifies the amount of smoothing when computing the loss, where 0.0 means no smoothing. The targets become a mixture of the original ground truth and a uniform distribution as described in the external article Rethinking the Inception Architecture for Computer Vision. Default: $$0.0$$.

Supported data types:

• input: torch.bfloat16, torch.float32

• target: torch.int32, torch.int64

Note

The PyTorch API optional keyword args

• reduce (bool, optional)

• size_average (bool, optional)

are not supported on RDU and will throw an exception.

Examples:

>>> # Example of target with class indices
>>> input = samba.randn(3, 5, requires_grad=True)
>>> target = samba.randint(5, (3,), dtype=torch.int64)
>>> loss = samba.cross_entropy(input, target)
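To make the roles of ignore_index, label_smoothing, and the 'mean' reduction concrete, the loss for class-index targets can be written out on nested lists. This is a sketch under stated assumptions: it follows the label-smoothing mixture described above (ground truth mixed with a uniform distribution), omits the weight argument, averages over non-ignored targets, and is not the samba implementation.

```python
import math


def cross_entropy(logits, targets, ignore_index=-100, label_smoothing=0.0):
    """Mean cross entropy over rows of logits (N x C) with class-index targets."""
    losses = []
    for row, target in zip(logits, targets):
        if target == ignore_index:
            continue  # ignored targets do not contribute to the mean
        shift = max(row)  # log-sum-exp with a shift for numerical stability
        log_norm = shift + math.log(sum(math.exp(v - shift) for v in row))
        log_probs = [v - log_norm for v in row]
        nll = -log_probs[target]                 # loss against the true class
        uniform = -sum(log_probs) / len(row)     # loss against a uniform target
        losses.append((1.0 - label_smoothing) * nll + label_smoothing * uniform)
    return sum(losses) / len(losses)


logits = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]]
loss = cross_entropy(logits, [1, -100])  # second sample is ignored
print(loss)
```

With uniform logits over three classes, the loss equals log(3) regardless of the smoothing amount. The sketch divides by the number of non-ignored targets, so it assumes at least one target is kept.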


## Modules

multi_head_attention(query: SambaTensor, key: SambaTensor, value: SambaTensor, embed_dim_to_check: int, num_heads: int, in_proj_weight: SambaTensor | None, in_proj_bias: SambaTensor | None, bias_k: SambaTensor | None, bias_v: SambaTensor | None, add_zero_attn: bool, dropout_p: float, out_proj_weight: SambaTensor, out_proj_bias: SambaTensor, training: bool = True, key_padding_mask: SambaTensor | None = None, need_weights: bool = True, attn_mask: SambaTensor | None = None, use_separate_proj_weight: bool = False, q_proj_weight: SambaTensor | None = None, k_proj_weight: SambaTensor | None = None, v_proj_weight: SambaTensor | None = None, static_k: SambaTensor | None = None, static_v: SambaTensor | None = None)

New in version 1.18.

Allows the model to jointly attend to information from different representation subspaces. See reference: Attention Is All You Need.

For details see torch.nn.MultiheadAttention or torch.nn.functional.multi_head_attention_forward().

Parameters:
• query – map a query and a set of key-value pairs to an output. See “Attention Is All You Need” for more details.

• key – map a query and a set of key-value pairs to an output. See “Attention Is All You Need” for more details.

• value – map a query and a set of key-value pairs to an output. See “Attention Is All You Need” for more details.

• embed_dim_to_check – total dimension of the model.

• in_proj_weight – input projection weight and bias. Required if use_separate_proj_weight is False.

• in_proj_bias – input projection weight and bias. Required if use_separate_proj_weight is False.

• bias_k – bias of the key and value sequences to be added at dim=0.

• bias_v – bias of the key and value sequences to be added at dim=0.

• add_zero_attn – add a new batch of zeros to the key and value sequences at dim=1.

• dropout_p – probability of an element to be zeroed.

• out_proj_weight – the output projection weight and bias.

• out_proj_bias – the output projection weight and bias.

• training – applies dropout if True.

• key_padding_mask – if provided, specified padding elements in the key will be ignored by the attention. This is a binary mask. When the value is True, the corresponding value on the attention layer will be filled with -inf.

• need_weights – output attn_output_weights. Default: True

• attn_mask – 2D or 3D mask that prevents attention to certain positions. A 2D mask will be broadcasted for all the batches while a 3D mask allows to specify a different mask for the entries of each batch.

• is_causal – If specified, applies a causal mask as attention mask, and ignores attn_mask for computing scaled dot product attention. Default: False.

• use_separate_proj_weight – the function accepts the projection weights for query, key, and value in different forms. If False, in_proj_weight will be used, which is a combination of q_proj_weight, k_proj_weight, v_proj_weight.

• q_proj_weight – input projection weight and bias.

• k_proj_weight – input projection weight and bias.

• v_proj_weight – input projection weight and bias.

• static_k – static key and value used for attention operators.

• static_v – static key and value used for attention operators.

• average_attn_weights – If True, indicates that the returned attn_weights should be averaged across heads. Otherwise, attn_weights are provided separately per head. Note that this flag only has an effect when need_weights=True. Default: True

Note

The PyTorch API keyword args

• value (SambaTensor)

• embed_dim_to_check (int)

• bias_k (SambaTensor, optional)

• bias_v (SambaTensor, optional)

• training (bool)

• is_causal (bool)

• static_k (SambaTensor, optional)

• static_v (SambaTensor, optional)

• average_attn_weights (bool)

are supported only on CPU. They are not supported on RDU.

class FlashFFTConv(Nx: List[int], dtype: dtype, prefix: str = '')

New in version 1.19.

FlashFFTConv module. Effectively computes a zero-padded convolution of the input tensor and the kernel tensor. See the forward function for more info on the input tensors. See FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores for more info on the logic of the module.

Parameters:
• Nx – list of input dimensions. The length of Nx is the p parameter, which governs how many pieces we split the sequence length into. Nx will be the dimension of the DFT.

• dtype – either torch.bfloat16 or torch.float32.

• prefix – an optional string prefix to prepend to the operator names.

Note

There are some constraints on Nx:

• We support p = 2, 3, 4.

• The last element of Nx must be even. This is because we need to pad the raw input by 2x, so the latter half of the last dim of Nx will be all zeros.

• Let N be the product of elements in Nx. Let ipt be the input tensor. Then N must equal ipt.shape[-1] * 2. The factor of 2 is due to the padding; we pad the raw input before applying DFTs to it.
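The three constraints can be checked before constructing the module. The helper below is a hypothetical pre-flight check written for illustration; check_nx is not part of sambaflow, and it encodes only the rules listed in this note.

```python
import math


def check_nx(nx, sequence_length):
    """Return a list of violated constraints; an empty list means Nx is valid."""
    problems = []
    if len(nx) not in (2, 3, 4):
        problems.append("p = len(Nx) must be 2, 3, or 4")
    if nx and nx[-1] % 2 != 0:
        problems.append("last element of Nx must be even")
    if math.prod(nx) != 2 * sequence_length:
        problems.append("product of Nx must equal 2 * sequence_length")
    return problems


print(check_nx([4, 2, 2], 8))  # valid: 4 * 2 * 2 == 16 == 2 * 8
print(check_nx([2, 2, 3], 6))  # product is fine, but the last element is odd
print(check_nx([16], 8))       # product is fine, but p == 1 is unsupported
```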

Example

>>> import torch
>>> import sambaflow.samba as samba
>>> from sambaflow.samba.nn.flash_fft_conv import FlashFFTConv
>>> batch_size = 2
>>> hidden_dim = 1
>>> sequence_length = 8
>>> ipt = torch.randn(batch_size, hidden_dim, sequence_length)
>>> kernel = torch.ones(hidden_dim, sequence_length)
>>> flash_fft_conv = FlashFFTConv([4, 2, 2], torch.float32) # 4 * 2 * 2 == sequence_length * 2
>>> samba.from_torch_model_(flash_fft_conv)
>>> result = flash_fft_conv(ipt, kernel) # observe the all-ones filter convolution effect on ipt

forward(input: Tensor, kernel: Tensor) → Tuple[Tensor]
Parameters:
• input – a tensor of shape (batch_size, hidden_dim, sequence_length).

• kernel – a tensor of shape (hidden_dim, sequence_length). Must match the last two dims of the input tensor.
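The reason for padding by 2x can be seen on a single sequence: a length-N circular convolution wraps around, whereas padding both operands with N zeros makes the circular result agree with the ordinary linear convolution on the first N outputs. The sketch below shows this with naive O(N²) loops in plain Python. It illustrates the mathematical identity only; the helper names are hypothetical, and the assumption that the module keeps the first sequence_length outputs is not stated in this reference.

```python
def circular_conv(a, b):
    """Naive circular convolution of two equal-length sequences."""
    n = len(a)
    return [sum(a[k] * b[(t - k) % n] for k in range(n)) for t in range(n)]


def linear_conv_prefix(signal, kernel):
    """First len(signal) outputs of the ordinary (linear) convolution."""
    n = len(signal)
    return [sum(kernel[k] * signal[t - k] for k in range(t + 1)) for t in range(n)]


signal = [1.0, 2.0, 3.0, 4.0]
kernel = [1.0, 1.0, 1.0, 1.0]  # all-ones filter, as in the example

wrapped = circular_conv(signal, kernel)  # no padding: every output sees all inputs
padded = circular_conv(signal + [0.0] * 4, kernel + [0.0] * 4)[:4]
direct = linear_conv_prefix(signal, kernel)

print(wrapped)  # wrap-around mixes in later samples
print(padded)   # running sums of the signal
print(direct)   # same values as padded
```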

## Normalization

layer_norm(input: SambaTensor, normalized_shape: List[int] | Tuple[int], weight: SambaTensor | None = None, bias: SambaTensor | None = None, eps: float = 1e-05)

New in version 1.18.

Applies Layer Normalization over a mini-batch of inputs as described in the paper Layer Normalization.

$y = \frac{x - \mathrm{E}[x]}{ \sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta$

The mean and standard-deviation are calculated over the last D dimensions, where D is the dimension of normalized_shape. For example, if normalized_shape is (3, 5) (a 2-dimensional shape), the mean and standard-deviation are computed over the last 2 dimensions of the input (i.e. input.mean((-2, -1))). $$\gamma$$ and $$\beta$$ are learnable affine transform parameters of normalized_shape if elementwise_affine is True. The standard-deviation is calculated via the biased estimator, equivalent to torch.var(input, unbiased=False).

Note

Unlike Batch Normalization and Instance Normalization, which apply scalar scale and bias for each entire channel/plane with the affine option, Layer Normalization applies per-element scale and bias with elementwise_affine.

Note

On RDU, layer_norm supports only a normalized_shape of length 1 whose single element equals the size of the input’s last dimension.

This layer uses statistics computed from input data in both training and evaluation modes.

Parameters:
• input – the input tensor.

• normalized_shape – input shape from an expected input of size

$\begin{split}[* \times \text{normalized\_shape}[0] \times \text{normalized\_shape}[1] \times \ldots \times \\ \text{normalized\_shape}[-1]]\end{split}$

If a single integer is used, it is treated as a singleton list, and this module will normalize over the last dimension which is expected to be of that specific size.

• eps – a value added to the denominator for numerical stability. Default: 1e-5.

• elementwise_affine – a boolean value that when set to True, this module has learnable per-element affine parameters initialized to ones (for weights) and zeros (for biases). Default: True.

Example

>>> samba.set_seed(1)
>>> ipt = samba.SambaTensor(torch.randn((4,5), dtype=torch.bfloat16))
>>> samba.layer_norm(ipt, [5]).data
tensor([[ 0.8398,  1.0938, -0.3145,  0.1064, -1.7266],
        [-1.4922,  0.1562, -0.5938,  0.4355,  1.4922],
        [ 1.0391,  0.6211, -1.7188, -0.5156,  0.5742],
        [ 1.7891, -0.2773, -1.2578, -0.3711,  0.1128]], dtype=torch.bfloat16)


For more details see torch.nn.functional.layer_norm().
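For the case supported on RDU (normalizing over the last dimension only), the formula reduces to a per-row computation. The sketch below works on nested lists without weight or bias, i.e. with $$\gamma = 1$$ and $$\beta = 0$$. The helper name layer_norm_last_dim is hypothetical.

```python
import math


def layer_norm_last_dim(rows, eps=1e-5):
    """Normalize each row with its own mean and biased variance."""
    out = []
    for row in rows:
        mean = sum(row) / len(row)
        var = sum((v - mean) ** 2 for v in row) / len(row)  # biased: divide by n
        out.append([(v - mean) / math.sqrt(var + eps) for v in row])
    return out


normalized = layer_norm_last_dim([[1.0, 2.0, 3.0, 4.0, 5.0]])
print(normalized[0])
```

Each output row has mean close to zero, and its variance is slightly below one because eps is added to the denominator.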

## Reduce

argmax(input: SambaTensor, dim: int, keepdim: bool = False)

New in version 1.19.

Returns the indices of the maximum values of a tensor across a dimension.

For more details see torch.argmax().

max(input: SambaTensor, dim: int, keepdim: bool = False, *, out: Optional[Tuple[SambaTensor, SambaTensor]] = None) → SambaTensor | Tuple[SambaTensor, SambaTensor]

Returns a tuple (values, indices) where values is the maximum value of each row of the input tensor in the given dimension dim and indices is the index location of each maximum value found (argmax).

If keepdim is True, the output tensors are of the same size as input except in the dimension dim where they are of size 1. Otherwise, dim is squeezed (see torch.squeeze()), resulting in the output tensors having 1 fewer dimension than input.

Parameters:
• input – the input tensor.

• dim – the dimension to reduce.

• keepdim – specifies whether to retain dim in the output tensor. Default: False.

• out – the output tensor. Defaults to None.

Supported data types:

• input: torch.float32

Note

The PyTorch keyword arg out is not supported on RDU and will throw an exception.

Example:

>>> a = samba.randn(4, 4)
>>> a.data
tensor([[-1.2360, -0.2942, -0.1222,  0.8475],
        [ 1.1949, -1.1127, -2.2379, -0.6702],
        [ 1.5717, -0.9207,  0.1297, -1.8768],
        [-0.6172,  1.0036, -0.6060, -0.2432]])
>>> samba.max(a, 1)[0].data
tensor([0.8475, 1.1949, 1.5717, 1.0036])
>>> samba.max(a, 1)[1].data
tensor([3, 0, 0, 1])


For details see torch.max().
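The (values, indices) pair and the effect of keepdim can be reproduced on the example data with nested lists. This is a pure-Python sketch for the dim=1 case of a 2D input. The helper max_along_rows is hypothetical.

```python
def max_along_rows(matrix, keepdim=False):
    """Return (values, indices) of the per-row maximum of a 2D list."""
    values, indices = [], []
    for row in matrix:
        idx = max(range(len(row)), key=row.__getitem__)  # argmax of the row
        values.append(row[idx])
        indices.append(idx)
    if keepdim:
        # Keep the reduced dimension with size 1: shape (rows, 1).
        return [[v] for v in values], [[i] for i in indices]
    return values, indices


a = [[-1.2360, -0.2942, -0.1222, 0.8475],
     [1.1949, -1.1127, -2.2379, -0.6702],
     [1.5717, -0.9207, 0.1297, -1.8768],
     [-0.6172, 1.0036, -0.6060, -0.2432]]

values, indices = max_along_rows(a)
kept_values, kept_indices = max_along_rows(a, keepdim=True)
print(values, indices)
```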

mean(input: SambaTensor, dim: List[int] | int = None, keepdim: bool = False, *, dtype: torch.dtype | None = None, out: SambaTensor | None = None)

Returns the mean value of each row of the input tensor in the given dimension dim. If dim is a list of dimensions, reduces over all of them.

If keepdim is True, the output tensor is of the same size as input except in the dimension(s) dim where it is of size 1. Otherwise, dim is squeezed (see torch.squeeze()), resulting in the output tensor having 1 (or len(dim)) fewer dimension(s).

Parameters:
• input – the input tensor.

• dim – the dimension or dimensions to reduce.

• keepdim – specifies whether to retain dim in the output tensor.

• dtype – the desired data type of returned tensor. If specified, the input tensor is cast to dtype before the operation is performed. Specifying dtype is useful for preventing data type overflows. Defaults to None.

• out – the output tensor. Defaults to None.

Note

The PyTorch API optional keyword args

• dtype (torch.dtype, optional)

• out (Tensor, optional)

are not supported on RDU and will throw an exception.

Example:

>>> a = samba.randn(4, 4)
>>> a.data
tensor([[-0.3841,  0.6320,  0.4254, -0.7384],
        [-0.9644,  1.0131, -0.6549, -1.4279],
        [-0.2951, -1.3350, -0.7694,  0.5600],
        [ 1.0842, -0.9580,  0.3623,  0.2343]])
>>> samba.mean(a, 1).data
tensor([-0.0163, -0.5085, -0.4599,  0.1807])
>>> samba.mean(a, 1, True).data
tensor([[-0.0163],
        [-0.5085],
        [-0.4599],
        [ 0.1807]])


For details, see torch.mean().

## Regularization

dropout(input: SambaTensor, p: float = 0.5, training: bool = True, inplace: bool = False)

During training, randomly zeroes some of the elements of the input tensor with probability p using samples from a Bernoulli distribution.

Parameters:
• input – the input tensor.

• p – probability of an element to be zeroed. Default: 0.5.

• training – If set to True, applies dropout. Default: True

• inplace – If set to True, performs this operation in-place. Default: False

Supported data types:

• input: torch.bfloat16, torch.float32

Note

The PyTorch keyword argument inplace is supported on CPU but will have no effect on RDU.
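The Bernoulli masking can be sketched on a Python list. One assumption is carried over from PyTorch's dropout and is not stated in this reference: surviving elements are scaled by 1 / (1 - p) during training so that the expected value is preserved. The function below is a stand-in for illustration, with an injectable random generator (the rng argument is hypothetical) to make runs repeatable.

```python
import random


def dropout(values, p=0.5, training=True, rng=random):
    """Zero each element with probability p; survivors are scaled by 1 / (1 - p)."""
    if not training or p == 0.0:
        return list(values)  # evaluation mode: identity
    if p == 1.0:
        return [0.0] * len(values)
    scale = 1.0 / (1.0 - p)  # assumption: inverted-dropout scaling, as in PyTorch
    return [0.0 if rng.random() < p else v * scale for v in values]


rng = random.Random(0)
train_out = dropout([1.0] * 1000, p=0.25, rng=rng)
eval_out = dropout([1.0, 2.0], p=0.25, training=False)

print(train_out.count(0.0) / len(train_out))  # fraction zeroed, close to p
print(eval_out)
```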

dropout2d(input: SambaTensor, p: float = 0.5, training: bool = True, inplace: bool = False)

Randomly zeroes out entire channels. A channel is a 2D feature map (e.g., the $$j$$-th channel of the $$i$$-th sample in the batched input is a 2D tensor $$\text{input}[i, j]$$) of the input tensor. Each channel will be zeroed out independently on every forward call with probability p using samples from a Bernoulli distribution.

See Dropout2d for details.

Parameters:
• input – the input tensor

• p – probability of a channel to be zeroed. Default: 0.5

• training – If set to True, applies dropout. Default: True

• inplace – If set to True, performs this operation in-place. Default: False

Note

The PyTorch keyword argument inplace is supported on CPU but will have no effect on RDU.

dropout3d(input: SambaTensor, p: float = 0.5, training: bool = True, inplace: bool = False)

Randomly zeroes out entire channels. A channel is a 3D feature map (e.g., the $$j$$-th channel of the $$i$$-th sample in the batched input is a 3D tensor $$\text{input}[i, j]$$) of the input tensor. Each channel will be zeroed out independently on every forward call with probability p using samples from a Bernoulli distribution.

See Dropout3d for details.

Parameters:
• input – the input tensor.

• p – probability of a channel to be zeroed. Default: 0.5.

• training – If set to True, applies dropout. Default: True.

• inplace – If set to True, performs this operation in-place. Default: False.

Note

The PyTorch keyword argument inplace is supported on CPU but will have no effect on RDU.

## Tensor Arithmetic

addmm(input: Tensor, mat1: Tensor, mat2: Tensor, *, beta: Number | None = 1, alpha: Number | None = 1, out: Tensor | None = None) → Tensor

Performs a matrix multiplication of the matrices mat1 and mat2. The matrix input is added to the final result.

If mat1 is a $$(n \times m)$$ tensor and mat2 is a $$(m \times p)$$ tensor, then input must be broadcastable with a $$(n \times p)$$ tensor and out will be a $$(n \times p)$$ tensor.

alpha and beta are scaling factors on the matrix-matrix product of mat1 and mat2 and on the added matrix input, respectively.

$\text{out} = \beta\ \text{input} + \alpha\ (\text{mat1} \mathbin{@} \text{mat2})$

If beta is 0, then input will be ignored, and nan and inf in it will not be propagated.

For inputs of type FloatTensor or DoubleTensor, arguments beta and alpha must be real numbers, otherwise they should be integers.

This operator supports TensorFloat32.

Parameters:
• input – matrix to be added.

• mat1 – the first matrix to be matrix multiplied.

• mat2 – the second matrix to be matrix multiplied.

• beta – multiplier for input ($$\beta$$).

• alpha – multiplier for $$mat1 @ mat2$$ ($$\alpha$$).

• out – the output tensor.

Example:

>>> M = samba.randn(2, 3)
>>> mat1 = samba.randn(2, 3)
>>> mat2 = samba.randn(3, 3)
>>> samba.addmm(M, mat1, mat2).data
tensor([[-4.8716,  1.4671, -1.3746],
        [ 0.7573, -3.9555, -2.8681]])


Supported data types:

• input: torch.bfloat16, torch.float32

• mat1: torch.bfloat16, torch.float32

• mat2: torch.bfloat16, torch.float32

Note

The PyTorch keyword arg out is not supported on RDU and will throw an exception.

Note

This operator works on RDU only if the inputs meet these limitations:

1. input needs to be (p,).

2. alpha needs to be 1.

3. beta needs to be 1.

4. mat2 is a (p x m) matrix if is_transposed == True; otherwise it is an (m x p) matrix.

5. mat1 can be a 3D tensor if one of its dimensions is batch_dim.

6. mat2 cannot have a batch_dim.

Additionally, if is_transposed == False, then mat1 is an (n x m) tensor and mat2 is a (m x p) tensor.

For details see torch.addmm().
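To make the formula concrete, here is a minimal plain-Python sketch of out = beta * input + alpha * (mat1 @ mat2) for the shape combination listed in the RDU limitations, where input has shape (p,) and is broadcast over the rows of the product. The helper names `matmul2d` and `addmm_sketch` are hypothetical and operate on nested lists; they do not reproduce the special handling of beta == 0 described above.

```python
def matmul2d(a, b):
    """Multiply an (n x m) matrix by an (m x p) matrix stored as nested lists."""
    n, m, p = len(a), len(b), len(b[0])
    if any(len(row) != m for row in a):
        raise ValueError("inner dimensions do not match")
    return [[sum(a[i][k] * b[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]

def addmm_sketch(inp, mat1, mat2, beta=1, alpha=1):
    """out = beta * inp + alpha * (mat1 @ mat2), with inp of shape (p,)."""
    prod = matmul2d(mat1, mat2)
    if len(inp) != len(prod[0]):
        raise ValueError("inp must have p elements")
    # The (p,) vector is broadcast across all n rows of the product.
    return [[beta * inp[j] + alpha * prod[i][j] for j in range(len(inp))]
            for i in range(len(prod))]

bias = [10, 20]                      # shape (p,) = (2,)
mat1 = [[1, 2, 3], [4, 5, 6]]        # shape (n x m) = (2 x 3)
mat2 = [[1, 0], [0, 1], [1, 1]]      # shape (m x p) = (3 x 2)
print(addmm_sketch(bias, mat1, mat2))  # [[14, 25], [20, 31]]
```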

bmm(input: SambaTensor, mat2: SambaTensor, *, out: SambaTensor | None = None)

New in version 1.18.

Performs a batch matrix-matrix multiplication of matrices stored in input and mat2.

input and mat2 must be 3-D tensors each containing the same number of matrices.

If input is a $$(b \times n \times m)$$ tensor, mat2 is a $$(b \times m \times p)$$ tensor, out will be a $$(b \times n \times p)$$ tensor.

$\text{out}_i = \text{input}_i \mathbin{@} \text{mat2}_i$
Parameters:
• input – the first batch of matrices to be multiplied

• mat2 – the second batch of matrices to be multiplied

Note

The PyTorch API optional keyword arg

• out (Tensor, optional)

is not supported on RDU and will throw an exception.

Example:

>>> input = samba.SambaTensor(torch.randn(10, 3, 4))
>>> mat2 = samba.SambaTensor(torch.randn(10, 4, 5))
>>> res = samba.bmm(input, mat2)
>>> res.size()
torch.Size([10, 3, 5])


For more details, see torch.bmm().
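The per-batch rule out_i = input_i @ mat2_i can be illustrated with nested lists in plain Python. `bmm_sketch` is a hypothetical helper used only to show the shape bookkeeping; it is not how samba.bmm is implemented.

```python
def bmm_sketch(batch_a, batch_b):
    """Batched product: out[i] = batch_a[i] @ batch_b[i] for nested lists."""
    if len(batch_a) != len(batch_b):
        raise ValueError("both inputs must hold the same number of matrices")
    out = []
    for a, b in zip(batch_a, batch_b):
        if len(a[0]) != len(b):
            raise ValueError("inner dimensions (m) must match")
        # zip(*b) iterates over the columns of b.
        out.append([[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
                    for row in a])
    return out

a = [[[1, 2]], [[3, 4]]]          # shape (b x n x m) = (2 x 1 x 2)
b = [[[1], [1]], [[2], [0]]]      # shape (b x m x p) = (2 x 2 x 1)
print(bmm_sketch(a, b))           # [[[3]], [[6]]], shape (2 x 1 x 1)
```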

cumsum(input: SambaTensor, dim: int, *, dtype: torch.dtype | None = None, out: SambaTensor | None = None)

Returns the cumulative sum of elements of input in the dimension dim.

For example, if input is a vector of size N, the result will also be a vector of size N, with elements:

$y_i = x_1 + x_2 + x_3 + \dots + x_i$
Parameters:
• input – the input tensor.

• dim – the dimension to do the operation over

• dtype – the desired data type of returned tensor.

• out – the output tensor. Defaults to None.

Supported data types:

• input: torch.bfloat16, torch.float32, torch.int64

Note

The PyTorch API optional keyword args

• dtype (torch.dtype, optional)

• out (SambaTensor, optional)

are not supported on RDU and will throw an exception.

Example:

>>> a = samba.randn(10)
>>> a.data
tensor([-0.8286, -0.4890,  0.5155,  0.8443,  0.1865, -0.1752, -2.0595,
         0.1850, -1.1571, -0.4243])
>>> samba.cumsum(a, dim=0).data
tensor([-0.8286, -1.3175, -0.8020,  0.0423,  0.2289,  0.0537, -2.0058,
        -1.8209, -2.9780, -3.4022])


For more details, see torch.cumsum().
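As a small worked illustration of the running-sum formula and of what the dim argument selects, the following plain-Python sketch uses `itertools.accumulate` on lists. The helpers `cumsum_1d` and `cumsum_2d` are hypothetical names, not part of samba.

```python
from itertools import accumulate

def cumsum_1d(values):
    """Return [x_1, x_1 + x_2, ..., x_1 + ... + x_N] for a flat sequence."""
    return list(accumulate(values))

def cumsum_2d(rows, dim):
    """Cumulative sum of a list-of-lists matrix along dim 0 or dim 1."""
    if dim == 1:
        return [cumsum_1d(row) for row in rows]
    if dim == 0:
        # Accumulate each column, then turn the columns back into rows.
        cols = [cumsum_1d(col) for col in zip(*rows)]
        return [list(row) for row in zip(*cols)]
    raise ValueError("dim must be 0 or 1 for a 2-D input")

print(cumsum_1d([1, 2, 3, 4]))               # [1, 3, 6, 10]
print(cumsum_2d([[1, 2], [3, 4]], dim=0))    # [[1, 2], [4, 6]]
print(cumsum_2d([[1, 2], [3, 4]], dim=1))    # [[1, 3], [3, 7]]
```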

linear(input: SambaTensor, weight: SambaTensor, bias: SambaTensor | None = None)

Applies a linear transformation to the incoming data: $$y = xA^T + b$$.

Parameters:
• input – the input tensor x.

• weight – the weight tensor A.

• bias – the bias tensor b.

Shape:

• Input: $$(N, *, in\_features)$$, where $$N$$ is the batch size and * means any number of additional dimensions.

• Weight: $$(out\_features, in\_features)$$.

• Bias: $$(out\_features)$$.

• Output: $$(N, *, out\_features)$$.

For more details see torch.nn.functional.linear().
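The transformation $$y = xA^T + b$$ and the shapes listed above can be checked with a short plain-Python sketch for a 2-D input. `linear_sketch` is a hypothetical helper on nested lists, shown only to make the role of the transpose explicit.

```python
def linear_sketch(x, weight, bias=None):
    """Apply y = x A^T + b to a 2-D input of shape (N, in_features)."""
    out = []
    for row in x:
        # Each output feature is the dot product of the input row with one
        # row of the weight matrix, which is what multiplying by A^T does.
        y = [sum(xi * wi for xi, wi in zip(row, w_row)) for w_row in weight]
        if bias is not None:
            y = [yi + bi for yi, bi in zip(y, bias)]
        out.append(y)
    return out

x = [[1, 2, 3]]                   # (N, in_features) = (1, 3)
weight = [[1, 0, 0], [0, 1, 1]]   # (out_features, in_features) = (2, 3)
bias = [10, 20]                   # (out_features,) = (2,)
print(linear_sketch(x, weight, bias))  # [[11, 25]], shape (N, out_features)
```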

matmul(input: SambaTensor, other: SambaTensor, *, out: SambaTensor | None = None)

Matrix product of two tensors.

The behavior depends on the dimensionality of the tensors as follows:

• If both tensors are 1-dimensional, the dot product (scalar) is returned.

• If both tensors are 2-dimensional, the matrix-matrix product is returned.

• If the first tensor is 1-dimensional and the second tensor is 2-dimensional, a 1 is prepended to the first tensor's dimensions for the purpose of the matrix multiply. After the matrix multiply, the prepended dimension is removed.

• If the first tensor is 2-dimensional and the second tensor is 1-dimensional, the matrix-vector product is returned.

• If both tensors are at least 1-dimensional and at least one tensor is N-dimensional (where N > 2), then a batched matrix multiply is returned. If the first tensor is 1-dimensional, a 1 is prepended to its dimensions for the purpose of the batched matrix multiply and removed afterwards. If the second tensor is 1-dimensional, a 1 is appended to its dimensions for the purpose of the batched matrix multiply and removed afterwards. The non-matrix (i.e. batch) dimensions are broadcast (and thus must be broadcastable). For example, if input is a $$(j \times 1 \times n \times n)$$ tensor and other is a $$(k \times n \times n)$$ tensor, out will be a $$(j \times k \times n \times n)$$ tensor.

Note that the broadcasting logic only looks at the batch dimensions when determining if the inputs are broadcastable, and not at the matrix dimensions. For example, if input is a $$(j \times 1 \times n \times m)$$ tensor and other is a $$(k \times m \times p)$$ tensor, these inputs are valid for broadcasting even though the final two dimensions (i.e. the matrix dimensions) are different. out will be a $$(j \times k \times n \times p)$$ tensor.

Parameters:
• input – the first tensor to be multiplied.

• other – the second tensor to be multiplied.

• out – the output tensor. Defaults to None.

Supported data types:

• input: torch.bfloat16, torch.float32

• other: torch.bfloat16, torch.float32

Note

The PyTorch keyword arg out is not supported on RDU and will throw an exception.

For more details see torch.matmul().
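The dimensionality rules above can be summarized as a shape calculation. The following plain-Python sketch computes only the result shape, assuming the usual right-aligned broadcasting rule for the batch dimensions; `broadcast_batch` and `matmul_shape` are hypothetical helpers, not samba functions.

```python
def broadcast_batch(a, b):
    """Broadcast two batch-shape tuples using the right-aligned rule."""
    out = []
    for i in range(1, max(len(a), len(b)) + 1):
        da = a[-i] if i <= len(a) else 1
        db = b[-i] if i <= len(b) else 1
        if da != db and 1 not in (da, db):
            raise ValueError(f"batch dims {a} and {b} are not broadcastable")
        out.append(db if da == 1 else da)
    return tuple(reversed(out))

def matmul_shape(shape_a, shape_b):
    """Return the result shape of a matrix product of two tensor shapes."""
    a, b = tuple(shape_a), tuple(shape_b)
    if not a or not b:
        raise ValueError("both tensors must be at least 1-dimensional")
    a_was_1d, b_was_1d = len(a) == 1, len(b) == 1
    if a_was_1d:
        a = (1,) + a          # prepend a 1: (m,) -> (1, m)
    if b_was_1d:
        b = b + (1,)          # append a 1: (m,) -> (m, 1)
    if a[-1] != b[-2]:
        raise ValueError("inner matrix dimensions do not match")
    # Only the batch dimensions are broadcast; matrix dims are combined.
    out = broadcast_batch(a[:-2], b[:-2]) + (a[-2], b[-1])
    if b_was_1d:
        out = out[:-1]        # drop the appended column dimension
    if a_was_1d:
        out = out[:-1] if b_was_1d else out[:-2] + out[-1:]
    return out

print(matmul_shape((3,), (3,)))               # (): dot product, a scalar
print(matmul_shape((2, 3), (3,)))             # (2,): matrix-vector product
print(matmul_shape((5, 1, 3, 4), (7, 4, 2)))  # (5, 7, 3, 2)
```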

## Tensor Ops

cat(tensors: List[SambaTensor] | Tuple[SambaTensor], dim: int | None = 0, axis: int | None = None, *, out: SambaTensor | None = None)

Concatenates the given sequence of tensors in the given dimension. All tensors must either have the same shape (except in the concatenating dimension) or be empty.

torch.cat() can be seen as an inverse operation for torch.split() and torch.chunk().

Parameters:
• tensors – any Python sequence of tensors of the same type. Non-empty tensors must have the same shape, except in the cat dimension.

• dim – the dimension over which the tensors are concatenated

• axis – alias for dim, cannot be specified with dim.

• out – the output tensor. Defaults to None.

Supported data types:

• tensors: torch.bfloat16, torch.float32, torch.int16, torch.int32, torch.int64

Note

The PyTorch keyword arg out is not supported on RDU and will throw an exception.

Example:

>>> x = samba.randn(2, 3)
>>> x.data
tensor([[ 0.6580, -1.0969, -0.4614],
        [-0.1034, -0.5790,  0.1497]])
>>> samba.cat((x, x, x), 0).data
tensor([[ 0.6580, -1.0969, -0.4614],
        [-0.1034, -0.5790,  0.1497],
        [ 0.6580, -1.0969, -0.4614],
        [-0.1034, -0.5790,  0.1497],
        [ 0.6580, -1.0969, -0.4614],
        [-0.1034, -0.5790,  0.1497]])
>>> samba.cat((x, x, x), 1).data
tensor([[ 0.6580, -1.0969, -0.4614,  0.6580, -1.0969, -0.4614,  0.6580,
         -1.0969, -0.4614],
        [-0.1034, -0.5790,  0.1497, -0.1034, -0.5790,  0.1497, -0.1034,
         -0.5790,  0.1497]])


For details, see torch.cat().
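The shape requirement (same shape except in the concatenating dimension, with empty inputs allowed) can be sketched for 2-D data in plain Python. `cat2d` is a hypothetical helper on nested lists and is not the samba implementation.

```python
def cat2d(matrices, dim=0):
    """Concatenate list-of-lists matrices along dim 0 (rows) or 1 (columns)."""
    matrices = [m for m in matrices if m]  # empty inputs are skipped
    if not matrices:
        return []
    if dim == 0:
        width = len(matrices[0][0])
        if any(len(row) != width for m in matrices for row in m):
            raise ValueError("all inputs must have the same number of columns")
        return [list(row) for m in matrices for row in m]
    if dim == 1:
        height = len(matrices[0])
        if any(len(m) != height for m in matrices):
            raise ValueError("all inputs must have the same number of rows")
        return [sum((list(m[i]) for m in matrices), []) for i in range(height)]
    raise ValueError("dim must be 0 or 1 for 2-D inputs")

x = [[1, 2, 3], [4, 5, 6]]
print(cat2d((x, x), 0))  # [[1, 2, 3], [4, 5, 6], [1, 2, 3], [4, 5, 6]]
print(cat2d((x, x), 1))  # [[1, 2, 3, 1, 2, 3], [4, 5, 6, 4, 5, 6]]
```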

embedding(weight: SambaTensor, input: SambaTensor, padding_idx: int, max_norm: Optional[float] = None, norm_type: float = 2.0, scale_grad_by_freq: bool, sparse: bool)

A simple lookup table that looks up embeddings in a fixed dictionary and size.

This function is often used to retrieve word embeddings using indices. The input is a list of indices and the embedding matrix. The output is the corresponding word embeddings.

See torch.nn.Embedding for details.

Parameters:
• input – tensor containing indices into the embedding matrix.

• weight – the embedding matrix with number of rows equal to the maximum possible index + 1, and number of columns equal to the embedding size.

• padding_idx – if specified, the entries at padding_idx do not contribute to the gradient; therefore, the embedding vector at padding_idx is not updated during training, i.e. it remains as a fixed “pad”.

• max_norm – if given, each embedding vector with norm larger than max_norm is renormalized to have norm max_norm. Note: this will modify weight in place.

• norm_type – The p of the p-norm to compute for the max_norm option. Default: 2.

• scale_grad_by_freq – If given, scales gradients by the inverse of frequency of the words in the mini-batch. Default: False.

• sparse – If True, gradient w.r.t. weight will be a sparse tensor. See Notes under torch.nn.Embedding for details.

Shape:

• Input: Integer tensor of arbitrary shape containing the indices to extract.

• Weight: Embedding matrix of floating point type with shape $$(V, embedding\_dim)$$, where $$V = maximum\ index + 1$$ and embedding_dim = the embedding size.

• Output: $$(*, embedding\_dim)$$, where $$*$$ is the input shape.

Supported data types:

• input: torch.int32, torch.int64

• weight: torch.bfloat16, torch.float32

Note

The PyTorch keyword arguments max_norm, norm_type, scale_grad_by_freq, and sparse are supported on CPU but will have no effect on RDU.

Examples:

>>> # a batch of 2 samples of 4 indices each
>>> input = samba.SambaTensor(torch.tensor([[1,2,4,5],[4,3,2,9]]))
>>> # an embedding matrix containing 10 tensors of size 3
>>> embedding_matrix = samba.rand(10, 3)
>>> samba.embedding(input, embedding_matrix).data
tensor([[[ 0.8490,  0.9625,  0.6753],
         [ 0.9666,  0.7761,  0.6108],
         [ 0.6246,  0.9751,  0.3618],
         [ 0.4161,  0.2419,  0.7383]],

        [[ 0.6246,  0.9751,  0.3618],
         [ 0.0237,  0.7794,  0.0528],
         [ 0.9666,  0.7761,  0.6108],
         [ 0.3385,  0.8612,  0.1867]]])
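The lookup itself is simple to state: every index is replaced by the corresponding row of the weight matrix, so the output shape is the input shape followed by embedding_dim. A plain-Python sketch on nested lists, with the hypothetical helper `embedding_sketch`, ignoring padding_idx and the other options:

```python
def embedding_sketch(indices, weight):
    """Replace every index in a (possibly nested) list by its weight row."""
    if isinstance(indices, int):
        if not 0 <= indices < len(weight):
            raise IndexError("index out of range for the embedding matrix")
        return list(weight[indices])
    return [embedding_sketch(item, weight) for item in indices]

weight = [[0.0, 0.1], [1.0, 1.1], [2.0, 2.1]]   # V = 3 rows, embedding_dim = 2
batch = [[0, 2], [1, 1]]                        # input shape (2, 2)
out = embedding_sketch(batch, weight)           # output shape (2, 2, 2)
print(out)  # [[[0.0, 0.1], [2.0, 2.1]], [[1.0, 1.1], [1.0, 1.1]]]
```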

expand(input: SambaTensor, *sizes: Tuple[Tuple | int, ...])

Returns a new view of the input tensor with singleton dimensions expanded to a larger size.

Passing -1 as the size for a dimension means not changing the size of that dimension.

A tensor can also be expanded to a larger number of dimensions. The new dimensions are appended at the front. For the new dimensions, the size cannot be set to -1.

Parameters:
• input – the input tensor.

• sizes – the desired expanded size.

Example:

>>> x = samba.SambaTensor(torch.tensor([[1], [2], [3]]))
>>> x.size()
torch.Size([3, 1])
>>> x.expand(3, 4).data
tensor([[ 1,  1,  1,  1],
        [ 2,  2,  2,  2],
        [ 3,  3,  3,  3]])
>>> x.expand(-1, 4).data   # -1 means not changing the size of that dimension
tensor([[ 1,  1,  1,  1],
        [ 2,  2,  2,  2],
        [ 3,  3,  3,  3]])


For details see torch.Tensor.expand().
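The rules for -1 and for new leading dimensions can be expressed as a shape computation. This is a plain-Python sketch with a hypothetical helper, `expand_shape`, that only derives the resulting shape and does not touch any data.

```python
def expand_shape(shape, sizes):
    """Return the shape produced by expanding `shape` to `sizes`."""
    if len(sizes) < len(shape):
        raise ValueError("sizes must have at least as many dims as the input")
    lead = len(sizes) - len(shape)   # number of new dimensions at the front
    out = []
    for i, size in enumerate(sizes):
        if i < lead:
            if size == -1:
                raise ValueError("-1 is not allowed for a new leading dimension")
            out.append(size)
            continue
        old = shape[i - lead]
        if size == -1 or size == old:
            out.append(old)          # dimension left unchanged
        elif old == 1:
            out.append(size)         # singleton dimension expanded
        else:
            raise ValueError(f"cannot expand dimension of size {old} to {size}")
    return tuple(out)

print(expand_shape((3, 1), (3, 4)))      # (3, 4)
print(expand_shape((3, 1), (-1, 4)))     # (3, 4)
print(expand_shape((3, 1), (2, -1, 4)))  # (2, 3, 4)
```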

flatten(input: SambaTensor, start_dim: int = 0, end_dim: int = -1)

Flattens input by reshaping it into a one-dimensional tensor. If start_dim or end_dim are passed, only dimensions starting with start_dim and ending with end_dim are flattened. The order of elements in input is unchanged.

Note

Flattening a zero-dimensional tensor will return a one-dimensional view.

Parameters:
• input – the input tensor.

• start_dim – the first dim to flatten

• end_dim – the last dim to flatten

Supported data types:

• input: torch.bfloat16, torch.float32

Example:

>>> t = samba.SambaTensor(torch.tensor([[[1, 2],
...                                      [3, 4]],
...                                     [[5, 6],
...                                      [7, 8]]]))
>>> samba.flatten(t).data
tensor([1, 2, 3, 4, 5, 6, 7, 8])
>>> samba.flatten(t, start_dim=1).data
tensor([[1, 2, 3, 4],
        [5, 6, 7, 8]])


For more details, see torch.flatten().
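How start_dim and end_dim determine the resulting shape can be sketched in plain Python. `flatten_shape` is a hypothetical helper that only computes shapes: the dimensions from start_dim to end_dim (inclusive) are merged into one.

```python
from math import prod

def flatten_shape(shape, start_dim=0, end_dim=-1):
    """Return the shape after flattening dims start_dim..end_dim (inclusive)."""
    shape = tuple(shape)
    ndim = len(shape)
    if ndim == 0:
        return (1,)                      # zero-dimensional -> one-dimensional
    for d in (start_dim, end_dim):
        if not -ndim <= d < ndim:
            raise IndexError("dimension out of range")
    start, end = start_dim % ndim, end_dim % ndim  # normalise negative indices
    if start > end:
        raise ValueError("start_dim must not come after end_dim")
    merged = prod(shape[start:end + 1])
    return shape[:start] + (merged,) + shape[end + 1:]

print(flatten_shape((2, 2, 2)))               # (8,)
print(flatten_shape((2, 2, 2), start_dim=1))  # (2, 4)
print(flatten_shape((2, 3, 4, 5), 1, 2))      # (2, 12, 5)
```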

groupby(tensor: SambaTensor, num_bins: int = 32, capacity: int = 1) → Tuple[SambaTensor, SambaTensor]

Computes the correct bin for each input element and generates its scatter address. The overflow bin of the histogram acts as a trash bin.

New in version 1.19.

\begin{align}\begin{aligned}\text{scatter\_out}_i &= \text{input}_i + (\text{capacity} * \text{tensor[i]}) + \text{histogram}_i\\\text{histogram\_out}_i &= \text{histogram(tensor[i])}_i + 1\end{aligned}\end{align}
Parameters:
• tensor – 2-dimensional tensor whose dimensions are bs * ss

• num_bins – number of bins

• capacity – maximum number of tokens each bin can take

Example:

>>> tensor = samba.SambaTensor(torch.tensor([ 0,  1,  1, 0, 1, 2, 2]))
>>> [s, h] = samba.groupby(tensor, 3, 2, 1, True, False)
>>> s.data
[ 0,  2,  3, 1, 6, 4, 5]

>>> h.data
[ 2,  2,  2, 1]
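The following plain-Python sketch reproduces the scatter addresses and histogram shown in the example. The logic is inferred from that example alone and is an assumption, not the documented algorithm: each bin is taken to own `capacity` consecutive slots starting at capacity * bin, and elements that no longer fit are sent to an extra overflow bin that starts at capacity * num_bins. `groupby_sketch` is a hypothetical helper that takes a flat list of bin indices.

```python
def groupby_sketch(bins, num_bins, capacity):
    """Return (scatter addresses, histogram) for a flat list of bin indices.

    Inferred from the documented example: each bin owns `capacity` slots
    starting at capacity * bin; elements that do not fit go to an extra
    overflow ("trash") bin that starts at capacity * num_bins.
    """
    histogram = [0] * (num_bins + 1)      # last entry is the overflow bin
    scatter = []
    for b in bins:
        if not 0 <= b < num_bins:
            raise ValueError("bin index out of range")
        if histogram[b] < capacity:
            scatter.append(capacity * b + histogram[b])
            histogram[b] += 1
        else:
            scatter.append(capacity * num_bins + histogram[num_bins])
            histogram[num_bins] += 1
    return scatter, histogram

s, h = groupby_sketch([0, 1, 1, 0, 1, 2, 2], num_bins=3, capacity=2)
print(s)  # [0, 2, 3, 1, 6, 4, 5]
print(h)  # [2, 2, 2, 1]
```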

index_select(input: SambaTensor, dim: int, index: torch.LongTensor | List[int], *, out: SambaTensor | None = None)

Returns a new tensor which indexes the input tensor along dimension dim using the entries in index (which is an integer tensor).

The returned tensor has the same number of dimensions as the original tensor (input). The dim-th dimension has the same size as the length of index; other dimensions have the same size as in the original tensor.

Note

The returned tensor does not use the same storage as the original tensor. If out has a different shape than expected, we silently change it to the correct shape, reallocating the underlying storage if necessary.

Parameters:
• input – the input tensor.

• dim – the dimension in which we index.

• index – the 1-D tensor containing the indices to index.

• out – the output tensor. Defaults to None.

Note

The PyTorch API optional keyword arg

• out (Tensor, optional)

is not supported on RDU and will throw an exception.

Example:

>>> x = samba.randn(3, 4)
>>> x.data
tensor([[ 0.1427,  0.0231, -0.5414, -1.0009],
        [-0.4664,  0.2647, -0.1228, -1.1068],
        [-1.1734, -0.6571,  0.7230, -0.6004]])
>>> indices = samba.SambaTensor(torch.tensor([0, 2]))
>>> samba.index_select(x, 0, indices).data
tensor([[ 0.1427,  0.0231, -0.5414, -1.0009],
        [-1.1734, -0.6571,  0.7230, -0.6004]])
>>> samba.index_select(x, 1, indices).data
tensor([[ 0.1427, -0.5414],
        [-0.4664, -0.1228],
        [-1.1734,  0.7230]])


For details see torch.index_select().
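For a 2-D input, the selection rule reduces to picking rows (dim = 0) or columns (dim = 1) in the order given by index. A plain-Python sketch with the hypothetical helper `index_select2d`; like the operator, it returns new lists rather than sharing storage with the input.

```python
def index_select2d(rows, dim, index):
    """Pick rows (dim=0) or columns (dim=1) of a list-of-lists matrix."""
    if dim == 0:
        return [list(rows[i]) for i in index]
    if dim == 1:
        return [[row[j] for j in index] for row in rows]
    raise ValueError("dim must be 0 or 1 for a 2-D input")

x = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
print(index_select2d(x, 0, [0, 2]))  # [[1, 2, 3, 4], [9, 10, 11, 12]]
print(index_select2d(x, 1, [0, 2]))  # [[1, 3], [5, 7], [9, 11]]
```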

permute(input: SambaTensor, dims: Tuple[int])

Returns a view of the original tensor input with its dimensions permuted.

Parameters:
• input – the input tensor.

• dims – the desired ordering of dimensions.

Supported data types:

• input: torch.bfloat16, torch.float32

Example:

>>> x = samba.randn(2, 3, 5)
>>> x.size()
torch.Size([2, 3, 5])
>>> samba.permute(x, (2, 0, 1)).size()
torch.Size([5, 2, 3])


For more details see torch.permute()

reshape(input: SambaTensor, shape: Tuple[int])

Returns a tensor with the same data and number of elements as input, but with the specified shape. When possible, the returned tensor will be a view of input. Otherwise, it will be a copy. Contiguous inputs and inputs with compatible strides can be reshaped without copying, but you should not depend on the copying vs. viewing behavior.

See view() for when it is possible to return a view.

A single dimension may be -1, in which case it’s inferred from the remaining dimensions and the number of elements in input.

Parameters:
• input – the tensor to be reshaped

• shape – the new shape

Example:

>>> a = samba.arange(4.)
>>> samba.reshape(a, (2, 2)).data
tensor([[ 0.,  1.],
        [ 2.,  3.]])
>>> b = samba.SambaTensor(torch.tensor([[0, 1], [2, 3]]))
>>> samba.reshape(b, (-1,)).data
tensor([ 0,  1,  2,  3])


For more details, see torch.reshape().
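The inference of a -1 dimension is plain arithmetic: the missing size is the element count divided by the product of the known sizes. A minimal sketch with a hypothetical helper, `infer_shape`, which works on shapes only:

```python
from math import prod

def infer_shape(numel, shape):
    """Resolve a single -1 entry in `shape` for a tensor with `numel` elements."""
    shape = list(shape)
    unknown = [i for i, s in enumerate(shape) if s == -1]
    if len(unknown) > 1:
        raise ValueError("only one dimension may be -1")
    if unknown:
        known = prod(s for s in shape if s != -1)
        if known == 0 or numel % known != 0:
            raise ValueError(f"cannot infer -1 for {numel} elements")
        shape[unknown[0]] = numel // known
    if prod(shape) != numel:
        raise ValueError(f"shape {tuple(shape)} is invalid for {numel} elements")
    return tuple(shape)

print(infer_shape(4, (2, 2)))    # (2, 2)
print(infer_shape(4, (-1,)))     # (4,)
print(infer_shape(16, (-1, 8)))  # (2, 8)
```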

scaled_dot_product_attention(query: SambaTensor, key: SambaTensor, value: SambaTensor, attn_mask: SambaTensor, dropout_p: float, is_causal: bool)

New in version 1.18.

Computes scaled dot product attention (SDPA) using query, key, and value tensors, with optional attention masking and dropout. This function is designed for compatibility with both CPU and RDU environments. It supports 'math_sdp' and 'seg_softmax_sdp' implementations. 'math_sdp' is the original algebraic version of the operator. 'seg_softmax_sdp', or segmented softmax attention, is the version specialized for RDU computation, developed based on flash attention.

Parameters:
• query – the query tensor.

• key – the key tensor.

• value – the value tensor.

• attn_mask – optional attention mask. If provided, it is applied to the attention weights.

• dropout_p – dropout probability for the attention weights. If 0.0, no dropout is applied.

• is_causal – if True, applies causal masking to prevent the attention mechanism from peeking into the future. Cannot be used with attn_mask.

Notes

• 'seg_softmax_sdp' is a variant of SDPA optimized for RDU based on flash attention, offering enhanced speed and memory efficiency.

• 'math_sdp' is the algebraic implementation of SDPA, providing a hardware-neutral fallback for all PyTorch platforms.

• sambaflow automatically selects the best implementation based on the input, but users can override this using context managers. See the examples below.

Context Directives:
• 'disable_segmented_softmax_sdp' will force the operator to use 'math_sdp'.

• 'sdp_mixed_p' is for mixed precision to achieve higher accuracy without full fp32 precision.

• 'sdp_sliding_window_size' is a tuple of integers (left, right) that sets the sliding window when using 'seg_softmax_sdp'. Token i attends to tokens [i - left, i + right], inclusive. A negative value indicates an unlimited attention window.

• 'sdp_block_size' controls the block size used in segmented softmax attention. It does not affect functionality.

Restrictions:

• attn_mask and a sliding window cannot be set at the same time. If both are needed, encode the sliding window in attn_mask.

Examples:

>>> # To force 'math_sdp' mode for debugging:
>>> with samba.directives.sdpa_directives({'disable_segmented_softmax_sdp': True}):
...     result = F.scaled_dot_product_attention(query,
...                                             key,
...                                             value,
...                                             dropout_p=dropout_p,
...                                             is_causal=is_causal)
>>> # To enable mixed precision for higher accuracy without full fp32 precision:
>>> with samba.directives.sdpa_directives({'sdp_mixed_p': True}):
...     result = F.scaled_dot_product_attention(query,
...                                             key,
...                                             value,
...                                             dropout_p=dropout_p,
...                                             is_causal=is_causal)
>>> # To set sliding window size (left and right are placeholders for the window bounds):
>>> with samba.directives.sdpa_directives({'sdp_sliding_window_size': (left, right)}):
...     result = F.scaled_dot_product_attention(query,
...                                             key,
...                                             value,
...                                             dropout_p=dropout_p,
...                                             is_causal=is_causal)
>>> # To set sdp block size (block_size is a placeholder for the chosen size):
>>> with samba.directives.sdpa_directives({'sdp_block_size': block_size}):
...     result = F.scaled_dot_product_attention(query,
...                                             key,
...                                             value,
...                                             dropout_p=dropout_p,
...                                             is_causal=is_causal)
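The sliding-window rule, where token i attends to tokens [i - left, i + right] inclusive, can be visualized as a boolean mask. The sketch below is plain Python with a hypothetical helper, `sliding_window_mask`. It assumes that a negative bound makes the window unlimited on that side only, which is one possible reading of the directive description and is not confirmed by this page.

```python
def sliding_window_mask(seq_len, window):
    """Return mask[i][j] == True when token i may attend to token j."""
    left, right = window
    mask = []
    for i in range(seq_len):
        # Assumption: a negative bound means "unlimited" on that side.
        lo = 0 if left < 0 else max(0, i - left)
        hi = seq_len - 1 if right < 0 else min(seq_len - 1, i + right)
        mask.append([lo <= j <= hi for j in range(seq_len)])
    return mask

for row in sliding_window_mask(4, (1, 0)):
    print("".join("x" if keep else "." for keep in row))
# x...
# xx..
# .xx.
# ..xx
```

With window (-1, 0) the same helper yields a lower-triangular mask, which matches the usual causal pattern.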

sn_identity(ipt: SambaTensor)

New in version 1.18.

Returns the input tensor as is, without any modifications. This function ensures that the input tensor is passed through without any alterations. It acts as a passthrough, similar to torch.nn.Identity.

Parameters:

ipt – the SambaTensor to pass through the function without modification.

Example

>>> from sambaflow.samba.functional import sn_identity
>>> # In this example, the tensor x is created with the values [1, 2, 3].
>>> # sn_identity is applied to x, and as a passthrough function, it returns a tensor y identical to x.
>>> x = samba.SambaTensor(torch.tensor([1, 2, 3]))
>>> y = sn_identity(x)
>>> y.data
tensor([1, 2, 3])


For more details see torch.nn.Identity.

split(tensor: SambaTensor, split_size_or_sections: int | List[int], dim: int = 0) → Tuple[SambaTensor, ...]

Splits the tensor into chunks. Each chunk is a view of the original tensor.

If split_size_or_sections is an integer, then tensor will be split into equally sized chunks (if possible). The last chunk will be smaller if the tensor size along the given dimension dim is not divisible by split_size_or_sections.

If split_size_or_sections is a list, then tensor will be split into len(split_size_or_sections) chunks with sizes in dim according to split_size_or_sections.

Parameters:
• tensor – tensor to split.

• split_size_or_sections – size of a single chunk or list of sizes for each chunk.

• dim – dimension along which to split the tensor.

Supported data types:

• tensor: torch.bfloat16, torch.float32

Example:

>>> a = samba.arange(10).reshape(5,2)
>>> a.data
tensor([[0, 1],
        [2, 3],
        [4, 5],
        [6, 7],
        [8, 9]])
>>> for output in samba.split(a, 2): output.data
tensor([[0, 1],
        [2, 3]])
tensor([[4, 5],
        [6, 7]])
tensor([[8, 9]])
>>> for output in samba.split(a, [1,4]): output.data
tensor([[0, 1]])
tensor([[2, 3],
        [4, 5],
        [6, 7],
        [8, 9]])


For more details, see torch.split().
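The two forms of split_size_or_sections differ only in how the chunk sizes are derived. A plain-Python sketch with hypothetical helpers `split_sizes` and `split_rows`; note that list slices are copies, whereas the operator returns views.

```python
def split_sizes(length, split_size_or_sections):
    """Return the chunk sizes along the split dimension."""
    if isinstance(split_size_or_sections, int):
        size = split_size_or_sections
        if size <= 0:
            raise ValueError("split size must be positive")
        full, rest = divmod(length, size)
        return [size] * full + ([rest] if rest else [])
    sections = list(split_size_or_sections)
    if sum(sections) != length:
        raise ValueError("section sizes must sum to the dimension size")
    return sections

def split_rows(rows, split_size_or_sections):
    """Split a list of rows (dim 0) into consecutive chunks."""
    chunks, start = [], 0
    for size in split_sizes(len(rows), split_size_or_sections):
        chunks.append(rows[start:start + size])
        start += size
    return chunks

a = [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]
print(split_sizes(5, 2))          # [2, 2, 1]
print(split_rows(a, [1, 4])[0])   # [[0, 1]]
```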

squeeze(input: SambaTensor, dim: int | None = None)

Returns a tensor with all the dimensions of input of size 1 removed.

For example, if input is of shape: $$(A \times 1 \times B \times C \times 1 \times D)$$ then the result will be of shape: $$(A \times B \times C \times D)$$.

When dim is given, a squeeze operation is done only in the given dimension. If input is of shape: $$(A \times 1 \times B)$$, squeeze(input, 0) leaves the tensor unchanged, but squeeze(input, 1) will squeeze the tensor to the shape $$(A \times B)$$.

Note

The returned tensor shares the storage with the input tensor, so changing the contents of one will change the contents of the other.

Warning

If the tensor has a batch dimension of size 1, then squeeze(input) will also remove the batch dimension, which can lead to unexpected errors.

Parameters:
• input – the input tensor.

• dim – if given, the input will be squeezed only in this dimension

Example:

>>> x = samba.zeros(2, 1, 2, 1, 2)
>>> x.size()
torch.Size([2, 1, 2, 1, 2])
>>> y = samba.squeeze(x)
>>> y.size()
torch.Size([2, 2, 2])
>>> y = samba.squeeze(x, 0)
>>> y.size()
torch.Size([2, 1, 2, 1, 2])
>>> y = samba.squeeze(x, 1)
>>> y.size()
torch.Size([2, 2, 1, 2])


For more details, see torch.squeeze().
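The effect of dim, and the batch-dimension pitfall from the warning above, can be seen from the shapes alone. A plain-Python sketch with a hypothetical helper, `squeeze_shape`:

```python
def squeeze_shape(shape, dim=None):
    """Return the shape after removing size-1 dimensions."""
    shape = tuple(shape)
    if dim is None:
        return tuple(s for s in shape if s != 1)
    if not -len(shape) <= dim < len(shape):
        raise IndexError("dim out of range")
    dim %= len(shape)
    if shape[dim] != 1:
        return shape                     # nothing to squeeze at this position
    return shape[:dim] + shape[dim + 1:]

print(squeeze_shape((2, 1, 2, 1, 2)))     # (2, 2, 2)
print(squeeze_shape((2, 1, 2, 1, 2), 0))  # (2, 1, 2, 1, 2)
print(squeeze_shape((2, 1, 2, 1, 2), 1))  # (2, 2, 1, 2)
print(squeeze_shape((1, 128, 1)))         # (128,): the batch dim is gone too
```

Passing an explicit dim, as in the third call, is the safer choice when the batch size may be 1.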

stack(tensors: List[SambaTensor] | Tuple[SambaTensor, ...], dim: int = 0, *, out: SambaTensor | None = None)

Concatenates a sequence of tensors along a new dimension.

All tensors need to be of the same size.

Parameters:
• tensors – sequence of tensors to concatenate

• dim – dimension to insert. Has to be between 0 and the number of dimensions of concatenated tensors (inclusive)

• out – the output tensor. Defaults to None.

Supported data types:

• tensors: torch.bfloat16

Note

The PyTorch API optional keyword arg out is not supported on RDU and will throw an error.

For more details, see torch.stack().

to(input: SambaTensor, *args, **kwargs)

New in version 1.18.

Performs Tensor dtype and/or device conversion. A torch.dtype and torch.device are inferred from the arguments of self.to(*args, **kwargs).

Note

If the self Tensor already has the correct torch.dtype and torch.device, then self is returned. Otherwise, the returned tensor is a copy of self with the desired torch.dtype and torch.device.

Here are the ways to call to:

to(dtype: torch.dtype)

Returns a SambaTensor with the specified dtype.

to(other: SambaTensor) → SambaTensor

Returns a SambaTensor with the same torch.dtype and torch.device as the SambaTensor other.

Example:

>>> tensor = samba.randn(2, 2)  # Initially dtype=float32
>>> tensor.to(torch.float64).data
tensor([[-0.5044,  0.0005],
        [ 0.3310, -0.0584]]) # dtype=torch.float64

>>> other = samba.randn((), dtype=torch.float64)
>>> tensor.to(other).data
tensor([[-0.5044,  0.0005],
        [ 0.3310, -0.0584]]) # dtype=torch.float64


For more details, see torch.Tensor.to().

transpose(input: SambaTensor, dim0: int, dim1: int)

Returns a tensor that is a transposed version of input. The given dimensions dim0 and dim1 are swapped.

Parameters:
• input – the input tensor.

• dim0 – the first dimension to be transposed.

• dim1 – the second dimension to be transposed.

Example:

>>> samba.set_seed(1)
>>> x = samba.randn(2, 3)
>>> x.data
tensor([[ 0.6614,  0.2669,  0.0617],
        [ 0.6213, -0.4519, -0.1661]])
>>> samba.transpose(x, 0, 1).data
tensor([[ 0.6614,  0.6213],
        [ 0.2669, -0.4519],
        [ 0.0617, -0.1661]])


For more details, see torch.transpose().

type_as(input: SambaTensor, tensor: SambaTensor)

New in version 1.18.

Returns this tensor cast to the type of the given tensor.

This is a no-op if the tensor is already of the correct type. This is equivalent to self.type(tensor.type()).

Parameters:
• input – the input tensor.

• tensor – the tensor which has the desired type

For more details, see torch.Tensor.type_as().

unsqueeze(input: SambaTensor, dim: int)

Returns a new tensor with a dimension of size one inserted at the specified position.

The returned tensor shares the same underlying data with this tensor.

A dim value within the range [-input.dim() - 1, input.dim() + 1) can be used. Negative dim will correspond to unsqueeze() applied at dim = dim + input.dim() + 1.

Parameters:
• input – the input tensor.

• dim – the index at which to insert the singleton dimension

Supported data types:

• input: torch.bfloat16, torch.float32

Example:

>>> x = samba.SambaTensor(torch.tensor([1, 2, 3, 4]))
>>> samba.unsqueeze(x, 0).data
tensor([[ 1,  2,  3,  4]])
>>> samba.unsqueeze(x, 1).data
tensor([[ 1],
        [ 2],
        [ 3],
        [ 4]])


For more details, see torch.unsqueeze().
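The valid range for dim and the mapping of negative values (dim + input.dim() + 1) can be checked with a shape-only sketch in plain Python. `unsqueeze_shape` is a hypothetical helper.

```python
def unsqueeze_shape(shape, dim):
    """Return the shape after inserting a size-1 dimension at `dim`."""
    ndim = len(shape)
    if not -ndim - 1 <= dim < ndim + 1:
        raise IndexError(f"dim must be in [{-ndim - 1}, {ndim + 1})")
    if dim < 0:
        dim += ndim + 1                  # negative dim maps to dim + ndim + 1
    return tuple(shape[:dim]) + (1,) + tuple(shape[dim:])

print(unsqueeze_shape((4,), 0))    # (1, 4)
print(unsqueeze_shape((4,), 1))    # (4, 1)
print(unsqueeze_shape((4,), -1))   # (4, 1)
print(unsqueeze_shape((4,), -2))   # (1, 4)
```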

view(input: SambaTensor, *shape: Tuple[int, ...])

Returns a new tensor with the same data as the input tensor but with a different shape.

The returned tensor shares the same data and must have the same number of elements, but may have a different size. For a tensor to be viewed, the new view size must be compatible with its original size and stride, i.e., each new view dimension must either be a subspace of an original dimension, or only span across original dimensions $$d, d+1, \dots, d+k$$ that satisfy the following contiguity-like condition that $$\forall i = d, \dots, d+k-1$$,

$\text{stride}[i] = \text{stride}[i+1] \times \text{size}[i+1]$

Otherwise, it will not be possible to view self tensor as shape without copying it (e.g., via contiguous()). When it is unclear whether a view() can be performed, it is advisable to use reshape(), which returns a view if the shapes are compatible, and copies (equivalent to calling contiguous()) otherwise.

Note

In PyTorch, view supports type casting as well if given a torch.dtype as input. This is not currently supported in SambaFlow.

Parameters:
• input – the input tensor

• shape – the desired size

Supported data types:

• input: torch.bfloat16, torch.float32, torch.int16, torch.int32, torch.int64

Example:

>>> x = samba.randn(4, 4)
>>> x.size()
torch.Size([4, 4])
>>> y = x.view(16)
>>> y.size()
torch.Size([16])
>>> z = x.view(-1, 8)  # the size -1 is inferred from other dimensions
>>> z.size()
torch.Size([2, 8])


For details, see torch.Tensor.view()
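The contiguity-like condition stride[i] = stride[i+1] × size[i+1] can be tested directly on size and stride tuples. The following plain-Python sketch uses hypothetical helpers, `contiguous_strides` and `can_merge_dims`, and assumes row-major strides counted in elements. It checks only whether dimensions d through d+k can be merged into a single view dimension.

```python
def contiguous_strides(size):
    """Row-major strides (in elements) for a tensor of the given size."""
    strides, step = [], 1
    for s in reversed(size):
        strides.append(step)
        step *= s
    return tuple(reversed(strides))

def can_merge_dims(size, stride, d, k):
    """Check stride[i] == stride[i+1] * size[i+1] for i = d, ..., d+k-1."""
    return all(stride[i] == stride[i + 1] * size[i + 1]
               for i in range(d, d + k))

size = (4, 4)
stride = contiguous_strides(size)               # (4, 1)
print(can_merge_dims(size, stride, 0, 1))       # True: view(16) is possible

# Swapping the two dimensions (a transpose) swaps the strides as well.
t_size, t_stride = (4, 4), (1, 4)
print(can_merge_dims(t_size, t_stride, 0, 1))   # False: needs a copy first
```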