samba.functional¶
Arithmetic¶
- abs(input: SambaTensor, *, out: SambaTensor | None = None) SambaTensor ¶
New in version 1.18.
Computes the absolute value of each element in
input
.\[\text{out}_{i} = |\text{input}_{i}|\]- Parameters:
input – the input tensor.
out – the output tensor. Defaults to
None
.
Note
The PyTorch keyword arg out is not supported on RDU and will throw an exception.
Example:
>>> x = samba.SambaTensor(torch.tensor([-1, -2, -3]))
>>> samba.abs(x).data
tensor([ 1, 2, 3])
See also
For more details
torch.abs()
- add(input: SambaTensor, other: SambaTensor, alpha: int | float = 1, out: SambaTensor | None = None) SambaTensor ¶
Computes the element-wise sum of the given
input
andother
tensors\[\text{out}_i = \text{input}_i + \text{other}_i\]- Parameters:
input – the input tensor.
other – the tensor or number to add to input.
alpha – the multiplier for other. Defaults to 1.
out – the output tensor. Defaults to None.
Note
The PyTorch API optional keyword args
alpha (Number)
out (Tensor, optional)
are supported only on CPU. They are not supported on RDU.
Example:
>>> a = samba.SambaTensor(torch.tensor([ 0.0202, 1.0985, 1.3506, -0.6056]))
>>> samba.add(a, 20).data
tensor([20.0202, 21.0985, 21.3506, 19.3944])
>>> b = samba.SambaTensor(torch.tensor([-0.9732, -0.3497, 0.6245, 0.4022]))
>>> c = samba.SambaTensor(torch.tensor([[ 0.3743],
...                                     [-1.7724],
...                                     [-0.5811],
...                                     [-0.8017]]))
>>> (b + c).data
tensor([[-0.5989, 0.0246, 0.9988, 0.7765],
        [-2.7456, -2.1221, -1.1479, -1.3702],
        [-1.5543, -0.9308, 0.0434, -0.1789],
        [-1.7749, -1.1514, -0.1772, -0.3995]])
See also
For details see
torch.add()
.
- div(input: SambaTensor, other: SambaTensor | int | float, rounding_mode: str | None = None, out: SambaTensor | None = None) SambaTensor ¶
Divides each element of the
input
by the corresponding element ofother
.\[\text{out}_i = \frac{\text{input}_i}{\text{other}_i}\]- Parameters:
input – the dividend.
other – the divisor.
rounding_mode – type of rounding mode applied to the result. Either None, "trunc", or "floor".
out – the output tensor. Defaults to None.
Note
The PyTorch API optional keyword args
rounding_mode (str, optional)
out (Tensor, optional)
are not supported on RDU and will throw an exception.
Examples:
>>> x = samba.SambaTensor(torch.tensor([ 0.3810, 1.2774, -0.2972, -0.3719, 0.4637]))
>>> samba.div(x, 0.5).data
tensor([ 0.7620, 2.5548, -0.5944, -0.7438, 0.9274])
>>> a = samba.SambaTensor(torch.tensor([[-0.3711, -1.9353, -0.4605, -0.2917],
...                                     [ 0.1815, -1.0111, 0.9805, -1.5923],
...                                     [ 0.1062, 1.4581, 0.7759, -1.2344],
...                                     [-0.1830, -0.0313, 1.1908, -1.4757]]))
>>> b = samba.SambaTensor(torch.tensor([ 0.8032, 0.2930, -0.8113, -0.2308]))
>>> samba.div(a, b).data
tensor([[-0.4620, -6.6051, 0.5676, 1.2639],
        [ 0.2260, -3.4509, -1.2086, 6.8990],
        [ 0.1322, 4.9764, -0.9564, 5.3484],
        [-0.2278, -0.1068, -1.4678, 6.3938]])
See also
For details, see
torch.div()
.
- fmod(input: SambaTensor, other: 'SambaTensor' | float | int, *, out: SambaTensor | None = None) SambaTensor ¶
New in version 1.19.
Computes the element-wise modulus of the given
input
andother
tensors.The result has the same sign as the dividend
input
and its absolute value is less than that ofother
.It’s equivalent to:
>>> input - input.div(other, rounding_mode="trunc") * other
- Parameters:
input – the dividend
other – the divisor
out – the output tensor. Defaults to
None
.
Supported data types:
input: torch.bfloat16, torch.float32, torch.int16, torch.int32
other: torch.bfloat16, torch.float32, torch.int16, torch.int32, int, float
Note
The PyTorch API optional keyword arg out is only supported on CPU; it is not supported on RDU and will throw an error.
Example:
>>> a = samba.SambaTensor(torch.tensor([-3., -2, -1, 1, 2, 3]))
>>> samba.fmod(a, 2).data
tensor([-1., -0., -1., 1., 0., 1.])
See also
For more details see
torch.fmod()
- gelu(input: SambaTensor, approximate: str = 'none') SambaTensor ¶
Applies the Gaussian Error Linear Units function:
\[\text{GELU}(x) = x * \Phi(x)\]where \(\Phi(x)\) is the Cumulative Distribution Function for Gaussian Distribution.
When the approximate argument is
'tanh'
, Gelu is estimated with:\[\text{GELU}(x) = 0.5 * x * (1 + \text{Tanh}(\sqrt{2 / \pi} * (x + 0.044715 * x^3)))\]- Parameters:
approximate – the gelu approximation algorithm to use:
'none'
|'tanh'
. Default:'none'
.
Supported data types:
input: torch.bfloat16, torch.float32
See also
For details see
torch.nn.functional.gelu()
.
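Example
A minimal illustrative sketch (assuming gelu is exposed as samba.gelu like the other operators on this page; it relies only on the identity GELU(0) = 0):
>>> x = samba.SambaTensor(torch.zeros(3))
>>> samba.gelu(x).data  # GELU(0) = 0 * Phi(0) = 0
tensor([0., 0., 0.])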
- mul(input: SambaTensor, other: SambaTensor | int | float, *, out: SambaTensor | None = None) SambaTensor ¶
New in version 1.18.
Computes the element-wise multiplication of the given input tensor with other. other can be either a scalar or a tensor.
\[\text{out}_i = \text{input}_i \times \text{other}_i\]- Parameters:
input – the input tensor.
other – the second input tensor or number.
out – the output tensor. Defaults to
None
.
Note
The PyTorch keyword arg out is not supported on RDU and will throw an exception.
See also
For more details see
torch.mul()
.
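Example
A minimal illustrative sketch (assuming mul is exposed as samba.mul like the other operators on this page):
>>> x = samba.SambaTensor(torch.tensor([1., 2., 3.]))
>>> samba.mul(x, 2.).data
tensor([2., 4., 6.])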
- neg(input: SambaTensor, *, out: SambaTensor | None = None) SambaTensor ¶
Returns a new tensor with the negative of the elements of
input
tensor.\[\text{out} = -1 \times \text{input}\]- Parameters:
input – the input tensor.
out – the output tensor. Defaults to
None
.
Supported data types:
input: torch.bfloat16, torch.float32
Note
The PyTorch keyword arg out is not supported on RDU and will throw an exception.
See also
For more details see
torch.neg()
.
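Example
A minimal illustrative sketch (assuming neg is exposed as samba.neg like the other operators on this page):
>>> x = samba.SambaTensor(torch.tensor([1., -2., 3.]))
>>> samba.neg(x).data
tensor([-1.,  2., -3.])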
- pow(input: SambaTensor, exponent: SambaTensor | int | float, *, out: SambaTensor | None = None) SambaTensor ¶
Takes the power of each element in input with exponent and returns a tensor with the result.
\[\text{out}_i = x_i ^ {\text{exponent}_i}\]- Parameters:
input – the input tensor.
exponent – the exponent value.
out – the output tensor. Defaults to
None
.
Note
The PyTorch keyword arg out is not supported on RDU and will throw an exception.
See also
For details, see
torch.pow()
.
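Example
A minimal illustrative sketch (assuming pow is exposed as samba.pow like the other operators on this page):
>>> x = samba.SambaTensor(torch.tensor([1., 2., 3.]))
>>> samba.pow(x, 2).data
tensor([1., 4., 9.])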
- relu(input: SambaTensor, inplace: bool = False) SambaTensor ¶
New in version 1.18.
Applies the rectified linear unit function element-wise.
\[\text{ReLU}(x) = (x)^+ = \max(0, x)\]- Parameters:
input – the input tensor.
inplace – If set to
True
, will do this operation in-place.
Note
The PyTorch API optional keyword arg inplace is only supported on CPU; it is not supported on RDU and will log a warning.
See also
For details see
torch.nn.functional.relu()
.
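Example
A minimal illustrative sketch (assuming relu is exposed as samba.relu like the other operators on this page):
>>> x = samba.SambaTensor(torch.tensor([-1., 0., 2.]))
>>> samba.relu(x).data
tensor([0., 0., 2.])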
- remainder(input: SambaTensor, other: 'SambaTensor' | float | int, *, out: SambaTensor | None = None) SambaTensor ¶
New in version 1.19.
Computes the element-wise modulus of the given
input
andother
tensors.The result has the same sign as the divisor
other
and its absolute value is less than that ofother
.It’s equivalent to:
>>> input - input.div(other, rounding_mode="floor") * other
- Parameters:
input – the dividend
other – the divisor
out – the output tensor. Defaults to
None
.
Supported data types:
input: torch.int16, torch.int32
other: torch.int16, torch.int32, int
For floating point modulus operation see
samba.fmod()
Note
The PyTorch API optional keyword arg out is only supported on CPU; it is not supported on RDU and will throw an error.
Example:
>>> a = samba.SambaTensor(torch.tensor([-3., -2, -1, 1, 2, 3]))
>>> samba.remainder(a, 2).data
tensor([ 1., -0., 1., 1., 0., 1.])
See also
For more details see
torch.remainder()
- rsqrt(input: SambaTensor, *, out: SambaTensor | None = None) SambaTensor ¶
New in version 1.18.
Returns a new tensor with the reciprocal of the square-root of each of the elements of
input
.\[\text{out}_{i} = \frac{1}{\sqrt{\text{input}_{i}}}\]- Parameters:
input – the input tensor.
out – the output tensor. Defaults to
None
.
Note
The PyTorch API optional keyword arg out is only supported on CPU; it is not supported on RDU and will throw an error.
Example:
>>> a = samba.randn(4)
>>> a.data
tensor([-0.0370, 0.2970, 1.5420, -0.9105])
>>> samba.rsqrt(a).data
tensor([nan, 1.8351, 0.8053, nan])
See also
For details see
torch.rsqrt()
.
- rsub(input: SambaTensor, other: SambaTensor | int | float, alpha: int | float = 1) SambaTensor ¶
Performs reverse subtraction, where the operands are swapped.
\[\text{{out}}_i = \text{{other}}_i - \text{{alpha}} \times \text{{input}}_i\]- Parameters:
input – subtrahend tensor.
other – minuend tensor.
alpha – the multiplier for
input
.
Note
The PyTorch API optional keyword arg alpha (number) is not supported on RDU and will throw an exception.
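Example
A minimal illustrative sketch (assuming rsub is exposed as samba.rsub like the other operators on this page, with the default alpha=1):
>>> x = samba.SambaTensor(torch.tensor([1., 2., 3.]))
>>> samba.rsub(x, 10.).data  # 10 - x
tensor([9., 8., 7.])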
- scale(input: SambaTensor, value: float | SambaTensor) SambaTensor ¶
New in version 1.18.
Multiplies each element of
input
byvalue
.\[\text{out}_{i} = \text{value} * \text{input}_{i}\]- Parameters:
input – the input tensor.
value – the value to multiply by.
Example
>>> samba.set_seed(1)
>>> x = samba.randn(3,4)
>>> x.data
tensor([[ 0.6614, 0.2669, 0.0617, 0.6213],
        [-0.4519, -0.1661, -1.5228, 0.3817],
        [-1.0276, -0.5631, -0.8923, -0.0583]])
>>> samba.scale(x, -1).data
tensor([[-0.6614, -0.2669, -0.0617, -0.6213],
        [ 0.4519, 0.1661, 1.5228, -0.3817],
        [ 1.0276, 0.5631, 0.8923, 0.0583]])
- sigmoid(input: SambaTensor, *, out: SambaTensor | None = None) SambaTensor ¶
New in version 1.18.
Computes the expit (also known as the logistic sigmoid function) of the elements of
input
.\[\text{out}_{i} = \frac{1}{1 + e^{-\text{input}_{i}}}\]- Parameters:
input – the input tensor.
out – the output tensor. Defaults to
None
.
Note
The PyTorch API optional keyword arg out is not supported on RDU and will throw an error.
Example:
>>> a = samba.SambaTensor(torch.randn(4))
>>> a.data
tensor([ 0.9213, 1.0887, -0.8858, -1.7683])
>>> samba.sigmoid(a).data
tensor([ 0.7153, 0.7481, 0.2920, 0.1458])
See also
For more details see
torch.nn.functional.sigmoid()
.
- silu(input: SambaTensor, inplace: bool = False) SambaTensor ¶
New in version 1.18.
Applies the Sigmoid Linear Unit (SiLU) function, element-wise. The SiLU function is also known as the swish function.
\[\text{silu}(x) = x * \sigma(x), \text{where } \sigma(x) \text{ is the logistic sigmoid.}\]- Parameters:
input – tensor to perform the operation.
inplace – If set to
True
, will do this operation in-place.
Note
The PyTorch API optional keyword arg inplace is not supported on RDU and will throw an error.
See also
For more details see
torch.nn.functional.silu()
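Example
A minimal illustrative sketch (assuming silu is exposed as samba.silu like the other operators on this page; it relies only on SiLU(0) = 0):
>>> x = samba.SambaTensor(torch.zeros(4))
>>> samba.silu(x).data  # 0 * sigmoid(0) = 0
tensor([0., 0., 0., 0.])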
- softmax(input: SambaTensor, dim: int = None, _stacklevel: int = 3, dtype: torch.dtype | None = None) SambaTensor ¶
Applies a softmax function.
Softmax is defined as:
\[\text{Softmax}(x_{i}) = \frac{\exp(x_i)}{\sum_j \exp(x_j)}\]
It is applied to all slices along dim, and will re-scale them so that the elements lie in the range [0, 1] and sum to 1.
- Parameters:
input – the input tensor.
dim – a dimension along which softmax will be computed.
dtype – the desired data type of returned tensor. If specified, the input tensor is cast to
dtype
before the operation is performed. This is useful for preventing data type overflows. Default:None
.
See also
For more details
torch.nn.functional.softmax()
.
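Example
A minimal illustrative sketch (assuming softmax is exposed as samba.softmax like the other operators on this page; equal inputs map to equal probabilities):
>>> x = samba.SambaTensor(torch.ones(4))
>>> samba.softmax(x, dim=0).data
tensor([0.2500, 0.2500, 0.2500, 0.2500])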
- sqrt(input: SambaTensor, out: SambaTensor | None = None) SambaTensor ¶
Returns a new tensor with the square-root of the elements of
input
.\[\text{out}_{i} = \sqrt{\text{input}_{i}}\]- Parameters:
input – the input tensor.
out – the output tensor. Defaults to
None
.
Note
The PyTorch API optional keyword arg out is not supported on RDU and will throw an error.
Example:
>>> a = samba.SambaTensor(torch.tensor([-2.0755, 1.0226, 0.0831, 0.4806]))
>>> samba.sqrt(a).data
tensor([nan, 1.0112, 0.2883, 0.6933])
See also
For more details
torch.sqrt()
.
- sub(input: SambaTensor, other: SambaTensor | float | int, *, alpha: float | int = 1, out: SambaTensor | None = None) SambaTensor ¶
Subtracts
other
, scaled byalpha
, frominput
.\[\text{{out}}_i = \text{{input}}_i - \text{{alpha}} \times \text{{other}}_i\]- Parameters:
input – the input tensor.
other – the tensor or scalar to subtract from input.
alpha – the scalar multiplier for other.
out – the output tensor. Defaults to None.
Note
The PyTorch API optional keyword args
out (Tensor, optional)
alpha (Scalar)
are only supported on CPU; they are not supported on RDU.
Example:
>>> a = samba.SambaTensor(torch.tensor((1, 2)))
>>> b = samba.SambaTensor(torch.tensor((0, 1)))
>>> samba.sub(a, b).data
tensor([1, 1])
See also
For more details
torch.sub()
.
- tanh(input: SambaTensor, *, out: SambaTensor | None = None) SambaTensor ¶
Returns a new tensor with the hyperbolic tangent of the elements of
input
.\[\text{out}_{i} = \tanh(\text{input}_{i})\]- Parameters:
input – the input tensor.
out – the output tensor. Defaults to
None
.
Note
The PyTorch API optional keyword arg out (Tensor, optional) is only supported on CPU; it is not supported on RDU and will throw an exception.
Example:
>>> a = samba.SambaTensor(torch.tensor([0.8986, -0.7279, 1.1745, 0.2611], dtype=torch.bfloat16))
>>> samba.tanh(a).data
tensor([ 0.7148, -0.6211, 0.8242, 0.2559], dtype=torch.bfloat16)
See also
For more details
torch.tanh()
.
Generator¶
- masked_fill(input: SambaTensor, mask: SambaTensor, value: float) SambaTensor ¶
Fills elements of
input
tensor withvalue
wheremask
isTrue
. The shape ofmask
must be broadcastable with the shape ofinput
.- Parameters:
input – the input tensor
mask – the boolean mask
value – the value to fill in with
Note
Only 2-dimensional inputs and masks are currently supported.
See also
For more details
torch.Tensor.masked_fill()
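Example
A minimal illustrative sketch (assuming masked_fill is exposed as samba.masked_fill like the other operators on this page, and using the 2-dimensional input and mask required by the note above):
>>> x = samba.SambaTensor(torch.tensor([[1., 2.], [3., 4.]]))
>>> mask = samba.SambaTensor(torch.tensor([[True, False], [False, True]]))
>>> samba.masked_fill(x, mask, 0.).data
tensor([[0., 2.],
        [3., 0.]])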
- masked_fill_(input: SambaTensor, mask: SambaTensor, value: float)¶
Fills elements of
self
tensor withvalue
wheremask
isTrue
. The shape ofmask
must be broadcastable with the shape of the underlying tensor. The operation is done inplace.- Parameters:
input – the input SambaTensor
mask – the boolean mask
value – the value to fill in with
See also
For more details
torch.Tensor.masked_fill_()
- triu_fill(input: SambaTensor | Tensor, value: int | float) SambaTensor | Tensor ¶
Out-of-place version of
triu_fill_()
.
- triu_fill_(input: SambaTensor, value: int | float) SambaTensor ¶
Fills the upper triangle of the last two dimensions of input with value in place. Does not fill the diagonal itself. input's two innermost dimensions must form a square matrix.- Parameters:
input – the SambaTensor to fill
value – the value to fill with
Example:
>>> x = samba.randn(3,3)
>>> x.data
tensor([[ 1.3290, -0.9150, -0.1482],
        [ 0.4660, -0.9847, -0.7689],
        [-1.1259, -0.9790, -0.3892]])
>>> samba.triu_fill_(x, 14)
>>> x.data
tensor([[ 1.3290, 14.0000, 14.0000],
        [ 0.4660, -0.9847, 14.0000],
        [-1.1259, -0.9790, -0.3892]])
Logical¶
- bitwise_not(input: SambaTensor, *, out: SambaTensor | None = None) SambaTensor ¶
Computes the bitwise NOT of the given input tensor. The input tensor must be an
int
orbool
type. For bool tensors, it computes the logical NOT.- Parameters:
input – the input tensor.
out – the output tensor. Defaults to
None
.
Note
The PyTorch API optional keyword arg
out (Tensor, optional)
is not supported on RDU and will throw an exception
Note
bitwise_not on RDU only supports bool dtype for input.
Example
>>> samba.bitwise_not(samba.SambaTensor(torch.tensor([True, True, False], dtype=torch.bool))).data
tensor([False, False, True])
See also
For more details
torch.bitwise_not()
.
- logical_or(input: SambaTensor, other: SambaTensor, *, out: SambaTensor | None = None) Tensor ¶
Computes the element-wise logical OR of the given input tensors. Zeros are treated as
False
and nonzeros are treated asTrue
.- Parameters:
input – the input tensor.
other – the tensor to compute OR with
out – the output tensor. Defaults to
None
.
Note
The PyTorch API optional keyword args
out (Tensor, optional)
are not supported on RDU and will throw an exception
Example:
>>> samba.logical_or(samba.SambaTensor(torch.tensor([True, False, True])), samba.SambaTensor(torch.tensor([True, False, False]))).data
tensor([ True, False, True])
>>> a = samba.SambaTensor(torch.tensor([0, 1, 10, 0], dtype=torch.int8))
>>> b = samba.SambaTensor(torch.tensor([4, 0, 1, 0], dtype=torch.int8))
>>> samba.logical_or(a, b).data
tensor([ True, True, True, False])
>>> samba.logical_or(a.double(), b.double()).data
tensor([ True, True, True, False])
>>> samba.logical_or(a.double(), b).data
tensor([ True, True, True, False])
See also
For more details
torch.logical_or()
Loss¶
- cross_entropy(input: SambaTensor, target: SambaTensor, weight: SambaTensor | None = None, size_average: bool | None = None, ignore_index: int = -100, reduce: bool | None = None, reduction: str = 'mean', label_smoothing: float = 0.0) SambaTensor ¶
This criterion computes the cross entropy loss between input and target.
See
CrossEntropyLoss
for details.- Parameters:
input – \((N, C)\) where C = number of classes or \((N, C, H, W)\) in case of 2D Loss, or \((N, C, d_1, d_2, ..., d_K)\) where \(K \geq 1\) in the case of K-dimensional loss. input is expected to contain unnormalized scores (often referred to as logits).
target – If containing class indices, shape \((N)\) where each value is \(0 \leq \text{targets}[i] \leq C-1\), or \((N, d_1, d_2, ..., d_K)\) with \(K \geq 1\) in the case of K-dimensional loss. If containing class probabilities, same shape as the input.
weight – a manual rescaling weight given to each class. If given, has to be a tensor of size C.
size_average – Deprecated (see reduction).
ignore_index – Specifies a target value that is ignored and does not contribute to the input gradient. When size_average is True, the loss is averaged over non-ignored targets. Note that ignore_index is only applicable when the target contains class indices. Default: -100.
reduce – Deprecated (see reduction).
reduction – Specifies the reduction to apply to the output: none | mean | sum. none: no reduction will be applied, mean: the sum of the output will be divided by the number of elements in the output, sum: the output will be summed. Note: size_average and reduce are being deprecated. Currently, specifying either of those args overrides reduction. Default: mean.
label_smoothing – A float in [0.0, 1.0]. Specifies the amount of smoothing when computing the loss, where 0.0 means no smoothing. The targets become a mixture of the original ground truth and a uniform distribution as described in the external article Rethinking the Inception Architecture for Computer Vision. Default: \(0.0\).
Supported data types:
input: torch.bfloat16, torch.float32
target: torch.int32, torch.int64
Note
The PyTorch API optional keyword args
reduce (bool, optional)
size_average (bool, optional)
are not supported on RDU and will throw an exception
Examples:
>>> # Example of target with class indices
>>> input = samba.randn(3, 5, requires_grad=True)
>>> target = samba.randint(5, (3,), dtype=torch.int64)
>>> loss = samba.cross_entropy(input, target)
See also
For details, see
torch.nn.functional.cross_entropy()
Modules¶
- multi_head_attention(query: SambaTensor, key: SambaTensor, value: SambaTensor, embed_dim_to_check: int, num_heads: int, in_proj_weight: SambaTensor | None, in_proj_bias: SambaTensor | None, bias_k: SambaTensor | None, bias_v: SambaTensor | None, add_zero_attn: bool, dropout_p: float, out_proj_weight: SambaTensor, out_proj_bias: SambaTensor, training: bool = True, key_padding_mask: SambaTensor | None = None, need_weights: bool = True, attn_mask: SambaTensor | None = None, use_separate_proj_weight: bool = False, q_proj_weight: SambaTensor | None = None, k_proj_weight: SambaTensor | None = None, v_proj_weight: SambaTensor | None = None, static_k: SambaTensor | None = None, static_v: SambaTensor | None = None)¶
New in version 1.18.
Allows the model to jointly attend to information from different representation subspaces. See reference: Attention Is All You Need.
Forward pass implementation for MultiHeadAttention.
See also
For details see
torch.nn.MultiheadAttention
ortorch.nn.functional.multi_head_attention_forward()
.- Parameters:
query – map a query and a set of key-value pairs to an output. See “Attention Is All You Need” for more details.
key – map a query and a set of key-value pairs to an output. See “Attention Is All You Need” for more details.
value – map a query and a set of key-value pairs to an output. See “Attention Is All You Need” for more details.
embed_dim_to_check – total dimension of the model.
num_heads – parallel attention heads.
in_proj_weight – input projection weight and bias. Required if use_separate_proj_weight is False.
in_proj_bias – input projection weight and bias. Required if use_separate_proj_weight is False.
bias_k – bias of the key and value sequences to be added at dim=0.
bias_v – bias of the key and value sequences to be added at dim=0.
add_zero_attn – add a new batch of zeros to the key and value sequences at dim=1.
dropout_p – probability of an element to be zeroed.
out_proj_weight – the output projection weight and bias.
out_proj_bias – the output projection weight and bias.
training – apply dropout if set to True.
key_padding_mask – if provided, specified padding elements in the key will be ignored by the attention. This is a binary mask. When the value is True, the corresponding value on the attention layer will be filled with -inf.
need_weights – output attn_output_weights. Default: True.
attn_mask – 2D or 3D mask that prevents attention to certain positions. A 2D mask will be broadcasted for all the batches while a 3D mask allows to specify a different mask for the entries of each batch.
is_causal – If specified, applies a causal mask as attention mask, and ignores attn_mask for computing scaled dot product attention. Default: False.
use_separate_proj_weight – the function accepts the projection weights for query, key, and value in different forms. If False, in_proj_weight will be used, which is a combination of q_proj_weight, k_proj_weight, and v_proj_weight.
q_proj_weight – input projection weight and bias.
k_proj_weight – input projection weight and bias.
v_proj_weight – input projection weight and bias.
in_proj_bias – input projection weight and bias.
static_k – static key and value used for attention operators.
static_v – static key and value used for attention operators.
average_attn_weights – If True, indicates that the returned attn_weights should be averaged across heads. Otherwise, attn_weights are provided separately per head. Note that this flag only has an effect when need_weights=True. Default: True.
Note
The PyTorch API keyword args
value (SambaTensor)
embed_dim_to_check (int)
bias_k (SambaTensor, optional)
bias_v (SambaTensor, optional)
add_zero_attn (bool)
training (bool)
key_padding_mask (SambaTensor, optional)
is_causal (bool)
static_k (SambaTensor, optional)
static_v (SambaTensor, optional)
average_attn_weights (bool)
are only supported on CPU; they are not supported on RDU.
- class FlashFFTConv(Nx: List[int], dtype: dtype, prefix: str = '')¶
New in version 1.19.
FlashFFTConv module. Effectively computes a zero-padded convolution of the input tensor and the kernel tensor. See the forward function for more info on the input tensors. See FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores for more info on the logic of the module.
- Parameters:
Nx – list of input dimensions. The length of Nx is the p parameter, which governs how many pieces we split the sequence length into. Nx will be the dimension of the DFT.
dtype – either torch.bfloat16 or torch.float32.
prefix – an optional string prefix to prepend to the operator names.
Note
There are some constraints on Nx:
We support p = 2, 3, 4.
The last element of Nx must be even. This is because we need to pad the raw input by 2x, so the latter half of the last dim of Nx will be all zeros.
Let N be the product of elements in Nx. Let ipt be the input tensor. Then N must equal ipt.shape[-1] * 2. The factor of 2 is due to the padding; we pad the raw input before applying DFTs to it.
Example
>>> import torch
>>> import sambaflow.samba as samba
>>> from sambaflow.samba.nn.flash_fft_conv import FlashFFTConv
>>> batch_size = 2
>>> hidden_dim = 1
>>> sequence_length = 8
>>> ipt = torch.randn(batch_size, hidden_dim, sequence_length)
>>> kernel = torch.ones(hidden_dim, sequence_length)
>>> flash_fft_conv = FlashFFTConv([4, 2, 2], torch.float32)  # 4 * 2 * 2 == sequence_length * 2
>>> samba.from_torch_model_(flash_fft_conv)
>>> result = flash_fft_conv(ipt, kernel)  # observe the all-ones filter convolution effect on ipt
Normalization¶
- layer_norm(input: SambaTensor, normalized_shape: List[int] | Tuple[int], weight: SambaTensor | None = None, bias: SambaTensor | None = None, eps: float = 1e-05) SambaTensor ¶
New in version 1.18.
Applies Layer Normalization over a mini-batch of inputs as described in the paper Layer Normalization
\[y = \frac{x - \mathrm{E}[x]}{ \sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta\]
The mean and standard-deviation are calculated over the last D dimensions, where D is the dimension of normalized_shape. For example, if normalized_shape is (3, 5) (a 2-dimensional shape), the mean and standard-deviation are computed over the last 2 dimensions of the input (i.e. input.mean((-2, -1))). \(\gamma\) and \(\beta\) are learnable affine transform parameters of normalized_shape if elementwise_affine is True. The standard-deviation is calculated via the biased estimator, equivalent to torch.var(input, unbiased=False).
Note
Unlike Batch Normalization and Instance Normalization, which apply scalar scale and bias for each entire channel/plane with the affine option, Layer Normalization applies per-element scale and bias with elementwise_affine.
Note
layer_norm on RDU only supports a normalized_shape of length 1 whose element equals the size of input's last dimension.
This layer uses statistics computed from input data in both training and evaluation modes.
- Parameters:
input – the input tensor.
normalized_shape –
input shape from an expected input of size
\[\begin{split}[* \times \text{normalized_shape}[0] \times \text{normalized_shape}[1] \times \ldots \times \\ \text{normalized_shape}[-1]]\end{split}\]If a single integer is used, it is treated as a singleton list, and this module will normalize over the last dimension which is expected to be of that specific size.
eps – a value added to the denominator for numerical stability. Default: 1e-5.
elementwise_affine – a boolean value that when set to
True
, this module has learnable per-element affine parameters initialized to ones (for weights) and zeros (for biases). Default:True
.
Example
>>> samba.set_seed(1)
>>> ipt = samba.SambaTensor(torch.randn((4,5), dtype=torch.bfloat16))
>>> samba.layer_norm(ipt, [5]).data
tensor([[ 0.8398, 1.0938, -0.3145, 0.1064, -1.7266],
        [-1.4922, 0.1562, -0.5938, 0.4355, 1.4922],
        [ 1.0391, 0.6211, -1.7188, -0.5156, 0.5742],
        [ 1.7891, -0.2773, -1.2578, -0.3711, 0.1128]], dtype=torch.bfloat16)
See also
For more details see
torch.nn.functional.layer_norm()
.
Reduce¶
- argmax(input: SambaTensor, dim: int, keepdim: bool = False) SambaTensor ¶
New in version 1.19.
Returns the index of maximum values of a tensor across a dimension.
See also
For more details see torch.argmax()
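Example
A minimal illustrative sketch (assuming argmax is exposed as samba.argmax like the other operators on this page):
>>> x = samba.SambaTensor(torch.tensor([[1., 5., 2.],
...                                     [7., 0., 3.]]))
>>> samba.argmax(x, dim=1).data
tensor([1, 0])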
- max(input: SambaTensor, dim: int, keepdim: bool = False, *, out: Optional[Tuple[SambaTensor, SambaTensor]] = None) SambaTensor | Tuple[SambaTensor, SambaTensor] ¶
Returns a tuple
(values, indices)
wherevalues
is the maximum value of each row of theinput
tensor in the given dimensiondim
andindices
is the index location of each maximum value found (argmax).If
keepdim
isTrue
, the output tensors are of the same size asinput
except in the dimensiondim
where they are of size 1. Otherwise,dim
is squeezed (seetorch.squeeze()
), resulting in the output tensors having 1 fewer dimension thaninput
.- Parameters:
input – the input tensor.
dim – the dimension to reduce.
keepdim – specifies whether to retain
dim
in the output tensor. Default:False
.out – the output tensor. Defaults to
None
.
Supported data types:
input: torch.float32
Note
The PyTorch keyword arg out is not supported on RDU and will throw an exception.
Example:
>>> a = samba.randn(4, 4)
>>> a.data
tensor([[-1.2360, -0.2942, -0.1222, 0.8475],
        [ 1.1949, -1.1127, -2.2379, -0.6702],
        [ 1.5717, -0.9207, 0.1297, -1.8768],
        [-0.6172, 1.0036, -0.6060, -0.2432]])
>>> samba.max(a, 1)[0].data
tensor([0.8475, 1.1949, 1.5717, 1.0036])
>>> samba.max(a, 1)[1].data
tensor([3, 0, 0, 1])
See also
For details see
torch.max()
- mean(input: SambaTensor, dim: List[int] | int = None, keepdim: bool = False, *, dtype: torch.dtype | None = None, out: SambaTensor | None = None) SambaTensor ¶
Returns the mean value of each row of the
input
tensor in the given dimensiondim
. Ifdim
is a list of dimensions, reduces over all of them.If
keepdim
isTrue
, the output tensor is of the same size asinput
except in the dimension(s)dim
where it is of size 1. Otherwise,dim
is squeezed (seetorch.squeeze()
), resulting in the output tensor having 1 (orlen(dim)
) fewer dimension(s).- Parameters:
input – the input tensor.
dim – the dimension or dimensions to reduce.
keepdim – specifies whether to retain
dim
in the output tensor.dtype – the desired data type of returned tensor. If specified, the input tensor is cast to
dtype
before the operation is performed. Specifying dtype is useful for preventing data type overflows. Defaults toNone
.out – the output tensor. Defaults to
None
.
Note
The PyTorch API optional keyword args
dtype (torch.dtype, optional)
out (Tensor, optional)
are not supported on RDU and will throw an exception.
Example:
>>> a = samba.randn(4, 4)
>>> a.data
tensor([[-0.3841, 0.6320, 0.4254, -0.7384],
        [-0.9644, 1.0131, -0.6549, -1.4279],
        [-0.2951, -1.3350, -0.7694, 0.5600],
        [ 1.0842, -0.9580, 0.3623, 0.2343]])
>>> samba.mean(a, 1).data
tensor([-0.0163, -0.5085, -0.4599, 0.1807])
>>> samba.mean(a, 1, True).data
tensor([[-0.0163],
        [-0.5085],
        [-0.4599],
        [ 0.1807]])
See also
For details, see
torch.mean()
Regularization¶
- dropout(input: SambaTensor, p: float = 0.5, training: bool = True, inplace: bool = False) SambaTensor ¶
During training, randomly zeroes some of the elements of the input tensor with probability
p
using samples from a Bernoulli distribution.- Parameters:
input – the input tensor.
p – probability of an element to be zeroed. Default: 0.5.
training – If set to
True
, applies dropout. Default:True
inplace – If set to
True
, performs this operation in-place. Default:False
Supported data types:
input: torch.bfloat16, torch.float32
Note
The PyTorch keyword argument inplace is supported on CPU but will have no effect on RDU.
See also
For details see
torch.nn.functional.dropout()
.
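Example
A minimal illustrative sketch (assuming dropout is exposed as samba.dropout like the other operators on this page; with training=False the call is an identity):
>>> x = samba.SambaTensor(torch.tensor([1., 2., 3., 4.]))
>>> samba.dropout(x, p=0.5, training=False).data
tensor([1., 2., 3., 4.])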
- dropout2d(input: SambaTensor, p: float = 0.5, training: bool = True, inplace: bool = False) SambaTensor ¶
Randomly zeroes out entire channels. A channel is a 2D feature map, e.g., the \(j\)-th channel of the \(i\)-th sample in the batched input is a 2D tensor \(\text{input}[i, j]\)) of the input tensor. Each channel will be zeroed out independently on every forward call with probability
p
using samples from a Bernoulli distribution.See
Dropout2d
for details.- Parameters:
input – the input tensor
p – probability of a channel to be zeroed. Default: 0.5
training – If set to
True
, applies dropout. Default:True
inplace – If set to
True
, performs this operation in-place. Default:False
Note
The PyTorch keyword argument inplace is supported on CPU but will have no effect on RDU.
See also
For details see
torch.nn.functional.dropout2d()
.
- dropout3d(input: SambaTensor, p: float = 0.5, training: bool = True, inplace: bool = False) SambaTensor ¶
Randomly zeroes out entire channels. A channel is a 3D feature map, e.g., the \(j\)-th channel of the \(i\)-th sample in the batched input is a 3D tensor \(\text{input}[i, j]\)) of the input tensor. Each channel will be zeroed out independently on every forward call with probability
p
using samples from a Bernoulli distribution.See
Dropout3d
for details.- Parameters:
input – the input tensor.
p – probability of a channel to be zeroed. Default: 0.5.
training – If set to
True
, applies dropout. Default:True
.inplace – If set to
True
, performs this operation in-place. Default:False
.
Note
The PyTorch keyword argument inplace is supported on CPU but will have no effect on RDU.
See also
For details see
torch.nn.functional.dropout3d()
.
Tensor Arithmetic¶
- addmm(input: Tensor, mat1: Tensor, mat2: Tensor, *, beta: Number | None = 1, alpha: Number | None = 1, out: Tensor | None = None) Tensor ¶
Performs a matrix multiplication of the matrices
mat1
andmat2
. The matrixinput
is added to the final result.If
mat1
is a \((n \times m)\) tensor andmat2
is a \((m \times p)\) tensor, theninput
must be broadcastable with a \((n \times p)\) tensor andout
will be a \((n \times p)\) tensor.alpha
andbeta
are scaling factors on the matrix-matrix product between mat1
andmat2
and the added matrixinput
respectively.\[\text{out} = \beta\ \text{input} + \alpha\ (\text{mat1}_i \mathbin{@} \text{mat2}_i)\]If
beta
is 0, theninput
will be ignored, and nan and inf in it will not be propagated.For inputs of type FloatTensor or DoubleTensor, arguments
beta
andalpha
must be real numbers, otherwise they should be integers.This operator supports TensorFloat32.
- Parameters:
input – matrix to be added.
mat1 – the first matrix to be matrix multiplied.
mat2 – the second matrix to be matrix multiplied.
beta – multiplier for input (\(\beta\)).
alpha – multiplier for \(mat1 @ mat2\) (\(\alpha\)).
out – the output tensor.
Example:
>>> M = samba.randn(2, 3)
>>> mat1 = samba.randn(2, 3)
>>> mat2 = samba.randn(3, 3)
>>> samba.addmm(M, mat1, mat2).data
tensor([[-4.8716, 1.4671, -1.3746],
        [ 0.7573, -3.9555, -2.8681]])
Supported data types:
input: torch.bfloat16, torch.float32
mat1: torch.bfloat16, torch.float32
mat2: torch.bfloat16, torch.float32
Note
The Pytorch keyword arg
out
is not supported on RDU and will throw an exception.Note
This operator works on RDU only if the inputs meet these limitations:
input needs to be (p,).
alpha needs to be 1.
beta needs to be 1.
mat2 is a (p x m) matrix if is_transposed == True otherwise is a (m x p) matrix.
mat1 can be a 3D tensor if one of the dimension is batch_dim.
mat2 cannot have a batch_dim.
Additionally, if is_transposed == False, then mat1 is an (n x m) tensor and mat2 is a (m x p) tensor.
See also
For details see
torch.addmm()
.
- bmm(input: SambaTensor, mat2: SambaTensor, *, out: SambaTensor | None = None) SambaTensor ¶
New in version 1.18.
Performs a batch matrix-matrix multiplication of matrices stored in
input
andmat2
.input
andmat2
must be 3-D tensors each containing the same number of matrices.If
input
is a \((b \times n \times m)\) tensor,mat2
is a \((b \times m \times p)\) tensor,out
will be a \((b \times n \times p)\) tensor.\[\text{out}_i = \text{input}_i \mathbin{@} \text{mat2}_i\]- Parameters:
input – the first batch of matrices to be multiplied
mat2 – the second batch of matrices to be multiplied
Note
The PyTorch API optional keyword args
out (Tensor, optional)
are not supported on RDU and will throw an exception
Example:
>>> input = samba.SambaTensor(torch.randn(10, 3, 4))
>>> mat2 = samba.SambaTensor(torch.randn(10, 4, 5))
>>> res = samba.bmm(input, mat2)
>>> res.size()
torch.Size([10, 3, 5])
See also
For more details
torch.bmm()
- cumsum(input: SambaTensor, dim: int, *, dtype: torch.dtype | None = None, out: SambaTensor | None = None) SambaTensor ¶
Returns the cumulative sum of elements of
input
in the dimensiondim
.For example, if
input
is a vector of size N, the result will also be a vector of size N, with elements.\[y_i = x_1 + x_2 + x_3 + \dots + x_i\]- Parameters:
input – the input tensor.
dim – the dimension to do the operation over
dtype – the desired data type of returned tensor.
out – the output tensor. Defaults to
None
.
Supported data types:
input: torch.bfloat16, torch.float32, torch.int64
Note
The PyTorch API optional keyword args
dtype (torch.dtype, optional)
out (SambaTensor, optional)
are not supported on RDU and will throw an exception
Example:
>>> a = samba.randn(10)
>>> a.data
tensor([-0.8286, -0.4890, 0.5155, 0.8443, 0.1865, -0.1752, -2.0595, 0.1850, -1.1571, -0.4243])
>>> samba.cumsum(a, dim=0).data
tensor([-0.8286, -1.3175, -0.8020, 0.0423, 0.2289, 0.0537, -2.0058, -1.8209, -2.9780, -3.4022])
See also
For more details
torch.cumsum()
.
- linear(input: SambaTensor, weight: SambaTensor, bias: SambaTensor | None = None) SambaTensor ¶
Applies a linear transformation to the incoming data: \(y = xA^T + b\).
- Parameters:
input – the input tensor
x
.weight – the weight tensor
A
.bias – the bias tensor
b
.
Shape:
Input: \((N, *, in\_features)\) \(N\) is the batch size, * means any number of additional dimensions.
Weight: \((out\_features, in\_features)\).
Bias: \((out\_features)\).
Output: \((N, *, out\_features)\).
See also
For more details see
torch.nn.functional.linear()
.
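Example
A minimal illustrative sketch (assuming linear is exposed as samba.linear like the other operators on this page; an identity weight leaves the input unchanged):
>>> x = samba.SambaTensor(torch.tensor([[1., 2.], [3., 4.]]))
>>> w = samba.SambaTensor(torch.eye(2))
>>> samba.linear(x, w).data  # y = x @ I^T = x
tensor([[1., 2.],
        [3., 4.]])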
- matmul(input: SambaTensor, other: SambaTensor, *, out: SambaTensor | None = None) SambaTensor ¶
Matrix product of two tensors.
The behavior depends on the dimensionality of the tensors as follows:
If both tensors are 1-dimensional, the dot product (scalar) is returned.
If both tensors are 2-dimensional, the matrix-matrix product is returned.
If the first tensor is 1-dimensional and the second tensor is 2-dimensional, a 1 is prepended to its dimension for the purpose of the matrix multiply. After the matrix multiply, the prepended dimension is removed.
If the first tensor is 2-dimensional and the second tensor is 1-dimensional, the matrix-vector product is returned.
If both tensors are at least 1-dimensional and at least one tensor is N-dimensional (where N > 2), then a batched matrix multiply is returned. If the first tensor is 1-dimensional, a 1 is prepended to its dimension for the purpose of the batched matrix multiply and removed after. If the second tensor is 1-dimensional, a 1 is appended to its dimension for the purpose of the batched matrix multiply and removed after. The non-matrix (i.e. batch) dimensions are broadcasted (and thus must be broadcastable). For example, if
input
is a \((j \times 1 \times n \times n)\) tensor andother
is a \((k \times n \times n)\) tensor,out
will be a \((j \times k \times n \times n)\) tensor.
Note that the broadcasting logic only looks at the batch dimensions when determining if the inputs are broadcastable, and not at the matrix dimensions. For example, if
input
is a \((j \times 1 \times n \times m)\) tensor andother
is a \((k \times m \times p)\) tensor, these inputs are valid for broadcasting even though the final two dimensions (i.e. the matrix dimensions) are different.out
will be a \((j \times k \times n \times p)\) tensor.- Parameters:
input – the first tensor to be multiplied.
other – the second tensor to be multiplied.
out – the output tensor. Defaults to
None
.
Supported data types:
input: torch.bfloat16, torch.float32
other: torch.bfloat16, torch.float32
Note
The PyTorch keyword arg out is not supported on RDU and will throw an exception.
See also
For more details see
torch.matmul()
.
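Example
A minimal illustrative sketch (assuming matmul is exposed as samba.matmul like the other operators on this page; only the result shape is shown):
>>> a = samba.randn(2, 3)
>>> b = samba.randn(3, 4)
>>> samba.matmul(a, b).size()
torch.Size([2, 4])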
Tensor Ops¶
- cat(tensors: List[SambaTensor] | Tuple[SambaTensor], dim: int | None = 0, axis: int | None = None, *, out: SambaTensor | None = None) SambaTensor ¶
Concatenates the given sequence of
seq
tensors in the given dimension. All tensors must either have the same shape (except in the concatenating dimension) or be empty.torch.cat()
can be seen as an inverse operation fortorch.split()
andtorch.chunk()
.- Parameters:
tensors – any Python sequence of tensors of the same type. Non-empty tensors must have the same shape, except in the cat dimension.
dim – the dimension over which the tensors are concatenated
axis – alias for
dim
, cannot be specified withdim
.out – the output tensor. Defaults to
None
.
Supported data types:
tensors: torch.bfloat16, torch.float32, torch.int16, torch.int32, torch.int64
Note
The PyTorch keyword arg out is not supported on RDU and will throw an exception.
Example:
>>> x = samba.randn(2, 3)
>>> x.data
tensor([[ 0.6580, -1.0969, -0.4614],
        [-0.1034, -0.5790, 0.1497]])
>>> samba.cat((x, x, x), 0).data
tensor([[ 0.6580, -1.0969, -0.4614],
        [-0.1034, -0.5790, 0.1497],
        [ 0.6580, -1.0969, -0.4614],
        [-0.1034, -0.5790, 0.1497],
        [ 0.6580, -1.0969, -0.4614],
        [-0.1034, -0.5790, 0.1497]])
>>> samba.cat((x, x, x), 1).data
tensor([[ 0.6580, -1.0969, -0.4614, 0.6580, -1.0969, -0.4614, 0.6580, -1.0969, -0.4614],
        [-0.1034, -0.5790, 0.1497, -0.1034, -0.5790, 0.1497, -0.1034, -0.5790, 0.1497]])
See also
For details, see
torch.cat()
- embedding(weight: SambaTensor, input: SambaTensor, padding_idx: int, max_norm: Optional[float] = None, norm_type: float = 2.0, scale_grad_by_freq: bool, sparse: bool) SambaTensor ¶
A simple lookup table that looks up embeddings in a fixed dictionary and size.
This function is often used to retrieve word embeddings using indices. The input is a list of indices and the embedding matrix. The output is the corresponding word embeddings.
See
torch.nn.Embedding
for details.- Parameters:
input – tensor containing indices into the embedding matrix.
weight – the embedding matrix with number of rows equal to the maximum possible index + 1, and number of columns equal to the embedding size.
padding_idx – if specified, the entries at padding_idx do not contribute to the gradient; therefore, the embedding vector at padding_idx is not updated during training, i.e. it remains as a fixed “pad”.
max_norm – if given, each embedding vector with norm larger than max_norm is renormalized to have norm max_norm. Note: this will modify weight in place.
norm_type – The p of the p-norm to compute for the max_norm option. Default: 2.
scale_grad_by_freq – If given, scales gradients by the inverse of frequency of the words in the mini-batch. Default: False.
sparse – If True, gradient w.r.t. weight will be a sparse tensor. See Notes under torch.nn.Embedding for details.
Shape:
Input: Integer tensor of arbitrary shape containing the indices to extract.
Weight: Embedding matrix of floating point type with shape \((V, embedding\_dim)\), where \(V = maximum\ index + 1\) and embedding_dim = the embedding size.
Output: \((*, embedding\_dim)\), where \(*\) is the input shape.
Supported data types:
input: torch.int32, torch.int64
weight: torch.bfloat16, torch.float32
Note
The PyTorch keyword arguments
max_norm
,norm_type
,scale_grad_by_freq
, andsparse
are supported on CPU but will have no effect on RDU.
Examples:
>>> # a batch of 2 samples of 4 indices each
>>> input = samba.SambaTensor(torch.tensor([[1,2,4,5],[4,3,2,9]]))
>>> # an embedding matrix containing 10 tensors of size 3
>>> embedding_matrix = samba.rand(10, 3)
>>> samba.embedding(input, embedding_matrix).data
tensor([[[ 0.8490, 0.9625, 0.6753],
         [ 0.9666, 0.7761, 0.6108],
         [ 0.6246, 0.9751, 0.3618],
         [ 0.4161, 0.2419, 0.7383]],
        [[ 0.6246, 0.9751, 0.3618],
         [ 0.0237, 0.7794, 0.0528],
         [ 0.9666, 0.7761, 0.6108],
         [ 0.3385, 0.8612, 0.1867]]])
See also
For details, see
torch.nn.functional.embedding()
- expand(input: SambaTensor, *sizes: Tuple[Tuple | int, ...]) SambaTensor ¶
Returns a new view of the input tensor with singleton dimensions expanded to a larger size.
Passing -1 as the size for a dimension means not changing the size of that dimension.
A tensor can also be expanded to a larger number of dimensions. The new dimensions are appended at the front. For the new dimensions, the size cannot be set to -1.
- Parameters:
input – the input tensor.
sizes – the desired expanded size.
Example:
>>> x = samba.SambaTensor(torch.tensor([[1], [2], [3]]))
>>> x.size()
torch.Size([3, 1])
>>> x.expand(3, 4).data
tensor([[ 1, 1, 1, 1],
        [ 2, 2, 2, 2],
        [ 3, 3, 3, 3]])
>>> x.expand(-1, 4).data  # -1 means not changing the size of that dimension
tensor([[ 1, 1, 1, 1],
        [ 2, 2, 2, 2],
        [ 3, 3, 3, 3]])
See also
For details see
torch.Tensor.expand()
.
- flatten(input: SambaTensor, start_dim: int = 0, end_dim: int = -1) SambaTensor ¶
Flattens
input
by reshaping it into a one-dimensional tensor. Ifstart_dim
orend_dim
are passed, only dimensions starting withstart_dim
and ending withend_dim
are flattened. The order of elements ininput
is unchanged.Note
Flattening a zero-dimensional tensor will return a one-dimensional view.
- Parameters:
input – the input tensor.
start_dim – the first dim to flatten
end_dim – the last dim to flatten
Supported data types:
input: torch.bfloat16, torch.float32
Example:
>>> t = samba.SambaTensor(torch.tensor([[[1, 2],
...                                      [3, 4]],
...                                     [[5, 6],
...                                      [7, 8]]]))
>>> samba.flatten(t).data
tensor([1, 2, 3, 4, 5, 6, 7, 8])
>>> samba.flatten(t, start_dim=1).data
tensor([[1, 2, 3, 4],
        [5, 6, 7, 8]])
See also
For more details
torch.flatten()
- groupby(tensor: SambaTensor, num_bins: int = 32, capacity: int = 1) Tuple[SambaTensor, SambaTensor] ¶
Computes the correct bin for each element of the input and generates its scatter address. The overflow bin of the histogram acts as a trash bin.
New in version 1.19.
\[\begin{aligned}\text{scatter_out}_i &= \text{input}_i + (\text{capacity} * \text{tensor}[i]) + \text{histogram}_i\\\text{histogram_out}_i &= \text{histogram}(\text{tensor}[i])_i + 1\end{aligned}\]- Parameters:
tensor – 2-dimensional tensor whose dimensions are bs * ss
num_bins – number of bins
capacity – maximum number of tokens each bin can take
Example:
>>> tensor = samba.SambaTensor(torch.tensor([ 0, 1, 1, 0, 1, 2, 2]))
>>> [s, h] = samba.groupby(tensor, 3, 2, 1, True, False)
>>> s.data
[ 0, 2, 3, 1, 6, 4, 5]
>>> h.data
[ 2, 2, 2, 1]
- index_select(input: SambaTensor, dim: int, index: torch.LongTensor | List[int], *, out: SambaTensor | None = None) SambaTensor ¶
Returns a new tensor which indexes the
input
tensor along dimensiondim
using the entries inindex
(which is an integer tensor).The returned tensor has the same number of dimensions as the original tensor (
input
). Thedim
th dimension has the same size as the length ofindex
; other dimensions have the same size as in the original tensor.Note
The returned tensor does not use the same storage as the original tensor. If
out
has a different shape than expected, we silently change it to the correct shape, reallocating the underlying storage if necessary.- Parameters:
input – the input tensor.
dim – the dimension in which we index.
index – the 1-D tensor containing the indices to index.
out – the output tensor. Defaults to
None
.
Note
The PyTorch API optional keyword arg
out (Tensor, optional)
is not supported on RDU and will throw an exception.
Example:
>>> x = samba.randn(3, 4)
>>> x.data
tensor([[ 0.1427, 0.0231, -0.5414, -1.0009],
        [-0.4664, 0.2647, -0.1228, -1.1068],
        [-1.1734, -0.6571, 0.7230, -0.6004]])
>>> indices = samba.SambaTensor(torch.tensor([0, 2]))
>>> samba.index_select(x, 0, indices).data
tensor([[ 0.1427, 0.0231, -0.5414, -1.0009],
        [-1.1734, -0.6571, 0.7230, -0.6004]])
>>> samba.index_select(x, 1, indices).data
tensor([[ 0.1427, -0.5414],
        [-0.4664, -0.1228],
        [-1.1734, 0.7230]])
See also
For details see
torch.index_select()
.
- permute(input: SambaTensor, dims: Tuple[int]) SambaTensor ¶
Returns a view of the original tensor
input
with its dimensions permuted.- Parameters:
input – the input tensor.
dims – the desired ordering of dimensions.
Supported data types:
input: torch.bfloat16, torch.float32
Example:
>>> x = samba.randn(2, 3, 5)
>>> x.size()
torch.Size([2, 3, 5])
>>> samba.permute(x, (2, 0, 1)).size()
torch.Size([5, 2, 3])
See also
For more details see
torch.permute()
- reshape(input: SambaTensor, shape: Tuple[int]) SambaTensor ¶
Returns a tensor with the same data and number of elements as
input
, but with the specified shape. When possible, the returned tensor will be a view ofinput
. Otherwise, it will be a copy. Contiguous inputs and inputs with compatible strides can be reshaped without copying, but you should not depend on the copying vs. viewing behavior.See
view
on when it is possible to return a view.A single dimension may be -1, in which case it’s inferred from the remaining dimensions and the number of elements in
input
.- Parameters:
input – the tensor to be reshaped
shape – the new shape
Example:
>>> a = samba.arange(4.)
>>> samba.reshape(a, (2, 2)).data
tensor([[ 0., 1.],
        [ 2., 3.]])
>>> b = samba.SambaTensor(torch.tensor([[0, 1], [2, 3]]))
>>> samba.reshape(b, (-1,)).data
tensor([ 0, 1, 2, 3])
See also
For more details
torch.reshape()
- scaled_dot_product_attention(query: SambaTensor, key: SambaTensor, value: SambaTensor, attn_mask: SambaTensor, dropout_p: float, is_causal: bool) SambaTensor ¶
New in version 1.18.
Computes scaled dot product attention (SDPA) using query, key, and value tensors, with optional attention masking and dropout. This function is designed for compatibility with both CPU and RDU environments. It supports
'math_sdp'
and'seg_softmax_sdp'
implementations.'math_sdp'
is the original algebraic version of the operator.'seg_softmax_sdp'
, or segmented softmax attention, is the version specialized for RDU computation, developed based on flash attention.- Parameters:
query – the query tensor.
key – the key tensor.
value – the value tensor.
attn_mask – optional attention mask. If provided, it is applied to the attention weights.
dropout_p – dropout probability for the attention weights. If 0.0, no dropout is applied.
is_causal – if
True
, applies causal masking to prevent the attention mechanism from peeking into the future. Cannot be used withattn_mask
.
Notes
'seg_softmax_sdp'
is a variant of SDPA optimized for RDU based on flash attention, offering enhanced speed and memory efficiency.'math_sdp'
is the algebraic implementation of SDPA, providing a hardware-neutral fallback for all PyTorch platforms.sambaflow automatically selects the best implementation based on the input, but users can override this using context managers. See the examples below.
- Context Directives:
'disable_segmented_softmax_sdp'
will force the operator to use'math_sdp'
.'sdp_mixed_p'
is for mixed precision to achieve higher accuracy without full fp32 precision.'sdp_sliding_window_size'
is a tuple of integers for sliding window when using'seg_softmax_sdp'
, (left, right). Token i will attend to tokens [i - left, i + right] inclusive. Negative value indicates unlimited attention window.'sdp_block_size'
controls the block size used in segmented softmax attention. Does not affect functionality.
- Restrictions:
attn_mask and is_causal cannot be set at the same time. If needed, please include the causal mask in the attn_mask.
attn_mask and sliding_window cannot be set at the same time. If needed, please include sliding window in the attn_mask.
Examples:
>>> # To force 'math_sdp' mode for debugging:
>>> with samba.directives.sdpa_directives({'disable_segmented_softmax_sdp': True}):
>>>     result = F.scaled_dot_product_attention(query,
...                                             key,
...                                             value,
...                                             attn_mask,
...                                             dropout_p=dropout_p,
...                                             is_causal=is_causal)
>>> # To enable mixed precision for higher accuracy without full fp32 precision:
>>> with samba.directives.sdpa_directives({'sdp_mixed_p': True}):
>>>     result = F.scaled_dot_product_attention(query,
...                                             key,
...                                             value,
...                                             attn_mask,
...                                             dropout_p=dropout_p,
...                                             is_causal=is_causal)
>>> # To set sliding window size:
>>> with samba.session._add_directives({'sdp_sliding_window_size': (4096, 0)}):
>>>     result = F.scaled_dot_product_attention(query,
...                                             key,
...                                             value,
...                                             attn_mask,
...                                             dropout_p=dropout_p,
...                                             is_causal=is_causal)
>>> # To set sdp block size:
>>> with samba.session._add_directives({'sdp_mixed_p': True}):
>>>     result = F.scaled_dot_product_attention(query,
...                                             key,
...                                             value,
...                                             attn_mask,
...                                             dropout_p=dropout_p,
...                                             is_causal=is_causal)
See also
For more details see
torch.nn.functional.scaled_dot_product_attention()
- sn_identity(ipt: SambaTensor) SambaTensor ¶
New in version 1.18.
Returns the input tensor as is, without any modifications. This function ensures that the input tensor is passed through without any alterations. It acts as a passthrough, similar to
torch.nn.Identity
.- Parameters:
ipt – the
SambaTensor
to pass through the function without modification.
Example
>>> from sambaflow.samba.functional import sn_identity
>>> # In this example, the tensor x is created with the values [1, 2, 3].
>>> # sn_identity is applied to x, and as a passthrough function, it returns a tensor y identical to x.
>>> x = samba.SambaTensor(torch.tensor([1, 2, 3]))
>>> y = sn_identity(x)
>>> y.data
tensor([1, 2, 3])
See also
For more details see
torch.nn.Identity
.
- split(tensor: SambaTensor, split_size_or_sections: int | List[int], dim: int = 0) Tuple[SambaTensor, ...] ¶
Splits the tensor into chunks. Each chunk is a view of the original tensor.
If
split_size_or_sections
is an integer type, thentensor
will be split into equally sized chunks (if possible). Last chunk will be smaller if the tensor size along the given dimensiondim
is not divisible bysplit_size
.If
split_size_or_sections
is a list, thentensor
will be split intolen(split_size_or_sections)
chunks with sizes indim
according tosplit_size_or_sections
.- Parameters:
tensor – tensor to split.
split_size_or_sections – size of a single chunk or list of sizes for each chunk.
dim – dimension along which to split the tensor.
Supported data types:
tensor: torch.bfloat16, torch.float32
Example:
>>> a = samba.arange(10).reshape(5,2)
>>> a.data
tensor([[0, 1],
        [2, 3],
        [4, 5],
        [6, 7],
        [8, 9]])
>>> for output in samba.split(a, 2): output.data
tensor([[0, 1],
        [2, 3]])
tensor([[4, 5],
        [6, 7]])
tensor([[8, 9]])
>>> for output in samba.split(a, [1,4]): output.data
tensor([[0, 1]])
tensor([[2, 3],
        [4, 5],
        [6, 7],
        [8, 9]])
See also
For more details
torch.split()
- squeeze(input: SambaTensor, dim: int | None = None) SambaTensor ¶
Returns a tensor with all the dimensions of
input
of size 1 removed.For example, if input is of shape: \((A \times 1 \times B \times C \times 1 \times D)\) then the result will be of shape: \((A \times B \times C \times D)\).
When
dim
is given, a squeeze operation is done only in the given dimension. If input is of shape: \((A \times 1 \times B)\),squeeze(input, 0)
leaves the tensor unchanged, butsqueeze(input, 1)
will squeeze the tensor to the shape \((A \times B)\).Note
The returned tensor shares the storage with the input tensor, so changing the contents of one will change the contents of the other.
Warning
If the tensor has a batch dimension of size 1, then
squeeze(input)
will also remove the batch dimension, which can lead to unexpected errors.- Parameters:
input – the input tensor.
dim – if given, the input will be squeezed only in this dimension
Example:
>>> x = samba.zeros(2, 1, 2, 1, 2)
>>> x.size()
torch.Size([2, 1, 2, 1, 2])
>>> y = samba.squeeze(x)
>>> y.size()
torch.Size([2, 2, 2])
>>> y = samba.squeeze(x, 0)
>>> y.size()
torch.Size([2, 1, 2, 1, 2])
>>> y = samba.squeeze(x, 1)
>>> y.size()
torch.Size([2, 2, 1, 2])
See also
For more details
torch.squeeze()
- stack(tensors: List[SambaTensor] | Tuple[SambaTensor, ...], dim: int = 0, *, out: SambaTensor | None = None) SambaTensor ¶
Concatenates a sequence of tensors along a new dimension.
All tensors need to be of the same size.
- Parameters:
tensors – sequence of tensors to concatenate
dim – dimension to insert. Has to be between 0 and the number of dimensions of concatenated tensors (inclusive)
out – the output tensor. Defaults to
None
.
Supported data types:
tensors: torch.bfloat16
Note
The PyTorch API optional keyword arg
out
is not supported on RDU and will throw an error.See also
For more details
torch.stack()
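Example
A minimal illustrative sketch (assuming stack is exposed as samba.stack like the other operators on this page, and using the bfloat16 dtype listed above; only the result shape is shown):
>>> a = samba.SambaTensor(torch.ones(2, 3, dtype=torch.bfloat16))
>>> b = samba.SambaTensor(torch.zeros(2, 3, dtype=torch.bfloat16))
>>> samba.stack((a, b), dim=0).size()  # a new leading dimension of size 2
torch.Size([2, 2, 3])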
- to(input: SambaTensor, *args, **kwargs) SambaTensor ¶
New in version 1.18.
Performs Tensor dtype and/or device conversion. A
torch.dtype
andtorch.device
are inferred from the arguments of self.to(*args, **kwargs).Note
If the
self
Tensor already has the correcttorch.dtype
andtorch.device
, thenself
is returned. Otherwise, the returned tensor is a copy ofself
with the desiredtorch.dtype
andtorch.device
.Here are the ways to call
to
:- to(dtype: torch.dtype) SambaTensor
Returns a SambaTensor with the specified
dtype
.
- to(other: SambaTensor) SambaTensor:
Returns a SambaTensor with the same
torch.dtype
andtorch.device
as the SambaTensorother
.
Example:
>>> tensor = samba.randn(2, 2)  # Initially dtype=float32
>>> tensor.to(torch.float64).data
tensor([[-0.5044, 0.0005],
        [ 0.3310, -0.0584]])  # dtype=torch.float64
>>> other = samba.randn((), dtype=torch.float64)
>>> tensor.to(other).data
tensor([[-0.5044, 0.0005],
        [ 0.3310, -0.0584]])  # dtype=torch.float64
See also
For more details
torch.Tensor.to()
.
- transpose(input: SambaTensor, dim0: int, dim1: int) SambaTensor ¶
Returns a tensor that is a transposed version of
input
. The given dimensionsdim0
anddim1
are swapped.- Parameters:
input – the input tensor.
dim0 – the first dimension to be transposed.
dim1 – the second dimension to be transposed.
Example:
>>> samba.set_seed(1)
>>> x = samba.randn(2, 3)
>>> x.data
tensor([[ 0.6614, 0.2669, 0.0617],
        [ 0.6213, -0.4519, -0.1661]])
>>> samba.transpose(x, 0, 1).data
tensor([[ 0.6614, 0.6213],
        [ 0.2669, -0.4519],
        [ 0.0617, -0.1661]])
See also
For more details
torch.transpose()
- type_as(input: SambaTensor, tensor: SambaTensor) SambaTensor ¶
New in version 1.18.
Returns this tensor cast to the type of the given tensor.
This is a no-op if the tensor is already of the correct type. This is equivalent to
self.type(tensor.type())
.- Parameters:
input – the input tensor.
tensor – the tensor which has the desired type
See also
For more details
torch.Tensor.type_as()
.
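Example
A minimal illustrative sketch (assuming type_as is exposed as samba.type_as like the other operators on this page):
>>> x = samba.SambaTensor(torch.tensor([1, 2, 3]))
>>> ref = samba.SambaTensor(torch.tensor([0.5]))
>>> samba.type_as(x, ref).data  # cast to ref's floating-point dtype
tensor([1., 2., 3.])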
- unsqueeze(input: SambaTensor, dim: int) SambaTensor ¶
Returns a new tensor with a dimension of size one inserted at the specified position.
The returned tensor shares the same underlying data with this tensor.
A
dim
value within the range[-input.dim() - 1, input.dim() + 1)
can be used. Negativedim
will correspond tounsqueeze()
applied atdim
=dim + input.dim() + 1
.- Parameters:
input – the input tensor.
dim – the index at which to insert the singleton dimension
Supported data types:
input: torch.bfloat16, torch.float32
Example:
>>> x = samba.SambaTensor(torch.tensor([1, 2, 3, 4]))
>>> samba.unsqueeze(x, 0).data
tensor([[ 1, 2, 3, 4]])
>>> samba.unsqueeze(x, 1).data
tensor([[ 1],
        [ 2],
        [ 3],
        [ 4]])
See also
For more details
torch.unsqueeze()
- view(input: SambaTensor, *shape: Tuple[int, ...]) SambaTensor ¶
Returns a new tensor with the same data as the
input
tensor but with a differentshape
.The returned tensor shares the same data and must have the same number of elements, but may have a different size. For a tensor to be viewed, the new view size must be compatible with its original size and stride, i.e., each new view dimension must either be a subspace of an original dimension, or only span across original dimensions \(d, d+1, \dots, d+k\) that satisfy the following contiguity-like condition that \(\forall i = d, \dots, d+k-1\),
\[\text{stride}[i] = \text{stride}[i+1] \times \text{size}[i+1]\]Otherwise, it will not be possible to view
self
tensor asshape
without copying it (e.g., viacontiguous()
). When it is unclear whether aview()
can be performed, it is advisable to usereshape()
, which returns a view if the shapes are compatible, and copies (equivalent to callingcontiguous()
) otherwise.Note
In PyTorch, view supports type casting as well if given a torch.dtype as input. This is not currently supported in SambaFlow.
- Parameters:
input – the input tensor
shape – the desired size
Supported data types:
input: torch.bfloat16, torch.float32, torch.int16, torch.int32, torch.int64
Example:
>>> x = samba.randn(4, 4)
>>> x.size()
torch.Size([4, 4])
>>> y = x.view(16)
>>> y.size()
torch.Size([16])
>>> z = x.view(-1, 8)  # the size -1 is inferred from other dimensions
>>> z.size()
torch.Size([2, 8])
See also
For details, see
torch.Tensor.view()