SambaStack supports a variety of models that can be deployed to both on-premises and hosted environments. Contact your system administrator to determine which models are available on your deployment.

Deployment options

When deploying models in SambaStack, administrators can select from various context length and batch size combinations.
  • Smaller batch sizes provide higher per-request token throughput (tokens/second).
  • Larger batch sizes provide better concurrency, serving more simultaneous users.

Supported models

The models below are grouped by developer. Each entry lists the model type, supported context lengths with batch sizes, endpoint, capabilities, checkpoint import support, optimizations, and a link to the model card on Hugging Face.
Meta

Meta-Llama-3.3-70B-Instruct (Text)
  • Context lengths (batch sizes): 4K (1, 2, 4, 8, 16, 32), 8K (1, 2, 4, 8), 16K (1, 2, 4), 32K (1, 2, 4), 64K (1), 128K (1)
  • Endpoint: Chat completions
  • Capabilities: Function calling, JSON mode
  • Import checkpoint: Yes
  • Optimizations: Speculative decoding
  • Hugging Face: Model card

Meta-Llama-3.1-8B-Instruct (Text)
  • Context lengths (batch sizes): 4K (1, 2, 4, 8), 8K (1, 2, 4, 8), 16K (1, 2, 4)
  • Endpoint: Chat completions
  • Capabilities: Function calling, JSON mode
  • Import checkpoint: Yes
  • Optimizations: None
  • Hugging Face: Model card

Meta-Llama-3.1-405B-Instruct (Text)
  • Context lengths (batch sizes): 4K (1, 2, 4), 8K (1), 16K (1)
  • Endpoint: Chat completions
  • Capabilities: Function calling, JSON mode
  • Import checkpoint: No
  • Optimizations: Speculative decoding
  • Hugging Face: Model card

Llama-4-Maverick-17B-128E-Instruct (Image, Text)
  • Context lengths (batch sizes): 4K (1, 4), 8K (1)
  • Endpoint: Chat completions
  • Capabilities: Function calling, JSON mode
  • Import checkpoint: No
  • Optimizations: None
  • Hugging Face: Model card

DeepSeek

DeepSeek-R1-0528 (Reasoning, Text)
  • Context lengths (batch sizes): 4K (4), 8K (1), 16K (1), 32K (1)
  • Endpoint: Chat completions
  • Capabilities: Function calling, JSON mode
  • Import checkpoint: No
  • Optimizations: None
  • Hugging Face: Model card

DeepSeek-R1-Distill-Llama-70B (Reasoning, Text)
  • Context lengths (batch sizes): 4K (1, 2, 4, 8, 16, 32), 8K (1, 2, 4, 8), 16K (1, 2, 4), 32K (1, 2, 4), 64K (1), 128K (1)
  • Endpoint: Chat completions
  • Capabilities: None
  • Import checkpoint: Yes
  • Optimizations: Speculative decoding
  • Hugging Face: Model card

DeepSeek-V3-0324 (Text)
  • Context lengths (batch sizes): 4K (4), 8K (1), 16K (1), 32K (1)
  • Endpoint: Chat completions
  • Capabilities: Function calling, JSON mode
  • Import checkpoint: No
  • Optimizations: None
  • Hugging Face: Model card

DeepSeek-V3.1 (Reasoning, Text)
  • Context lengths (batch sizes): 4K (4), 8K (1), 16K (1), 32K (1)
  • Endpoint: Chat completions
  • Capabilities: Function calling, JSON mode
  • Import checkpoint: No
  • Optimizations: None
  • Hugging Face: Model card

OpenAI

gpt-oss-120b (Text)
  • Context lengths (batch sizes): 8K (2), 32K (2), 64K (2), 128K (2)
  • Endpoint: Chat completions
  • Capabilities: Function calling, JSON mode
  • Import checkpoint: No
  • Optimizations: None
  • Hugging Face: Model card

Whisper-Large-v3 (Audio)
  • Context lengths (batch sizes): 4K (1, 16, 32)
  • Endpoint: Translation, Transcription
  • Capabilities: None
  • Import checkpoint: No
  • Optimizations: None
  • Hugging Face: Model card

Qwen

Qwen3-32B (Reasoning, Text)
  • Context lengths (batch sizes): 8K (1)
  • Endpoint: Chat completions
  • Capabilities: None
  • Import checkpoint: No
  • Optimizations: None
  • Hugging Face: Model card

Tokyotech-llm

Llama-3.3-Swallow-70B-Instruct-v0.4 (Text)
  • Context lengths (batch sizes): 4K (1, 2, 4, 8, 16), 8K (1, 2, 4, 8, 16), 16K (1, 2, 4), 32K (1, 2, 4), 64K (1), 128K (1)
  • Endpoint: Chat completions
  • Capabilities: None
  • Import checkpoint: No
  • Optimizations: Speculative decoding
  • Hugging Face: Model card

Other

E5-Mistral-7B-Instruct (Embedding)
  • Context lengths (batch sizes): 4K (1, 2, 4, 8, 16, 32)
  • Endpoint: Embeddings
  • Capabilities: None
  • Import checkpoint: No
  • Optimizations: None
  • Hugging Face: Model card
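
The chat models above are served through the chat-completions endpoint, with function calling and JSON mode available where listed. Below is a minimal sketch of a JSON-mode request, assuming the deployment exposes an OpenAI-compatible API; the base URL and API key are placeholders for your deployment's values.

```python
# Minimal sketch of a JSON-mode chat completion. Assumes an
# OpenAI-compatible endpoint; base_url and api_key are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://your-sambastack-host/v1",  # placeholder host
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="Meta-Llama-3.3-70B-Instruct",
    messages=[
        {"role": "system", "content": "Respond with a single JSON object."},
        {"role": "user", "content": "List three primary colors."},
    ],
    response_format={"type": "json_object"},  # JSON mode
)
print(response.choices[0].message.content)
```

In OpenAI-compatible APIs, function calling uses the same request shape with an added tools parameter describing the callable functions.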

Sample bundles

In SambaStack, models are deployed as bundles rather than individually. A bundle is a packaged deployment that combines one or more models with their configurations, such as batch sizes, sequence lengths, and precision settings. For example, deploying Meta-Llama-3.3-70B with a batch size of 4 and a sequence length of 16K is one configuration; a single bundle may contain many such configurations across the same or different models.

SambaNova's RDU technology loads multiple models and configurations together in a single deployment, enabling instant switching between models and batch/sequence profiles as needed. Unlike traditional GPU deployments, which are often single-model and static, SambaStack's multi-model, multi-configuration bundles deliver greater efficiency, flexibility, and throughput while maintaining low latency.
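
To make this concrete, the sketch below models a bundle as plain data: one named bundle holding several model/sequence-length/batch-size configurations. The class and field names are hypothetical illustrations, not SambaStack's actual bundle manifest schema.

```python
# Illustrative only: a bundle as a set of model configurations.
# Class and field names are hypothetical, not SambaStack's schema.
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConfig:
    model: str        # model ID, e.g. "Meta-Llama-3.3-70B-Instruct"
    seq_length: int   # tokens, e.g. 16384 for "16K"
    batch_size: int


@dataclass(frozen=True)
class Bundle:
    name: str
    configs: tuple[ModelConfig, ...]


# One bundle can mix configurations of the same model (varying batch
# size and sequence length) and of different models (target + draft).
bundle = Bundle(
    name="70b-3dot3-ss-8-16-32k-batching",
    configs=(
        ModelConfig("Meta-Llama-3.3-70B-Instruct", 8192, 8),
        ModelConfig("Meta-Llama-3.3-70B-Instruct", 16384, 4),
        ModelConfig("Meta-Llama-3.2-1B-Instruct", 8192, 8),  # draft model
    ),
)
```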
Each bundle template below is listed with its description followed by its configuration (models, sequence lengths, and batch sizes).
70b-3dot3-ss-16k-32k-64k-128k
  • Speculative decoding of:
    • Meta-Llama-3.3-70B (Target)
    • Meta-Llama-3.2-1B (Draft)
  • Medium to large context length with low batch size
Configuration:

Target Models:

  • Meta-Llama-3.3-70B-Instruct
    • Seq Length: 16K, BS: 1, 2, 4
    • Seq Length: 32K, BS: 1, 2
    • Seq Length: 64K, BS: 1
    • Seq Length: 128K, BS: 1

Draft Models:

  • Meta-Llama-3.2-1B-Instruct
    • Seq Length: 16K, BS: 1, 2, 4
    • Seq Length: 32K, BS: 1, 2
    • Seq Length: 64K, BS: 1
    • Seq Length: 128K, BS: 1
70b-3dot3-ss-8-16-32k-batching
  • Speculative decoding of:
    • Meta-Llama-3.3-70B (Target)
    • Meta-Llama-3.2-1B (Draft)
  • Small to medium context length with low-medium batch sizes
Configuration:

Target Models:

  • Meta-Llama-3.3-70B-Instruct
    • Seq Length: 8K, BS: 1, 2, 4, 8
    • Seq Length: 16K, BS: 1, 2, 4
    • Seq Length: 32K, BS: 1, 2

Draft Models:

  • Meta-Llama-3.2-1B-Instruct
    • Seq Length: 8K, BS: 1, 2, 4, 8
    • Seq Length: 16K, BS: 1, 2, 4
    • Seq Length: 32K, BS: 1, 2
70b-ss-8-16-32k
  • Speculative decoding of:
    • Meta-Llama-3.3-70B (Target)
    • DeepSeek-R1-Distill-Llama-70B (Target)
    • Meta-Llama-3.2-1B (Draft)
    • Meta-Llama-3.2-1B-Distill-Instruct (Draft)
  • Small to medium context length with low-medium batch sizes
Configuration:

Target Models:

  • DeepSeek-R1-Distill-Llama-70B
    • Seq Length: 8K, BS: 1, 2, 4, 8
    • Seq Length: 16K, BS: 1, 2, 4
    • Seq Length: 32K, BS: 1, 4
  • Meta-Llama-3.3-70B-Instruct
    • Seq Length: 8K, BS: 1, 2, 4, 8
    • Seq Length: 16K, BS: 1, 2, 4
    • Seq Length: 32K, BS: 1, 4

Draft Models:

  • Meta-Llama-3.2-1B-Distill-Instruct
    • Seq Length: 8K, BS: 1, 2, 4, 8
    • Seq Length: 16K, BS: 1, 2, 4
    • Seq Length: 32K, BS: 1, 4
  • Meta-Llama-3.2-1B-Instruct
    • Seq Length: 8K, BS: 1, 2, 4, 8
    • Seq Length: 16K, BS: 1, 2, 4
    • Seq Length: 32K, BS: 1, 4
llama-405b-s-m
  • Speculative decoding of:
    • Meta-Llama-3.1-405B (Target)
    • Meta-Llama-3.1-8B (Draft)
    • Meta-Llama-3.2-3B (Draft)
  • Small context length with low batch sizes
Configuration:

Target Models:

  • Meta-Llama-3.1-405B-Instruct
    • Seq Length: 4K, BS: 1, 2, 4
    • Seq Length: 8K, BS: 1
    • Seq Length: 16K, BS: 1

Draft Models:

  • Meta-Llama-3.1-8B-Instruct-16k
    • Seq Length: 16K, BS: 1
  • Meta-Llama-3.2-3B-Instruct
    • Seq Length: 4K, BS: 1, 2, 4
    • Seq Length: 8K, BS: 1
deepseek-r1-v3-fp8-32k
  • Combination of:
    • DeepSeek-R1-0528
    • DeepSeek-V3-0324
  • Large context length with low batch size
Configuration:

Models:

  • DeepSeek-R1-0528
    • Seq Length: 32K, BS: 1
  • DeepSeek-V3-0324
    • Seq Length: 32K, BS: 1
deepseek-r1-v3-fp8-16k
  • Combination of:
    • DeepSeek-R1-0528
    • DeepSeek-V3-0324
  • Medium context length with low batch size
Configuration:

Models:

  • DeepSeek-R1-0528
    • Seq Length: 16K, BS: 1
  • DeepSeek-V3-0324
    • Seq Length: 16K, BS: 1
deepseek-r1-v3-fp8-4-8k
  • Combination of:
    • DeepSeek-R1-0528
    • DeepSeek-V3-0324
  • Small context length with low batch size
Configuration:

Models:

  • DeepSeek-R1-0528
    • Seq Length: 8K, BS: 1
    • Seq Length: 4K, BS: 4
  • DeepSeek-V3-0324
    • Seq Length: 8K, BS: 1
    • Seq Length: 4K, BS: 4
deepseek-r1-v31-fp8-16k
  • Combination of:
    • DeepSeek-R1-0528
    • DeepSeek-V3.1
  • Medium context length with low batch size
Configuration:

Models:

  • DeepSeek-R1-0528
    • Seq Length: 16K, BS: 1
  • DeepSeek-V3.1
    • Seq Length: 16K, BS: 1
deepseek-r1-v31-fp8-32k
  • Combination of:
    • DeepSeek-R1-0528
    • DeepSeek-V3.1
  • Large context length with low batch size
Configuration:

Models:

  • DeepSeek-R1-0528
    • Seq Length: 32K, BS: 1
  • DeepSeek-V3.1
    • Seq Length: 32K, BS: 1
deepseek-r1-v31-fp8-4k
  • Combination of:
    • DeepSeek-R1-0528
    • DeepSeek-V3.1
  • Small context length with low batch size
Configuration:

Models:

  • DeepSeek-R1-0528
    • Seq Length: 4K, BS: 1, 4
  • DeepSeek-V3.1
    • Seq Length: 4K, BS: 1, 4
deepseek-r1-v31-fp8-8k
  • Combination of:
    • DeepSeek-R1-0528
    • DeepSeek-V3.1
  • Small context length with low batch size
Configuration:

Models:

  • DeepSeek-R1-0528
    • Seq Length: 8K, BS: 1
  • DeepSeek-V3.1
    • Seq Length: 8K, BS: 1
llama-4-medium-8-16-32-64-128k
  • Llama-4-Maverick-17B-128E-Instruct
  • Small to large context length with low batch size
Configuration:
  • Llama-4-Maverick-17B-128E-Instruct
    • Seq Length: 8K, BS: 1
    • Seq Length: 16K, BS: 1
    • Seq Length: 32K, BS: 1
    • Seq Length: 64K, BS: 1
    • Seq Length: 128K, BS: 1
qwen3-32b-whisper-e5-mistral
  • Combination of:
    • Qwen3-32B
    • Whisper-Large-v3
    • E5-Mistral-7B-Instruct
  • Small to medium context length with varied batch sizes
Configuration:
  • E5-Mistral-7B-Instruct
    • Seq Length: 4K, BS: 1, 4, 8, 16, 32
  • Qwen3-32B
    • Seq Length: 8K, BS: 1, 4
    • Seq Length: 16K, BS: 1
    • Seq Length: 32K, BS: 1, 2
  • Whisper-Large-v3
    • BS: 1, 16, 32
gpt-oss-120b-8k
  • gpt-oss-120b
  • Small context length with low batch size
Configuration:
  • gpt-oss-120b
    • Seq Length: 8K, BS: 2
gpt-oss-120b-32k
  • gpt-oss-120b
  • Medium context length with low batch size
Configuration:
  • gpt-oss-120b
    • Seq Length: 32K, BS: 2
gpt-oss-120b-64-128k
  • gpt-oss-120b
  • Large context length with low batch size
Configuration:
  • gpt-oss-120b
    • Seq Length: 64K, BS: 2
    • Seq Length: 128K, BS: 2
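
Audio models deployed in a bundle are reached through their own endpoints. The sketch below sends a transcription request to Whisper-Large-v3, again assuming the deployment exposes an OpenAI-compatible audio API; the host, key, and file name are placeholders.

```python
# Minimal sketch: transcription with Whisper-Large-v3. Assumes an
# OpenAI-compatible audio endpoint; host, key, and file are placeholders.
from openai import OpenAI

client = OpenAI(base_url="https://your-sambastack-host/v1", api_key="YOUR_API_KEY")

with open("meeting.wav", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="Whisper-Large-v3",
        file=audio_file,
    )
print(transcript.text)
```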
The following list gives the recommended bundle template for each available model in SambaStack. Each entry pairs a model with the bundle that deploys it efficiently.
  • Meta-Llama-3.3-70B-Instruct: 70b-3dot3-ss-8-16-32k-batching
  • Llama-4-Maverick-17B-128E-Instruct: llama-4-medium-8-16-32-64-128k
  • DeepSeek-R1-0528: deepseek-r1-v31-fp8-16k
  • DeepSeek-R1-Distill-Llama-70B: 70b-ss-8-16-32k
  • DeepSeek-V3-0324: deepseek-r1-v3-fp8-16k
  • DeepSeek-V3.1: deepseek-r1-v31-fp8-16k
  • Whisper-Large-v3: qwen3-32b-whisper-e5-mistral
  • Qwen3-32B: qwen3-32b-whisper-e5-mistral
  • E5-Mistral-7B-Instruct: qwen3-32b-whisper-e5-mistral
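
Because a bundle such as qwen3-32b-whisper-e5-mistral serves chat, audio, and embedding models side by side, a client selects between them simply by model name. A minimal sketch, again assuming an OpenAI-compatible endpoint with placeholder host and key:

```python
# Minimal sketch: two models served from the same bundle, selected by
# model name. Assumes an OpenAI-compatible endpoint; host and key are
# placeholders.
from openai import OpenAI

client = OpenAI(base_url="https://your-sambastack-host/v1", api_key="YOUR_API_KEY")

# Embeddings via E5-Mistral-7B-Instruct
emb = client.embeddings.create(
    model="E5-Mistral-7B-Instruct",
    input=["query: which bundle serves this model?"],
)
print(len(emb.data[0].embedding))

# Chat via Qwen3-32B, served from the same deployment
chat = client.chat.completions.create(
    model="Qwen3-32B",
    messages=[{"role": "user", "content": "Say hello."}],
)
print(chat.choices[0].message.content)
```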