Deployment options
When deploying models in SambaStack, administrators can select from various context length and batch size combinations.

- Smaller batch sizes provide higher per-request token throughput (tokens/second).
- Larger batch sizes provide better concurrency for multiple users.
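As a rough illustration of this tradeoff (the numbers below are hypothetical, not measured SambaStack figures), a deployment's aggregate token budget is shared across concurrent streams:

```python
# Hypothetical sketch of the batch-size tradeoff: a fixed aggregate
# token budget is divided across the streams in a batch.
def per_stream_throughput(aggregate_tokens_per_sec: float, batch_size: int) -> float:
    """Approximate tokens/second seen by each stream at a given batch size.

    Simplistically assumes aggregate throughput is fixed and split evenly
    across the batch; real scaling depends on model and workload.
    """
    return aggregate_tokens_per_sec / batch_size

# Batch size 1 dedicates the whole budget to a single stream,
# while batch size 8 serves eight users at lower per-stream speed.
single = per_stream_throughput(1000.0, 1)   # 1000.0 tokens/s for one user
batched = per_stream_throughput(1000.0, 8)  # 125.0 tokens/s each, 8 users
```

The right choice depends on whether the deployment is optimized for one latency-sensitive consumer or many concurrent ones.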
Supported models
The table below lists supported models, context lengths, batch sizes, and features.

Developer/Model ID | Type | Context length (batch size) | Features and optimizations | View on Hugging Face
---|---|---|---|---
Meta | | | |
Meta-Llama-3.3-70B-Instruct | Text | View | View | Model card
Meta-Llama-3.1-8B-Instruct | Text | View | View | Model card
Meta-Llama-3.1-405B-Instruct | Text | View | View | Model card
Llama-4-Maverick-17B-128E-Instruct | Image, Text | View | View | Model card
DeepSeek | | | |
DeepSeek-R1-0528 | Reasoning, Text | View | View | Model card
DeepSeek-R1-Distill-Llama-70B | Reasoning, Text | View | View | Model card
DeepSeek-V3-0324 | Text | View | View | Model card
DeepSeek-V3.1 | Reasoning, Text | View | View | Model card
OpenAI | | | |
gpt-oss-120b | Text | View | View | Model card
Whisper-Large-v3 | Audio | View | View | Model card
Qwen | | | |
Qwen3-32B | Reasoning, Text | View | View | Model card
Tokyotech-llm | | | |
Llama-3.3-Swallow-70B-Instruct-v0.4 | Text | View | View | Model card
Other | | | |
E5-Mistral-7B-Instruct | Embedding | View | View | Model card
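The model IDs in the table are what clients reference at request time. Assuming the deployment exposes an OpenAI-compatible chat completions endpoint (an assumption here; the endpoint URL and key below are placeholders), a request body might be built like this:

```python
import json

# Hypothetical request body for an OpenAI-compatible chat completions
# endpoint. The model ID comes from the supported-models table above;
# the endpoint URL and API key would come from your SambaStack deployment.
payload = {
    "model": "Meta-Llama-3.3-70B-Instruct",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 128,
}
body = json.dumps(payload)
# POST `body` to your deployment's /v1/chat/completions URL with your API key.
```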
Sample bundles
In SambaStack, models are deployed not individually but as bundles. A bundle is a packaged deployment that combines one or more models with their configurations, such as batch sizes, sequence lengths, and precision settings. For example, deploying the Meta-Llama-3.3-70B model with a batch size of 4 and a sequence length of 16k represents one configuration. A bundle, however, may contain multiple such configurations across the same or different models.
SambaNova’s RDU technology allows multiple models and configurations to be loaded together in a single deployment. This enables instant switching between models and batch/sequence profiles as needed. Unlike traditional GPU systems where deployments are often single-model and static, SambaStack supports multi-model, multi-configuration bundles. This approach delivers greater efficiency, flexibility, and throughput while maintaining low latency.
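Conceptually, a bundle can be pictured as a list of (model, batch size, sequence length, precision) tuples loaded together. The sketch below is an illustrative data shape only; the field names and the `bf16` precision value are assumptions, not SambaStack's actual schema:

```python
# Hypothetical, simplified representation of a bundle: one deployment
# packaging several (model, batch size, sequence length, precision)
# configurations. Field names and values are illustrative only.
bundle = {
    "name": "example-bundle",
    "configurations": [
        {"model": "Meta-Llama-3.3-70B", "batch_size": 4,
         "seq_len": 16_384, "precision": "bf16"},   # precision value assumed
        {"model": "Meta-Llama-3.3-70B", "batch_size": 1,
         "seq_len": 32_768, "precision": "bf16"},
    ],
}
```

Because all configurations are loaded together on the RDU, a serving layer can route each incoming request to whichever configuration fits its context length and concurrency needs, without redeploying.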
Bundle template | Bundle description | Bundle configuration
---|---|---
70b-3dot3-ss-16k-32k-64k-128k | | View (Target Models; Draft Models)
70b-3dot3-ss-8-16-32k-batching | | View (Target Models; Draft Models)
70b-ss-8-16-32k | | View (Target Models; Draft Models)
llama-405b-s-m | | View (Target Models; Draft Models)
deepseek-r1-v3-fp8-32k | | View (Models)
deepseek-r1-v3-fp8-16k | | View (Models)
deepseek-r1-v3-fp8-4-8k | | View (Models)
deepseek-r1-v31-fp8-16k | | View (Models)
deepseek-r1-v31-fp8-32k | | View (Models)
deepseek-r1-v31-fp8-4k | | View (Models)
deepseek-r1-v31-fp8-8k | | View (Models)
llama-4-medium-8-16-32-64-128k | | View
qwen3-32b-whisper-e5-mistral | | View
gpt-oss-120b-8k | | View
gpt-oss-120b-32k | | View
gpt-oss-120b-64-128k | | View
Recommended bundles
The following table lists recommended bundle templates for the available models in SambaStack. Each entry pairs a specific model with its corresponding deployment bundle, enabling efficient configuration and usage of these models within SambaStack environments.

Model name | Bundle template
---|---
Meta-Llama-3.3-70B-Instruct | 70b-3dot3-ss-8-16-32k-batching |
Llama-4-Maverick-17B-128E-Instruct | llama-4-medium-8-16-32-64-128k |
DeepSeek-R1-0528 | deepseek-r1-v31-fp8-16k |
DeepSeek-R1-Distill-Llama-70B | 70b-ss-8-16-32k |
DeepSeek-V3-0324 | deepseek-r1-v3-fp8-16k |
DeepSeek-V3.1 | deepseek-r1-v31-fp8-16k |
Whisper-Large-v3 | qwen3-32b-whisper-e5-mistral |
Qwen3-32B | qwen3-32b-whisper-e5-mistral |
E5-Mistral-7B-Instruct | qwen3-32b-whisper-e5-mistral |
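For automation, the recommendations above can be expressed as a simple lookup, with the mapping taken directly from the table:

```python
# Recommended bundle template per model, taken from the table above.
RECOMMENDED_BUNDLE = {
    "Meta-Llama-3.3-70B-Instruct": "70b-3dot3-ss-8-16-32k-batching",
    "Llama-4-Maverick-17B-128E-Instruct": "llama-4-medium-8-16-32-64-128k",
    "DeepSeek-R1-0528": "deepseek-r1-v31-fp8-16k",
    "DeepSeek-R1-Distill-Llama-70B": "70b-ss-8-16-32k",
    "DeepSeek-V3-0324": "deepseek-r1-v3-fp8-16k",
    "DeepSeek-V3.1": "deepseek-r1-v31-fp8-16k",
    "Whisper-Large-v3": "qwen3-32b-whisper-e5-mistral",
    "Qwen3-32B": "qwen3-32b-whisper-e5-mistral",
    "E5-Mistral-7B-Instruct": "qwen3-32b-whisper-e5-mistral",
}

def bundle_for(model_id: str) -> str:
    """Return the recommended bundle template for a supported model ID."""
    return RECOMMENDED_BUNDLE[model_id]
```

Note that several models (Whisper-Large-v3, Qwen3-32B, E5-Mistral-7B-Instruct) share the same multi-model bundle, reflecting the multi-configuration deployments described above.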