SambaStack supports a variety of models that can be deployed to both on-prem and hosted environments. Contact your system administrator to determine which models are available on your deployment.

Deployment options

When deploying models in SambaStack, administrators can select from various context length and batch size combinations.
  • Smaller batch sizes provide higher per-request token throughput (tokens/second).
  • Larger batch sizes support more concurrent user requests.

Supported models

You can run the following command to discover available models in your cluster:
kubectl -n <namespace> get models
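The command's JSON output can also be filtered programmatically. A minimal sketch, assuming the standard Kubernetes list shape returned by `kubectl -n <namespace> get models -o json`; the sample data below is hypothetical, though every Kubernetes object exposes `metadata.name`:

```python
import json

# Hypothetical excerpt of `kubectl -n <namespace> get models -o json` output;
# real Model resources carry more fields, but metadata.name is standard.
raw = """
{"items": [
  {"metadata": {"name": "meta-llama-3.3-70b-instruct"}},
  {"metadata": {"name": "deepseek-v3.1"}}
]}
"""
# Collect the model names from the list's items.
names = [item["metadata"]["name"] for item in json.loads(raw)["items"]]
print(names)  # → ['meta-llama-3.3-70b-instruct', 'deepseek-v3.1']
```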
Each entry below lists the model's developer, ID, type, suggested uses, supported context length and batch size combinations, features and optimizations, and its model card on Hugging Face.
Meta
Meta-Llama-3.3-70B-Instruct (Text)
  • Suggested use: Task agent, Tool-calling agent, Text to SQL/Cipher
  • Context length (batch size):
    • 4K (1, 2, 4, 8, 16, 32)
    • 8K (1, 2, 4, 8)
    • 16K (1, 2, 4)
    • 32K (1, 2, 4)
    • 64K (1)
    • 128K (1)
  • Endpoint: Chat completions
  • Capabilities: Function calling, JSON mode
  • Custom checkpoints supported: Yes
  • Optimizations: Speculative decoding
  • Model card available on Hugging Face
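Models that list Function calling and JSON mode under Capabilities are served through the Chat completions endpoint. A minimal sketch of a request body exercising JSON mode, assuming an OpenAI-compatible chat completions API; the endpoint URL and authentication are deployment-specific and omitted here:

```python
import json

# Request body for the model's chat completions endpoint; response_format
# requests the JSON mode capability listed in the table above.
payload = {
    "model": "Meta-Llama-3.3-70B-Instruct",
    "messages": [
        {
            "role": "user",
            "content": "Return a JSON object with keys 'city' and 'country' for Tokyo.",
        }
    ],
    "response_format": {"type": "json_object"},
}
body = json.dumps(payload)
```

The serialized `body` would be POSTed to your deployment's chat completions URL with the appropriate authorization header.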
Meta-Llama-3.1-8B-Instruct (Text)
  • Suggested use: Gateway agent, Validation agent
  • Context length (batch size):
    • 4K (1, 2, 4, 8, 16, 32, 64, 128)
    • 8K (1, 2, 4, 8, 16, 32, 64)
    • 16K (1, 2, 4, 8)
  • Endpoint: Chat completions
  • Capabilities: Function calling, JSON mode
  • Custom checkpoints supported: Yes
  • Optimizations: None
  • Model card available on Hugging Face
Meta-Llama-3.1-405B-Instruct (Text)
  • Suggested use: Task agent, Tool-calling agent, Code generation
  • Context length (batch size):
    • 4K (1, 2, 4)
    • 8K (1)
    • 16K (1)
  • Endpoint: Chat completions
  • Capabilities: Function calling, JSON mode
  • Custom checkpoints supported: No
  • Optimizations: Speculative decoding
  • Model card available on Hugging Face
Llama-4-Maverick-17B-128E-Instruct (Image, Text)
  • Suggested use: Image understanding, Task agent, Tool-calling agent
  • Context length (batch size):
    • 8K (1)
    • 16K (1)
    • 32K (1)
    • 64K (1)
    • 128K (1)
  • Endpoint: Chat completions
  • Capabilities: Function calling, JSON mode
  • Custom checkpoints supported: No
  • Optimizations: None
  • Model card available on Hugging Face
MiniMax
MiniMax-M2.5 (Text)
  • Suggested use: Coding agent
  • Context length (batch size):
    • 4K-32K (1, 2, 4, 6, 8)
    • 160K (1, 2)
  • Endpoint: Chat completions
  • Capabilities: Function calling, Structured output
  • Custom checkpoints supported: No
  • Optimizations: None
  • Model card available on Hugging Face
DeepSeek
DeepSeek-R1-0528 (Reasoning, Text)
  • Suggested use: Complex reasoning
  • Context length (batch size):
    • 4K (4)
    • 8K (1)
    • 16K (1)
    • 32K (1)
  • Endpoint: Chat completions
  • Capabilities: Function calling, JSON mode
  • Custom checkpoints supported: No
  • Optimizations: None
  • Model card available on Hugging Face
DeepSeek-R1-Distill-Llama-70B (Reasoning, Text)
  • Suggested use: Complex reasoning
  • Context length (batch size):
    • 4K (1, 2, 4, 8, 16, 32)
    • 8K (1, 2, 4, 8)
    • 16K (1, 2, 4)
    • 32K (1, 2, 4)
    • 64K (1)
    • 128K (1)
  • Endpoint: Chat completions
  • Capabilities: None
  • Custom checkpoints supported: Yes
  • Optimizations: Speculative decoding
  • Model card available on Hugging Face
DeepSeek-V3-0324 (Text)
  • Suggested use: Main/planner agent, Tool-calling agent
  • Context length (batch size):
    • 4K (4)
    • 8K (1)
    • 16K (1)
    • 32K (1)
  • Endpoint: Chat completions
  • Capabilities: Function calling, JSON mode
  • Custom checkpoints supported: No
  • Optimizations: None
  • Model card available on Hugging Face
DeepSeek-V3.1 (Reasoning, Text)
  • Suggested use: Main/planner agent, Tool-calling agent
  • Context length (batch size):
    • 4K (4)
    • 8K (1)
    • 16K (1)
    • 32K (1)
  • Endpoint: Chat completions
  • Capabilities: Function calling, JSON mode
  • Custom checkpoints supported: No
  • Optimizations: None
  • Model card available on Hugging Face
OpenAI
gpt-oss-120b (Text)
  • Suggested use: Main/planner agent, Tool-calling agent
  • Context length (batch size):
    • 4K-32K (1, 2, 4, 6, 8)
    • 64K (1, 2, 4)
    • 128K (1, 2)
  • Endpoint: Chat completions
  • Capabilities: Function calling, JSON mode
  • Custom checkpoints supported: No
  • Optimizations: None
  • Model card available on Hugging Face
Whisper-Large-v3 (Audio)
  • Suggested use: Automatic speech recognition (ASR), Audio transcription
  • Context length (batch size):
    • 4K (1, 16, 32)
  • Endpoint: Translation, Transcription
  • Capabilities: None
  • Custom checkpoints supported: No
  • Optimizations: None
  • Model card available on Hugging Face
Qwen
Qwen3-32B (Reasoning, Text)
  • Suggested use: Task agent, Multilingual instruction following
  • Context length (batch size):
    • 8K (1)
  • Endpoint: Chat completions
  • Capabilities: None
  • Custom checkpoints supported: No
  • Optimizations: None
  • Model card available on Hugging Face
Tokyotech-llm
Llama-3.3-Swallow-70B-Instruct-v0.4 (Text)
  • Suggested use: Japanese instruction following, Task agent
  • Context length (batch size):
    • 4K (1, 2, 4, 8, 16)
    • 8K (1, 2, 4, 8, 16)
    • 16K (1, 2, 4)
    • 32K (1, 2, 4)
    • 64K (1)
    • 128K (1)
  • Endpoint: Chat completions
  • Capabilities: None
  • Custom checkpoints supported: No
  • Optimizations: Speculative decoding
  • Model card available on Hugging Face
Other
E5-Mistral-7B-Instruct (Embedding)
  • Suggested use: Vector storage and retrieval (RAG)
  • Context length (batch size):
    • 4K (1, 2, 4, 8, 16, 32)
  • Endpoint: Embeddings
  • Capabilities: None
  • Custom checkpoints supported: No
  • Optimizations: None
  • Model card available on Hugging Face
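The Embeddings endpoint takes a simple JSON body. A minimal sketch, assuming an OpenAI-compatible embeddings API; the endpoint URL and authentication are deployment-specific and omitted here:

```python
import json

# Request body for the embedding model's endpoint; "input" accepts a
# list of texts to embed in a single call.
payload = {
    "model": "E5-Mistral-7B-Instruct",
    "input": [
        "SambaStack deploys models as bundles.",
        "A bundle groups one or more model configurations.",
    ],
}
body = json.dumps(payload)
```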
In SambaStack, models are not deployed individually; they are deployed as bundles. A bundle is a packaged deployment that groups one or more models together with their associated configurations, such as batch size, sequence length, and precision settings. For example, deploying the Meta-Llama-3.3-70B model with a batch size of 4 and a sequence length of 16K tokens constitutes a single configuration. A bundle, however, can contain multiple such configurations, either for the same model or for different models.

SambaNova's RDU technology enables several models and configurations to be loaded simultaneously in a single deployment, allowing you to switch instantly between models and between batch-size/sequence-length profiles as needed. In contrast to traditional GPU systems, where deployments are typically single-model and static, SambaStack supports multi-model, multi-configuration bundles. This approach delivers higher efficiency, greater flexibility, and increased throughput while preserving low latency.

You can run the following command to discover available bundles in your cluster:
kubectl -n <namespace> get bundles
The entries below list the recommended bundle templates for the models currently available in SambaStack. Each entry pairs a model with its corresponding deployment bundle, description, and configuration.
If the bundles listed below do not satisfy your inference requirements, you can create custom bundles that combine any mix of models and configurations, so long as they fit in DDR memory.
MiniMax-M2.5
  • Bundle templates:
    • dyt-minimax-m2p5-32k
    • dyt-minimax-m2p5-32-160k
  • Description:
    • Homogeneous bundles containing MiniMax-M2.5 configurations.
    • dyt-minimax-m2p5-32k is better for medium sequence lengths and high batching.
    • dyt-minimax-m2p5-32-160k is better for higher sequence lengths and low batching.
  • Configuration:
    • dyt-minimax-m2p5-32k
      • Seq length: 4K-32K, BS: 1, 2, 4, 6, 8
    • dyt-minimax-m2p5-32-160k
      • Seq length: 32K, BS: 2
      • Seq length: 160K, BS: 2
Meta-Llama-3.3-70B-Instruct
  • Bundle template: 70b-3dot3-ss-4-8-16-32-64-128k
  • Description:
    • Speculative decoding of Meta-Llama-3.3-70B (Target) with Meta-Llama-3.2-1B (Draft)
    • Medium to large context length with low batch size
  • Configuration:
    • Target model: Meta-Llama-3.3-70B-Instruct
      • Seq length: 4K, BS: 2, 4, 8, 16, 32
      • Seq length: 8K, BS: 2, 4, 8, 16, 32
      • Seq length: 16K, BS: 1, 2, 4
      • Seq length: 32K, BS: 1, 2
      • Seq length: 64K, BS: 1, 2, 4
      • Seq length: 128K, BS: 1
    • Draft model: Meta-Llama-3.2-1B-Instruct
      • Seq length: 4K, BS: 2, 4, 8, 16, 32
      • Seq length: 8K, BS: 2, 4, 8, 16, 32
      • Seq length: 16K, BS: 1, 2, 4
      • Seq length: 32K, BS: 1, 2
      • Seq length: 64K, BS: 1, 2, 4
      • Seq length: 128K, BS: 1
gpt-oss-120b
  • Bundle template: dyt-gpt-oss-120b-32-64-128k
  • Description:
    • Homogeneous bundle containing gpt-oss-120b configurations.
    • Large context length with low batch size
  • Configuration:
    • gpt-oss-120b
      • Seq length: 4K-32K, BS: 1, 2, 4, 6, 8
      • Seq length: 64K, BS: 1, 2, 4
      • Seq length: 128K, BS: 2
DeepSeek-R1-0528 / DeepSeek-V3.1
  • Bundle template: deepseek-r1-v31-fp8-16k
  • Description:
    • Combination of DeepSeek-R1-0528 and DeepSeek-V3.1
    • Medium context length with low batch size
  • Configuration:
    • DeepSeek-R1-0528
      • Seq length: 16K, BS: 1
    • DeepSeek-V3.1
      • Seq length: 16K, BS: 1
DeepSeek-V3-0324
  • Bundle template: deepseek-r1-v3-fp8-16k
  • Description:
    • Combination of DeepSeek-R1-0528 and DeepSeek-V3-0324
    • Medium context length with low batch size
  • Configuration:
    • DeepSeek-R1-0528
      • Seq length: 16K, BS: 1
    • DeepSeek-V3-0324
      • Seq length: 16K, BS: 1
Llama-4-Maverick-17B-128E-Instruct
  • Bundle template: llama-4-medium-8-16-32-64-128k
  • Description:
    • Homogeneous bundle containing Llama-4-Maverick-17B-128E-Instruct configurations.
    • Small to large context length with low batch size
  • Configuration:
    • Llama-4-Maverick-17B-128E-Instruct
      • Seq length: 8K, BS: 1
      • Seq length: 16K, BS: 1
      • Seq length: 32K, BS: 1
      • Seq length: 64K, BS: 1
      • Seq length: 128K, BS: 1
Whisper-Large-v3 / Qwen3-32B
  • Bundle template: qwen3-32b-whisper-e5-mistral
  • Description:
    • Combination of Qwen3-32B, Whisper-Large-v3, and E5-Mistral-7B-Instruct
    • Small to medium context length with varied batch size
  • Configuration:
    • E5-Mistral-7B-Instruct
      • Seq length: 4K, BS: 1, 4, 8, 16, 32
    • Qwen3-32B
      • Seq length: 8K, BS: 1, 4
      • Seq length: 16K, BS: 1
      • Seq length: 32K, BS: 1, 2
    • Whisper-Large-v3
      • BS: 1, 16, 32
E5-Mistral-7B-Instruct / Meta-Llama-3.1-8B-Instruct
  • Bundle template: us-agentic-rag-1-1
  • Description:
    • Combination of:
      • gpt-oss-120b
      • Llama-4-Maverick-17B-128E-Instruct
      • Meta-Llama-3.1-8B-Instruct
      • Meta-Llama-3.3-70B (Target)
      • Meta-Llama-3.2-1B (Draft)
      • E5-Mistral-7B-Instruct
    • Small to medium context length with varied batch size
    • Speculative decoding supported for Meta-Llama-3.3-70B
  • Configuration:
    • gpt-oss-120b
      • Seq length: 32K, BS: 4
      • Seq length: 64K, BS: 2
      • Seq length: 128K, BS: 2
    • Llama-4-Maverick-17B-128E-Instruct
      • Seq length: 8K, BS: 1
      • Seq length: 16K, BS: 1
    • Meta-Llama-3.3-70B (Target) / Meta-Llama-3.2-1B (Draft)
      • Seq length: 4K, BS: 1, 4, 8, 16, 32
      • Seq length: 8K, BS: 1, 4, 8
      • Seq length: 16K, BS: 1, 4
      • Seq length: 32K, BS: 1, 4
      • Seq length: 64K, BS: 1
      • Seq length: 128K, BS: 1
    • Meta-Llama-3.1-8B-Instruct
      • Seq length: 4K, BS: 1, 4, 16, 32
      • Seq length: 8K, BS: 1, 4, 16, 32
      • Seq length: 16K, BS: 1, 4, 8
    • E5-Mistral-7B-Instruct
      • Seq length: 4K, BS: 1, 4, 8, 16, 32