SambaStack supports a variety of models that can be deployed to both on-premises and hosted environments. Contact your system administrator to determine which models are available on your deployment.

Deployment options

When deploying models in SambaStack, administrators can select from various context length and batch size combinations.
  • Smaller batch sizes provide higher per-request token throughput (tokens/second).
  • Larger batch sizes provide better concurrency, serving more simultaneous users.

Supported models

The models below are grouped by developer. Each entry lists the model type, supported context lengths with batch sizes, endpoint, capabilities, checkpoint import support, optimizations, and a link to the model card on Hugging Face.
Meta

Meta-Llama-3.3-70B-Instruct (Text)
  • Context lengths (batch sizes): 4K (1, 2, 4, 8, 16, 32), 8K (1, 2, 4, 8), 16K (1, 2, 4), 32K (1, 2, 4), 64K (1), 128K (1)
  • Endpoint: Chat completions
  • Capabilities: Function calling, JSON mode
  • Import checkpoint: Yes
  • Optimizations: Speculative decoding
  • Hugging Face: Model card

Meta-Llama-3.1-8B-Instruct (Text)
  • Context lengths (batch sizes): 4K (1, 2, 4, 8), 8K (1, 2, 4, 8), 16K (1, 2, 4)
  • Endpoint: Chat completions
  • Capabilities: Function calling, JSON mode
  • Import checkpoint: Yes
  • Optimizations: None
  • Hugging Face: Model card

Meta-Llama-3.1-405B-Instruct (Text)
  • Context lengths (batch sizes): 4K (1, 2, 4), 8K (1), 16K (1)
  • Endpoint: Chat completions
  • Capabilities: Function calling, JSON mode
  • Import checkpoint: No
  • Optimizations: Speculative decoding
  • Hugging Face: Model card

Llama-4-Maverick-17B-128E-Instruct (Image, Text)
  • Context lengths (batch sizes): 4K (1, 4), 8K (1)
  • Endpoint: Chat completions
  • Capabilities: Function calling, JSON mode
  • Import checkpoint: No
  • Optimizations: None
  • Hugging Face: Model card

DeepSeek

DeepSeek-R1-0528 (Reasoning, Text)
  • Context lengths (batch sizes): 4K (4), 8K (1), 16K (1), 32K (1)
  • Endpoint: Chat completions
  • Capabilities: Function calling, JSON mode
  • Import checkpoint: No
  • Optimizations: None
  • Hugging Face: Model card

DeepSeek-R1-Distill-Llama-70B (Reasoning, Text)
  • Context lengths (batch sizes): 4K (1, 2, 4, 8, 16, 32), 8K (1, 2, 4, 8), 16K (1, 2, 4), 32K (1, 2, 4), 64K (1), 128K (1)
  • Endpoint: Chat completions
  • Capabilities: None
  • Import checkpoint: Yes
  • Optimizations: Speculative decoding
  • Hugging Face: Model card

DeepSeek-V3-0324 (Text)
  • Context lengths (batch sizes): 4K (4), 8K (1), 16K (1), 32K (1)
  • Endpoint: Chat completions
  • Capabilities: Function calling, JSON mode
  • Import checkpoint: No
  • Optimizations: None
  • Hugging Face: Model card

DeepSeek-V3.1 (Reasoning, Text)
  • Context lengths (batch sizes): 4K (4), 8K (1), 16K (1), 32K (1)
  • Endpoint: Chat completions
  • Capabilities: Function calling, JSON mode
  • Import checkpoint: No
  • Optimizations: None
  • Hugging Face: Model card

OpenAI

gpt-oss-120b (Text)
  • Context lengths (batch sizes): 8K (2), 32K (2), 64K (2), 128K (2)
  • Endpoint: Chat completions
  • Capabilities: Function calling, JSON mode
  • Import checkpoint: No
  • Optimizations: None
  • Hugging Face: Model card

Whisper-Large-v3 (Audio)
  • Context lengths (batch sizes): 4K (1, 16, 32)
  • Endpoint: Translation, Transcription
  • Capabilities: None
  • Import checkpoint: No
  • Optimizations: None
  • Hugging Face: Model card

Qwen

Qwen3-32B (Reasoning, Text)
  • Context lengths (batch sizes): 8K (1)
  • Endpoint: Chat completions
  • Capabilities: None
  • Import checkpoint: No
  • Optimizations: None
  • Hugging Face: Model card

Tokyotech-llm

Llama-3.3-Swallow-70B-Instruct-v0.4 (Text)
  • Context lengths (batch sizes): 4K (1, 2, 4, 8, 16), 8K (1, 2, 4, 8, 16), 16K (1, 2, 4), 32K (1, 2, 4), 64K (1), 128K (1)
  • Endpoint: Chat completions
  • Capabilities: None
  • Import checkpoint: No
  • Optimizations: Speculative decoding
  • Hugging Face: Model card

Other

E5-Mistral-7B-Instruct (Embedding)
  • Context lengths (batch sizes): 4K (1, 2, 4, 8, 16, 32)
  • Endpoint: Embeddings
  • Capabilities: None
  • Import checkpoint: No
  • Optimizations: None
  • Hugging Face: Model card
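
The chat models above are served through the chat-completions endpoint, with function calling and JSON mode available where listed. Below is a minimal sketch of a JSON-mode request, assuming the deployment exposes an OpenAI-compatible API; the base URL and API key are placeholders for your deployment's values.

```python
# Minimal sketch of a JSON-mode chat completion. Assumes an
# OpenAI-compatible endpoint; base_url and api_key are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://your-sambastack-host/v1",  # placeholder host
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="Meta-Llama-3.3-70B-Instruct",
    messages=[
        {"role": "system", "content": "Respond with a single JSON object."},
        {"role": "user", "content": "List three primary colors."},
    ],
    response_format={"type": "json_object"},  # JSON mode
)
print(response.choices[0].message.content)
```

In OpenAI-compatible APIs, function calling uses the same request shape with an added tools parameter describing the callable functions.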

Sample bundles

In SambaStack, models are deployed as bundles rather than individually. A bundle is a packaged deployment that combines one or more models with their configurations, such as batch sizes, sequence lengths, and precision settings. For example, deploying Meta-Llama-3.3-70B with a batch size of 4 and a sequence length of 16K is one configuration; a single bundle may contain many such configurations across the same or different models.

SambaNova's RDU technology loads multiple models and configurations together in a single deployment, enabling instant switching between models and batch/sequence profiles as needed. Unlike traditional GPU deployments, which are often single-model and static, SambaStack's multi-model, multi-configuration bundles deliver greater efficiency, flexibility, and throughput while maintaining low latency.
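
To make this concrete, the sketch below models a bundle as plain data: one named bundle holding several model/sequence-length/batch-size configurations. The class and field names are hypothetical illustrations, not SambaStack's actual bundle manifest schema.

```python
# Illustrative only: a bundle as a set of model configurations.
# Class and field names are hypothetical, not SambaStack's schema.
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConfig:
    model: str        # model ID, e.g. "Meta-Llama-3.3-70B-Instruct"
    seq_length: int   # tokens, e.g. 16384 for "16K"
    batch_size: int


@dataclass(frozen=True)
class Bundle:
    name: str
    configs: tuple[ModelConfig, ...]


# One bundle can mix configurations of the same model (varying batch
# size and sequence length) and of different models (target + draft).
bundle = Bundle(
    name="70b-3dot3-ss-8-16-32k-batching",
    configs=(
        ModelConfig("Meta-Llama-3.3-70B-Instruct", 8192, 8),
        ModelConfig("Meta-Llama-3.3-70B-Instruct", 16384, 4),
        ModelConfig("Meta-Llama-3.2-1B-Instruct", 8192, 8),  # draft model
    ),
)
```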
Each bundle template below is listed with its description followed by its configuration (models, sequence lengths, and batch sizes).
70b-3dot3-ss-16k-32k-64k-128k
  • Speculative decoding of:
    • Meta-Llama-3.3-70B (Target)
    • Meta-Llama-3.2-1B (Draft)
  • Medium to large context length with low batch size
Configuration:

Target Models:

  • Meta-Llama-3.3-70B-Instruct
    • Seq Length: 16K, BS: 1, 2, 4
    • Seq Length: 32K, BS: 1, 2
    • Seq Length: 64K, BS: 1
    • Seq Length: 128K, BS: 1

Draft Models:

  • Meta-Llama-3.2-1B-Instruct
    • Seq Length: 16K, BS: 1, 2, 4
    • Seq Length: 32K, BS: 1, 2
    • Seq Length: 64K, BS: 1
    • Seq Length: 128K, BS: 1
70b-3dot3-ss-8-16-32k-batching
  • Speculative decoding of:
    • Meta-Llama-3.3-70B (Target)
    • Meta-Llama-3.2-1B (Draft)
  • Small to medium context length with low-medium batch sizes
Configuration:

Target Models:

  • Meta-Llama-3.3-70B-Instruct
    • Seq Length: 8K, BS: 1, 2, 4, 8
    • Seq Length: 16K, BS: 1, 2, 4
    • Seq Length: 32K, BS: 1, 2

Draft Models:

  • Meta-Llama-3.2-1B-Instruct
    • Seq Length: 8K, BS: 1, 2, 4, 8
    • Seq Length: 16K, BS: 1, 2, 4
    • Seq Length: 32K, BS: 1, 2
70b-ss-8-16-32k
  • Speculative decoding of:
    • Meta-Llama-3.3-70B (Target)
    • DeepSeek-R1-Distill-Llama-70B (Target)
    • Meta-Llama-3.2-1B (Draft)
    • Meta-Llama-3.2-1B-Distill-Instruct (Draft)
  • Small to medium context length with low-medium batch sizes
Configuration:

Target Models:

  • DeepSeek-R1-Distill-Llama-70B
    • Seq Length: 8K, BS: 1, 2, 4, 8
    • Seq Length: 16K, BS: 1, 2, 4
    • Seq Length: 32K, BS: 1, 4
  • Meta-Llama-3.3-70B-Instruct
    • Seq Length: 8K, BS: 1, 2, 4, 8
    • Seq Length: 16K, BS: 1, 2, 4
    • Seq Length: 32K, BS: 1, 4

Draft Models:

  • Meta-Llama-3.2-1B-Distill-Instruct
    • Seq Length: 8K, BS: 1, 2, 4, 8
    • Seq Length: 16K, BS: 1, 2, 4
    • Seq Length: 32K, BS: 1, 4
  • Meta-Llama-3.2-1B-Instruct
    • Seq Length: 8K, BS: 1, 2, 4, 8
    • Seq Length: 16K, BS: 1, 2, 4
    • Seq Length: 32K, BS: 1, 4
llama-405b-s-m
  • Speculative decoding of:
    • Meta-Llama-3.1-405B (Target)
    • Meta-Llama-3.1-8B (Draft)
    • Meta-Llama-3.2-3B (Draft)
  • Small context length with low batch sizes
Configuration:

Target Models:

  • Meta-Llama-3.1-405B-Instruct
    • Seq Length: 4K, BS: 1, 2, 4
    • Seq Length: 8K, BS: 1
    • Seq Length: 16K, BS: 1

Draft Models:

  • Meta-Llama-3.1-8B-Instruct-16k
    • Seq Length: 16K, BS: 1
  • Meta-Llama-3.2-3B-Instruct
    • Seq Length: 4K, BS: 1, 2, 4
    • Seq Length: 8K, BS: 1
deepseek-r1-v3-fp8-32k
  • Combination of:
    • DeepSeek-R1-0528
    • DeepSeek-V3-0324
  • Large context length with low batch size
Configuration:

Models:

  • DeepSeek-R1-0528
    • Seq Length: 32K, BS: 1
  • DeepSeek-V3-0324
    • Seq Length: 32K, BS: 1
deepseek-r1-v3-fp8-16k
  • Combination of:
    • DeepSeek-R1-0528
    • DeepSeek-V3-0324
  • Medium context length with low batch size
Configuration:

Models:

  • DeepSeek-R1-0528
    • Seq Length: 16K, BS: 1
  • DeepSeek-V3-0324
    • Seq Length: 16K, BS: 1
deepseek-r1-v3-fp8-4-8k
  • Combination of:
    • DeepSeek-R1-0528
    • DeepSeek-V3-0324
  • Small context length with low batch size
Configuration:

Models:

  • DeepSeek-R1-0528
    • Seq Length: 8K, BS: 1
    • Seq Length: 4K, BS: 4
  • DeepSeek-V3-0324
    • Seq Length: 8K, BS: 1
    • Seq Length: 4K, BS: 4
deepseek-r1-v31-fp8-16k
  • Combination of:
    • DeepSeek-R1-0528
    • DeepSeek-V3.1
  • Medium context length with low batch size
Configuration:

Models:

  • DeepSeek-R1-0528
    • Seq Length: 16K, BS: 1
  • DeepSeek-V3.1
    • Seq Length: 16K, BS: 1
deepseek-r1-v31-fp8-32k
  • Combination of:
    • DeepSeek-R1-0528
    • DeepSeek-V3.1
  • Large context length with low batch size
Configuration:

Models:

  • DeepSeek-R1-0528
    • Seq Length: 32K, BS: 1
  • DeepSeek-V3.1
    • Seq Length: 32K, BS: 1
deepseek-r1-v31-fp8-4k
  • Combination of:
    • DeepSeek-R1-0528
    • DeepSeek-V3.1
  • Small context length with low batch size
Configuration:

Models:

  • DeepSeek-R1-0528
    • Seq Length: 4K, BS: 1, 4
  • DeepSeek-V3.1
    • Seq Length: 4K, BS: 1, 4
deepseek-r1-v31-fp8-8k
  • Combination of:
    • DeepSeek-R1-0528
    • DeepSeek-V3.1
  • Small context length with low batch size
Configuration:

Models:

  • DeepSeek-R1-0528
    • Seq Length: 8K, BS: 1
  • DeepSeek-V3.1
    • Seq Length: 8K, BS: 1
llama-4-medium-8-16-32-64-128k
  • Llama-4-Maverick-17B-128E-Instruct
  • Small to large context length with low batch size
Configuration:
  • Llama-4-Maverick-17B-128E-Instruct
    • Seq Length: 8K, BS: 1
    • Seq Length: 16K, BS: 1
    • Seq Length: 32K, BS: 1
    • Seq Length: 64K, BS: 1
    • Seq Length: 128K, BS: 1
qwen3-32b-whisper-e5-mistral
  • Combination of:
    • Qwen3-32B
    • Whisper-Large-v3
    • E5-Mistral-7B-Instruct
  • Small to medium context length with varied batch sizes
Configuration:
  • E5-Mistral-7B-Instruct
    • Seq Length: 4K, BS: 1, 4, 8, 16, 32
  • Qwen3-32B
    • Seq Length: 8K, BS: 1, 4
    • Seq Length: 16K, BS: 1
    • Seq Length: 32K, BS: 1, 2
  • Whisper-Large-v3
    • BS: 1, 16, 32
gpt-oss-120b-8k
  • gpt-oss-120b
  • Small context length with low batch size
Configuration:
  • gpt-oss-120b
    • Seq Length: 8K, BS: 2
gpt-oss-120b-32k
  • gpt-oss-120b
  • Medium context length with low batch size
Configuration:
  • gpt-oss-120b
    • Seq Length: 32K, BS: 2
gpt-oss-120b-64-128k
  • gpt-oss-120b
  • Large context length with low batch size
Configuration:
  • gpt-oss-120b
    • Seq Length: 64K, BS: 2
    • Seq Length: 128K, BS: 2
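
Audio models deployed in a bundle are reached through their own endpoints. The sketch below sends a transcription request to Whisper-Large-v3, again assuming the deployment exposes an OpenAI-compatible audio API; the host, key, and file name are placeholders.

```python
# Minimal sketch: transcription with Whisper-Large-v3. Assumes an
# OpenAI-compatible audio endpoint; host, key, and file are placeholders.
from openai import OpenAI

client = OpenAI(base_url="https://your-sambastack-host/v1", api_key="YOUR_API_KEY")

with open("meeting.wav", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="Whisper-Large-v3",
        file=audio_file,
    )
print(transcript.text)
```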
The following list gives the recommended bundle template for each available model in SambaStack. Each entry pairs a model with the bundle that deploys it efficiently.
  • Meta-Llama-3.3-70B-Instruct: 70b-3dot3-ss-8-16-32k-batching
  • Llama-4-Maverick-17B-128E-Instruct: llama-4-medium-8-16-32-64-128k
  • DeepSeek-R1-0528: deepseek-r1-v31-fp8-16k
  • DeepSeek-R1-Distill-Llama-70B: 70b-ss-8-16-32k
  • DeepSeek-V3-0324: deepseek-r1-v3-fp8-16k
  • DeepSeek-V3.1: deepseek-r1-v31-fp8-16k
  • Whisper-Large-v3: qwen3-32b-whisper-e5-mistral
  • Qwen3-32B: qwen3-32b-whisper-e5-mistral
  • E5-Mistral-7B-Instruct: qwen3-32b-whisper-e5-mistral
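
Because a bundle such as qwen3-32b-whisper-e5-mistral serves chat, audio, and embedding models side by side, a client selects between them simply by model name. A minimal sketch, again assuming an OpenAI-compatible endpoint with placeholder host and key:

```python
# Minimal sketch: two models served from the same bundle, selected by
# model name. Assumes an OpenAI-compatible endpoint; host and key are
# placeholders.
from openai import OpenAI

client = OpenAI(base_url="https://your-sambastack-host/v1", api_key="YOUR_API_KEY")

# Embeddings via E5-Mistral-7B-Instruct
emb = client.embeddings.create(
    model="E5-Mistral-7B-Instruct",
    input=["query: which bundle serves this model?"],
)
print(len(emb.data[0].embedding))

# Chat via Qwen3-32B, served from the same deployment
chat = client.chat.completions.create(
    model="Qwen3-32B",
    messages=[{"role": "user", "content": "Say hello."}],
)
print(chat.choices[0].message.content)
```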