SambaStack supports a variety of models that can be deployed to both on-premises and hosted environments. Contact your system administrator to determine which models are available on your deployment. You can also use the Model list API command to view which models are deployed and available for your use.
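As a minimal sketch of using the Model list API, the snippet below assumes an OpenAI-compatible `GET /v1/models` endpoint with bearer-token authentication; the base URL, API key, and response shape are placeholders and assumptions, not details confirmed by this page.

```python
import json
import urllib.request

# Hypothetical values -- substitute your deployment's base URL and API key.
BASE_URL = "https://your-sambastack-host/v1"
API_KEY = "YOUR_API_KEY"

def build_model_list_request(base_url: str, api_key: str) -> urllib.request.Request:
    """Build a GET request for an OpenAI-style model list endpoint (assumed shape)."""
    return urllib.request.Request(
        f"{base_url}/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )

def model_ids(response_body: str) -> list[str]:
    """Extract model IDs from an OpenAI-style model list response (assumed shape)."""
    return [m["id"] for m in json.loads(response_body).get("data", [])]

# Example: parsing a response shaped like an OpenAI-compatible model list.
sample = '{"object": "list", "data": [{"id": "Meta-Llama-3.3-70B-Instruct"}]}'
print(model_ids(sample))  # ['Meta-Llama-3.3-70B-Instruct']
```

Sending the built request with `urllib.request.urlopen` would return the live model list for your deployment.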

Deployment options

When deploying models in SambaStack, administrators can select from the supported combinations of context length and batch size listed below.
  • Smaller batch sizes provide higher per-request token throughput (tokens/second).
  • Larger batch sizes support more concurrent users.
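To illustrate the tradeoff, the hypothetical helper below picks the largest supported batch size for a required context length, favoring concurrency; the context/batch combinations are taken from the Meta-Llama-3.3-70B-Instruct entry below, and the helper itself is a sketch, not part of SambaStack.

```python
# Supported context length -> batch sizes; values from the
# Meta-Llama-3.3-70B-Instruct entry in this document.
LLAMA_33_70B = {
    "4K": [1, 2, 4, 8, 16, 32],
    "8K": [1, 2, 4, 8],
    "16K": [1, 2, 4],
    "32K": [1, 2, 4],
    "64K": [1],
    "128K": [1],
}

def max_batch_size(combos: dict[str, list[int]], context: str) -> int:
    """Largest supported batch size for a context length.

    Favors concurrency (more simultaneous users); for higher per-request
    throughput, a smaller batch size from the same list would be chosen.
    """
    sizes = combos.get(context)
    if not sizes:
        raise ValueError(f"context length {context!r} not supported")
    return max(sizes)

print(max_batch_size(LLAMA_33_70B, "8K"))  # 8
```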

Supported models

The listings below are grouped by developer. Each entry gives the model ID, model type, supported context lengths (with the available batch sizes for each), features and optimizations, and a link to the model card on Hugging Face.
Meta

Meta-Llama-3.3-70B-Instruct (Text)
Context lengths (batch sizes):
  • 4K (1, 2, 4, 8, 16, 32)
  • 8K (1, 2, 4, 8)
  • 16K (1, 2, 4)
  • 32K (1, 2, 4)
  • 64K (1)
  • 128K (1)
Features and optimizations:
  • Endpoint: Chat completions
  • Capabilities: Function calling, JSON mode
  • Import checkpoint: Yes
  • Optimizations: Speculative decoding
Model card: available on Hugging Face
Meta-Llama-3.1-8B-Instruct (Text)
Context lengths (batch sizes):
  • 4K (1, 2, 4, 8)
  • 8K (1, 2, 4, 8)
  • 16K (1, 2, 4)
Features and optimizations:
  • Endpoint: Chat completions
  • Capabilities: Function calling, JSON mode
  • Import checkpoint: Yes
  • Optimizations: None
Model card: available on Hugging Face
Llama-4-Maverick-17B-128E-Instruct (Image, Text)
Context lengths (batch sizes):
  • 4K (1, 4)
  • 8K (1)
Features and optimizations:
  • Endpoint: Chat completions
  • Capabilities: Function calling, JSON mode
  • Import checkpoint: No
  • Optimizations: None
Model card: available on Hugging Face
DeepSeek

DeepSeek-R1-0528 (Reasoning, Text)
Context lengths (batch sizes):
  • 4K (4)
  • 8K (1)
  • 16K (1)
  • 32K (1)
Features and optimizations:
  • Endpoint: Chat completions
  • Capabilities: Function calling, JSON mode
  • Import checkpoint: No
  • Optimizations: None
Model card: available on Hugging Face
DeepSeek-R1-Distill-Llama-70B (Reasoning, Text)
Context lengths (batch sizes):
  • 4K (1, 2, 4, 8, 16, 32)
  • 8K (1, 2, 4, 8)
  • 16K (1, 2, 4)
  • 32K (1, 2, 4)
  • 64K (1)
  • 128K (1)
Features and optimizations:
  • Endpoint: Chat completions
  • Capabilities: None
  • Import checkpoint: Yes
  • Optimizations: Speculative decoding
Model card: available on Hugging Face
DeepSeek-V3-0324 (Text)
Context lengths (batch sizes):
  • 4K (4)
  • 8K (1)
  • 16K (1)
  • 32K (1)
Features and optimizations:
  • Endpoint: Chat completions
  • Capabilities: Function calling, JSON mode
  • Import checkpoint: No
  • Optimizations: None
Model card: available on Hugging Face
DeepSeek-V3.1 (Reasoning, Text)
Context lengths (batch sizes):
  • 4K (4)
  • 8K (1)
  • 16K (1)
  • 32K (1)
Features and optimizations:
  • Endpoint: Chat completions
  • Capabilities: Function calling, JSON mode
  • Import checkpoint: No
  • Optimizations: None
Model card: available on Hugging Face
OpenAI

Whisper-Large-v3 (Audio)
Context lengths (batch sizes):
  • 4K (1, 16, 32)
Features and optimizations:
  • Endpoint: Translation, Transcription
  • Capabilities: None
  • Import checkpoint: No
  • Optimizations: None
Model card: available on Hugging Face
Qwen

Qwen3-32B (Reasoning, Text)
Context lengths (batch sizes):
  • 8K (1)
Features and optimizations:
  • Endpoint: Chat completions
  • Capabilities: None
  • Import checkpoint: No
  • Optimizations: None
Model card: available on Hugging Face
Tokyotech-llm

Llama-3.3-Swallow-70B-Instruct-v0.4 (Text)
Context lengths (batch sizes):
  • 4K (1, 2, 4, 8, 16)
  • 8K (1, 2, 4, 8, 16)
  • 16K (1, 2, 4)
  • 32K (1, 2, 4)
  • 64K (1)
  • 128K (1)
Features and optimizations:
  • Endpoint: Chat completions
  • Capabilities: None
  • Import checkpoint: No
  • Optimizations: Speculative decoding
Model card: available on Hugging Face
Other

E5-Mistral-7B-Instruct (Embedding)
Context lengths (batch sizes):
  • 4K (1, 2, 4, 8, 16, 32)
Features and optimizations:
  • Endpoint: Embeddings
  • Capabilities: None
  • Import checkpoint: No
  • Optimizations: None
Model card: available on Hugging Face
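For models that serve the Embeddings endpoint, such as E5-Mistral-7B-Instruct, a request body can be sketched as follows; this assumes an OpenAI-compatible /v1/embeddings payload shape, which is an assumption and not confirmed by this page.

```python
import json

def build_embeddings_payload(model: str, texts: list[str]) -> str:
    """JSON body for an OpenAI-style embeddings request (assumed shape)."""
    return json.dumps({"model": model, "input": texts})

# Example: a body for the embedding model listed above.
payload = build_embeddings_payload("E5-Mistral-7B-Instruct", ["hello world"])
print(payload)
```

The resulting string would be POSTed to your deployment's embeddings endpoint with the same bearer-token header used for other API calls.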