SambaStack serves models by deploying bundles: packaged groups of one or more models with their deployment configurations, including batch sizes, sequence lengths, and precision settings. This allows many models of interest to be served simultaneously, enabling instant switching between them without reloading weights. This approach increases efficiency, flexibility, and throughput compared to traditional GPU deployments that load a single static model. For example, a single bundle can contain both Llama-3.3-70B-instruct and Llama-3.1-8B-instruct, enabling you to switch between them almost instantly. Every configuration of a model in the bundle takes up space, so different bundles contain different sets of configurations. For example, a copy of Llama-3.3-70B-instruct with a batch size of 4 and a sequence length of 16k represents one configuration. You can choose the bundle that works best for your use case. See the supported models and bundles page for the current list.
A configuration defines the runtime settings for a model deployment: batch size, sequence length, and precision. A single deployment can include multiple configurations, enabling instant switching between setups for optimized performance.
Concepts
Bundle templates
A bundle template defines which models and configurations can be deployed together on a single node. Each bundle template contains one or more model templates, which specify the supported sequence lengths and batch sizes. For a detailed example of how model templates are defined inside a BundleTemplate, see the Custom Bundle Deployment → BundleTemplate Structure. Example model templates:
| Model Template | Supported Sequence Lengths | Supported Batch Sizes |
|---|---|---|
| DeepSeek-R1-0528-Template | 4k, 8k | 1, 2, 4, 8 |
| DeepSeek-V3-0324-Template | 4k, 8k | 1, 2, 4, 8 |
deepseek-r1-v3-fp8-32k-Template combines both DeepSeek-R1-0528-Template and DeepSeek-V3-0324-Template into a single deployable package.
For a full YAML example of a BundleTemplate definition, see the Custom Bundle Deployment → BundleTemplate Structure section.
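To make the relationship between model templates and configurations concrete, the following minimal sketch enumerates the combinations implied by the example table. It assumes that every listed sequence length can be paired with every listed batch size, which the table does not confirm. The dictionary layout and the function name are hypothetical illustrations, not the BundleTemplate schema.

```python
from itertools import product

# Values copied from the example table; the dict layout is a hypothetical
# illustration and NOT the real BundleTemplate schema.
MODEL_TEMPLATES = {
    "DeepSeek-R1-0528-Template": {"seq_lens": ["4k", "8k"], "batch_sizes": [1, 2, 4, 8]},
    "DeepSeek-V3-0324-Template": {"seq_lens": ["4k", "8k"], "batch_sizes": [1, 2, 4, 8]},
}


def enumerate_configurations(templates):
    """Return every (template, sequence length, batch size) combination.

    Assumption: all sequence lengths pair with all batch sizes.
    """
    configs = []
    for name, spec in templates.items():
        for seq_len, batch_size in product(spec["seq_lens"], spec["batch_sizes"]):
            configs.append((name, seq_len, batch_size))
    return configs


configs = enumerate_configurations(MODEL_TEMPLATES)
print(len(configs))  # 2 templates x 2 sequence lengths x 4 batch sizes = 16
```

Under that assumption, each tuple corresponds to one configuration that takes up space in the bundle, which is why different bundles carry different subsets.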
Bundles
A bundle associates trained checkpoints (model weights) with the model templates defined in a bundle template. For example, the bundle deepseek-r1-v3-fp8-32k links the R1-0528 and V3-0324 checkpoints to their corresponding model templates within deepseek-r1-v3-fp8-32k-Template.
Key terminology
| Term | Definition |
|---|---|
| Bundle | A deployable package combining models, checkpoints, and configurations |
| Expert | A model instance bound to a specific sequence length (e.g., DeepSeek-R1-Distill-Llama-70B-4K) |
| Expert config | An expert with a specific batch size (e.g., DeepSeek-R1-Distill-Llama-70B-4k-BS2) |
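The two example names in the table suggest how an expert and an expert config relate: the expert adds a sequence length to the model name, and the expert config adds a batch size on top of that. The sketch below models that relationship. The class names and the naming pattern are inferred from the examples only and are assumptions, not a documented SambaStack convention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expert:
    """A model bound to one sequence length (hypothetical helper class)."""
    model: str
    seq_len: str  # e.g. "4k"

    @property
    def name(self) -> str:
        # Pattern inferred from the table's example; an assumption.
        return f"{self.model}-{self.seq_len}"


@dataclass(frozen=True)
class ExpertConfig:
    """An expert bound to one batch size (hypothetical helper class)."""
    expert: Expert
    batch_size: int

    @property
    def name(self) -> str:
        # Pattern inferred from the table's example; an assumption.
        return f"{self.expert.name}-BS{self.batch_size}"


expert = Expert(model="DeepSeek-R1-Distill-Llama-70B", seq_len="4k")
config = ExpertConfig(expert=expert, batch_size=2)
print(config.name)
```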
Configure bundles
The bundles section in your sambastack.yaml configuration file has two subsections:
| Subsection | Purpose |
|---|---|
| bundleSpecs | Declares which bundles are available for deployment. This registers bundles with the system but does not deploy them. |
| bundleDeploymentSpecs | Defines how bundles are deployed across the cluster, including replica counts and QoS levels. |
See the SambaStack.yaml Reference for a full example.
Switch bundles
To switch to a different bundle, update both bundleSpecs and bundleDeploymentSpecs with the new bundle name.
See the Supported Models and Bundles page for available bundles. To request new bundle templates, contact SambaNova support.
Example:
See the SambaStack.yaml Reference for a full example.
Deploy multiple bundles
To deploy multiple bundles simultaneously, list each bundle in both bundleSpecs and bundleDeploymentSpecs.
See the SambaStack.yaml Reference for a full example.
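Because a bundle must appear in both subsections to be deployed, a quick consistency check can catch a name that was updated in one place but not the other. The sketch below works on an already-parsed bundles section. It assumes each entry is a mapping with a `name` key, which is a guess at the layout. Consult the SambaStack.yaml Reference for the actual schema. The function name and the second bundle name are hypothetical.

```python
def find_bundle_mismatches(bundles_section):
    """Compare bundle names across the two subsections.

    Assumption: each list entry is a mapping with a "name" key.
    """
    declared = {spec["name"] for spec in bundles_section.get("bundleSpecs", [])}
    deployed = {spec["name"] for spec in bundles_section.get("bundleDeploymentSpecs", [])}
    return {
        # Deployment entries that reference a bundle that was never declared.
        "deployed_but_not_declared": sorted(deployed - declared),
        # Declared only: registered with the system but not deployed.
        "declared_but_not_deployed": sorted(declared - deployed),
    }


# "example-bundle-b" is a made-up name used only for illustration.
bundles = {
    "bundleSpecs": [{"name": "deepseek-r1-v3-fp8-32k"}, {"name": "example-bundle-b"}],
    "bundleDeploymentSpecs": [{"name": "deepseek-r1-v3-fp8-32k"}],
}
report = find_bundle_mismatches(bundles)
print(report)
```

A non-empty `declared_but_not_deployed` list is not necessarily an error, since declaring a bundle only registers it. A non-empty `deployed_but_not_declared` list is the case most likely to need attention.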
Verify deployment status
Check that pods reflect the updated bundle configuration.
The status.legalizerInfo section shows legalizer results, including resource utilization:
| Field | Description |
|---|---|
| status.legalizerInfo.status | Human-readable legalizer result: Legalizer passed, Legalizer failed, or Legalizer was skipped; absent if legalizer output could not be processed |
| status.legalizerInfo.errors | List of validation errors from the legalizer |
| status.legalizerInfo.warnings | List of validation warnings from the legalizer |
| status.legalizerInfo.utilization.ddr | DDR memory utilization as a quoted string (e.g., '0.75' represents 75%) |
| status.legalizerInfo.utilization.hbm_resident | HBM resident memory utilization as a quoted string (e.g., '0.65' represents 65%; values above '1.0' indicate over-allocation) |
| status.legalizerInfo.utilization.host | Host memory utilization as a quoted string (e.g., '0.30' represents 30%) |
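Since the utilization values arrive as quoted strings rather than numbers, any script that reads them needs to convert them before comparing. The following sketch shows one way to interpret the three fields from the table. It treats any missing or non-numeric value as unavailable and flags hbm_resident values above 1.0. The function names and the sample values are hypothetical.

```python
def parse_utilization(value):
    """Convert a quoted utilization string such as '0.75' to a float.

    Returns None when the value is missing or not numeric.
    """
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def summarize_utilization(utilization):
    """Build a readable summary of the ddr, hbm_resident, and host fields."""
    summary = {}
    for field in ("ddr", "hbm_resident", "host"):
        fraction = parse_utilization(utilization.get(field))
        if fraction is None:
            summary[field] = "unavailable"
        elif field == "hbm_resident" and fraction > 1.0:
            # Per the table, HBM values above 1.0 indicate over-allocation.
            summary[field] = f"{fraction:.0%} (over-allocated)"
        else:
            summary[field] = f"{fraction:.0%}"
    return summary


# Made-up sample values for illustration only.
example = {"ddr": "0.75", "hbm_resident": "1.10", "host": "N/A"}
print(summarize_utilization(example))
```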
The utilization fields show N/A when skip_legalizer: true is set on a bundle that has been legalized at least once. If the bundle has never been legalized (for example, skip_legalizer: true was set from initial deployment), legalizerInfo is absent entirely. The utilization field may also be absent if the legalizer output could not be parsed.
Deploy custom checkpoints
Custom checkpoints use the same deployment process as SambaNova-provided checkpoints but require conversion first.
Prerequisites:
- A converted checkpoint (any team member with appropriate access can perform the conversion)
- The Google Cloud Storage path to the converted checkpoint
Create a bundle configuration
Create a bundle configuration that references your custom checkpoint’s storage location. Replace LLAMA3D2_1B_CKPT with the checkpoint key expected by your bundle template, and update the source path to your checkpoint location.
The checkpoint keys used in this configuration must match the keys defined in the corresponding BundleTemplate. For an example of how these keys are defined, see the Custom Bundle Deployment → BundleTemplate Structure.
Deploy with speculative decoding
Speculative decoding uses a smaller draft model to predict tokens that a larger target model then verifies, potentially improving throughput.
Requirements
- The draft checkpoint must be compatible with your target checkpoint
- For custom models, use a fine-tuned draft checkpoint that matches your target model’s domain
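The draft-then-verify idea can be illustrated with a toy example. The sketch below is a simplified greedy variant: a draft function proposes a few tokens, a target function checks each one, and the first mismatch is replaced by the target's own token. Both "models" are hypothetical deterministic functions over integers, chosen only so the example runs on its own. This is not how SambaStack implements speculative decoding.

```python
def target_next(tokens):
    """Toy stand-in for the target model: deterministic next token."""
    return (sum(tokens) * 31 + len(tokens)) % 50


def draft_next(tokens):
    """Toy stand-in for the draft model: disagrees on some contexts."""
    guess = target_next(tokens)
    return guess if len(tokens) % 3 else (guess + 1) % 50


def greedy_decode(prompt, num_tokens):
    """Baseline: generate every token with the target model alone."""
    tokens = list(prompt)
    for _ in range(num_tokens):
        tokens.append(target_next(tokens))
    return tokens


def speculative_decode(prompt, num_tokens, draft_len=4):
    """Greedy speculative decoding with draft proposals and target checks."""
    tokens = list(prompt)
    goal = len(prompt) + num_tokens
    while len(tokens) < goal:
        # 1. The draft model proposes up to draft_len tokens.
        proposal = []
        for _ in range(min(draft_len, goal - len(tokens))):
            proposal.append(draft_next(tokens + proposal))
        # 2. The target checks each proposed position. A real system would
        #    do this in one batched pass; here it is a plain loop.
        for i, proposed in enumerate(proposal):
            expected = target_next(tokens + proposal[:i])
            if proposed != expected:
                # Keep the accepted prefix, then use the target's token.
                tokens.extend(proposal[:i])
                tokens.append(expected)
                break
        else:
            tokens.extend(proposal)  # every proposed token was accepted
    return tokens


result = speculative_decode([1, 2, 3], num_tokens=12)
print(result == greedy_decode([1, 2, 3], num_tokens=12))
```

In this greedy variant the output matches what the target model alone would produce, so any gain comes from how many draft tokens are accepted per verification step. That is why a draft checkpoint that is compatible with the target, and matched to its domain, matters.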

