Llama-3.3-70B-instruct and a Llama-3.1-8B-instruct, enabling you to switch between them almost instantly. Every model configuration in a bundle takes up space, so different bundles contain different sets of configurations: a copy of Llama-3.3-70B-instruct with a batch size of 4 and a sequence length of 16k represents one configuration. Choose the bundle that best fits your use case. See the current list of supported bundles here.
A configuration defines the runtime settings for a model deployment: batch size, sequence length, and precision. A single deployment can include multiple configurations, enabling instant switching between setups for optimized performance.
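For instance, a single deployment might declare two configurations side by side. The field names below are illustrative assumptions, not the actual sambastack.yaml schema:

```yaml
# Illustrative sketch only -- field names are assumptions, not the real schema.
configurations:
  - batchSize: 4
    sequenceLength: 16k
    precision: bf16
  - batchSize: 8
    sequenceLength: 4k
    precision: fp8
```

Because both configurations ship in the same deployment, switching between them does not require reloading model weights.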
Concepts
Bundle templates
A bundle template defines which models and configurations can be deployed together on a single node. Each bundle template contains one or more model templates, which specify the supported sequence lengths and batch sizes. For a detailed example of how model templates are defined inside a BundleTemplate, see the Custom Bundle Deployment → BundleTemplate Structure.

Example model templates:

| Model Template | Supported Sequence Lengths | Supported Batch Sizes |
|---|---|---|
| DeepSeek-R1-0528-Template | 4k, 8k | 1, 2, 4, 8 |
| DeepSeek-V3-0324-Template | 4k, 8k | 1, 2, 4, 8 |
The bundle template deepseek-r1-v3-fp8-32k-Template combines both DeepSeek-R1-0528-Template and DeepSeek-V3-0324-Template into a single deployable package.
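A minimal sketch of how this bundle template might be expressed; the keys here are illustrative assumptions, and the authoritative schema is in Custom Bundle Deployment → BundleTemplate Structure:

```yaml
# Illustrative only -- see the BundleTemplate Structure reference for the real schema.
name: deepseek-r1-v3-fp8-32k-Template
modelTemplates:
  - name: DeepSeek-R1-0528-Template
    supportedSequenceLengths: [4k, 8k]
    supportedBatchSizes: [1, 2, 4, 8]
  - name: DeepSeek-V3-0324-Template
    supportedSequenceLengths: [4k, 8k]
    supportedBatchSizes: [1, 2, 4, 8]
```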
Bundles
A bundle associates trained checkpoints (model weights) with the model templates defined in a bundle template. For example, the bundle deepseek-r1-v3-fp8-32k links the R1-0528 and V3-0324 checkpoints to their corresponding model templates within deepseek-r1-v3-fp8-32k-Template.
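Conceptually, the bundle maps each model template to a checkpoint. The layout below is an illustrative assumption, not the real schema:

```yaml
# Illustrative only -- field names and paths are placeholders.
name: deepseek-r1-v3-fp8-32k
bundleTemplate: deepseek-r1-v3-fp8-32k-Template
checkpoints:
  DeepSeek-R1-0528-Template: <path-to-R1-0528-checkpoint>
  DeepSeek-V3-0324-Template: <path-to-V3-0324-checkpoint>
```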
Key terminology
| Term | Definition |
|---|---|
| Bundle | A deployable package combining models, checkpoints, and configurations |
| Expert | A model instance bound to a specific sequence length (e.g., DeepSeek-R1-Distill-Llama-70B-4K) |
| Expert config | An expert with a specific batch size (e.g., DeepSeek-R1-Distill-Llama-70B-4k-BS2) |
Configure bundles
The bundles section in your sambastack.yaml configuration file has two subsections:
| Subsection | Purpose |
|---|---|
| bundleSpecs | Declares which bundles are available for deployment. This registers bundles with the system but does not deploy them. |
| bundleDeploymentSpecs | Defines how bundles are deployed across the cluster, including replica counts and QoS levels. |
See the SambaStack.yaml Reference for a full example.
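As a hedged sketch, the two subsections might sit together like this (names and nesting are assumptions; the SambaStack.yaml Reference has the authoritative layout):

```yaml
# Illustrative only -- not the verified schema.
bundles:
  bundleSpecs:
    - name: deepseek-r1-v3-fp8-32k    # registers the bundle with the system
  bundleDeploymentSpecs:
    - name: deepseek-r1-v3-fp8-32k    # actually deploys it
      replicas: 1
      qos: <qos-level>
```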
Switch bundles
To switch to a different bundle, update both bundleSpecs and bundleDeploymentSpecs with the new bundle name.
See Supported Models & Bundles for the list of available bundles. To request new bundle templates, contact SambaNova support.
For a full example, see the SambaStack.yaml Reference.
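For illustration, switching to a new bundle could look like this, with the same name updated in both subsections (bundle names and fields are placeholders, not the verified schema):

```yaml
# Illustrative only -- the new bundle name replaces the old one in both places.
bundles:
  bundleSpecs:
    - name: <new-bundle-name>
  bundleDeploymentSpecs:
    - name: <new-bundle-name>
      replicas: 1
```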
Deploy multiple bundles
To deploy multiple bundles simultaneously, list each bundle in both bundleSpecs and bundleDeploymentSpecs. See the SambaStack.yaml Reference for a full example.
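A hedged sketch of a two-bundle deployment, with every bundle appearing in both subsections (names and fields are placeholders):

```yaml
# Illustrative only -- each bundle is listed in both subsections.
bundles:
  bundleSpecs:
    - name: <bundle-a>
    - name: <bundle-b>
  bundleDeploymentSpecs:
    - name: <bundle-a>
      replicas: 1
    - name: <bundle-b>
      replicas: 1
```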
Verify deployment status
Check that the pods reflect the updated bundle configuration.

Deploy custom checkpoints
Custom checkpoints use the same deployment process as SambaNova-provided checkpoints but require conversion first.

Prerequisites:
- A converted checkpoint (any team member with appropriate access can perform the conversion)
- The Google Cloud Storage path to the converted checkpoint
Create a bundle configuration
Create a bundle configuration that references your custom checkpoint's storage location. Replace LLAMA3D2_1B_CKPT with the checkpoint key expected by your bundle template, and update the source path to your checkpoint location.
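A hedged sketch of such a configuration, using the LLAMA3D2_1B_CKPT key mentioned above (the surrounding structure and field names are assumptions, not the verified schema):

```yaml
# Illustrative only -- substitute your bundle template's checkpoint key
# and the Google Cloud Storage path to your converted checkpoint.
checkpoints:
  LLAMA3D2_1B_CKPT:
    source: gs://<your-bucket>/<path-to-converted-checkpoint>
```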
The checkpoint keys used in this configuration must match the keys defined in the corresponding BundleTemplate. For an example of how these keys are defined, see the Custom Bundle Deployment → BundleTemplate Structure.
Deploy with speculative decoding
Speculative decoding uses a smaller draft model to predict tokens that a larger target model then verifies, potentially improving throughput.

Requirements
- The draft checkpoint must be compatible with your target checkpoint
- For custom models, use a fine-tuned draft checkpoint that matches your target model’s domain
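As an illustrative assumption of how a draft/target pairing might be declared (field names and structure are placeholders, not the verified schema):

```yaml
# Illustrative only -- the draft checkpoint must be compatible with the target.
bundleDeploymentSpecs:
  - name: <bundle-name>
    speculativeDecoding:
      draftCheckpoint: <path-to-draft-checkpoint>
      targetCheckpoint: <path-to-target-checkpoint>
```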
