Llama-3.3-70B-instruct and a Llama-3.1-8B-instruct, enabling you to switch between them almost instantly. Every configuration of a model in the bundle takes up space, so different bundles contain different sets of configurations. For example, a copy of Llama-3.3-70B-instruct with a batch size of 4 and a sequence length of 16k represents one configuration. Choose the bundle that best fits your use case. See the current list of supported bundles here.
A configuration defines the runtime settings for a model deployment: batch size, sequence length, and precision. A single deployment can include multiple configurations, enabling instant switching between setups for optimized performance.
Concepts
Bundle templates
A bundle template defines which models and configurations can be deployed together on a single node. Each bundle template contains one or more model templates, which specify the supported sequence lengths and batch sizes. Example model templates:

| Model Template | Supported Sequence Lengths | Supported Batch Sizes |
|---|---|---|
| DeepSeek-R1-0528-Template | 4k, 8k | 1, 2, 4, 8 |
| DeepSeek-V3-0324-Template | 4k, 8k | 1, 2, 4, 8 |

deepseek-r1-v3-fp8-32k-Template combines both DeepSeek-R1-0528-Template and DeepSeek-V3-0324-Template into a single deployable package.
Bundles
A bundle associates trained checkpoints (model weights) with the model templates defined in a bundle template. For example, the bundle deepseek-r1-v3-fp8-32k links the R1-0528 and V3-0324 checkpoints to their corresponding model templates within deepseek-r1-v3-fp8-32k-Template.
Key terminology
| Term | Definition |
|---|---|
| Bundle | A deployable package combining models, checkpoints, and configurations |
| Expert | A model instance bound to a specific sequence length (e.g., DeepSeek-R1-Distill-Llama-70B-4K) |
| Expert config | An expert with a specific batch size (e.g., DeepSeek-R1-Distill-Llama-70B-4k-BS2) |
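To make the relationship between model templates, experts, and expert configs concrete, the following minimal Python sketch enumerates them for one template. The `ModelTemplate` class and the name format (model, then sequence length, then `BS` plus batch size) are assumptions for illustration, modeled on the example names in the table above; they are not part of SambaStack.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Tuple


@dataclass(frozen=True)
class ModelTemplate:
    """Hypothetical stand-in for one row of a model template table."""
    model: str
    sequence_lengths: Tuple[str, ...]
    batch_sizes: Tuple[int, ...]


def experts(template: ModelTemplate) -> List[str]:
    """One expert per supported sequence length."""
    return [f"{template.model}-{seq}" for seq in template.sequence_lengths]


def expert_configs(template: ModelTemplate) -> List[str]:
    """One expert config per (sequence length, batch size) pair."""
    return [
        f"{template.model}-{seq}-BS{bs}"
        for seq, bs in product(template.sequence_lengths, template.batch_sizes)
    ]


# Values taken from the DeepSeek-R1-0528-Template example row.
template = ModelTemplate("DeepSeek-R1-0528", ("4k", "8k"), (1, 2, 4, 8))

if __name__ == "__main__":
    print(experts(template))
    print(len(expert_configs(template)), "expert configs")
```

Under these assumptions, two sequence lengths and four batch sizes yield two experts and eight expert configs, which is why each additional configuration adds to the space a bundle needs.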
Configure bundles
The bundles section in your sambastack.yaml configuration file has two subsections:

| Subsection | Purpose |
|---|---|
| bundleSpecs | Declares which bundles are available for deployment. This registers bundles with the system but does not deploy them. |
| bundleDeploymentSpecs | Defines how bundles are deployed across the cluster, including replica counts and QoS levels. |

See the SambaStack.yaml Reference for a full example.
Switch bundles
To switch to a different bundle, update both bundleSpecs and bundleDeploymentSpecs with the new bundle name.
See Supported Models & Bundles for the available bundles. To request new bundle templates, contact SambaNova support.
Example:
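As a rough illustration of the rule that both subsections must change together, the Python sketch below renames a bundle in an in-memory copy of the bundles section. The dictionary layout and the `name` key are assumptions made for this sketch, and `example-old-bundle` is a placeholder; the actual schema is defined in the SambaStack.yaml Reference.

```python
import copy


def switch_bundle(bundles: dict, old: str, new: str) -> dict:
    """Return a copy of the bundles section with `old` renamed to `new`
    in both bundleSpecs and bundleDeploymentSpecs.

    The per-entry "name" key is a hypothetical field used for illustration.
    """
    updated = copy.deepcopy(bundles)
    for section in ("bundleSpecs", "bundleDeploymentSpecs"):
        for entry in updated.get(section, []):
            if entry.get("name") == old:
                entry["name"] = new
    return updated


# Placeholder starting point; "example-old-bundle" is not a real bundle name.
current = {
    "bundleSpecs": [{"name": "example-old-bundle"}],
    "bundleDeploymentSpecs": [{"name": "example-old-bundle"}],
}

switched = switch_bundle(current, "example-old-bundle", "deepseek-r1-v3-fp8-32k")

if __name__ == "__main__":
    print(switched)
```

Working on a deep copy leaves the original settings untouched, so the previous bundle name remains available if the switch needs to be reverted.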
See the SambaStack.yaml Reference for a full example.
Deploy multiple bundles
To deploy multiple bundles simultaneously, list each bundle in both bundleSpecs and bundleDeploymentSpecs.
See the SambaStack.yaml Reference for a full example.
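Because each bundle has to appear in both subsections, a quick consistency check before applying the file can catch a bundle that is deployed but never declared. The sketch below assumes each entry carries a `name` key and that the section has already been loaded into a Python dictionary; both are assumptions for illustration, and the bundle names are placeholders.

```python
from typing import Dict, List


def undeclared_deployments(bundles: Dict[str, list]) -> List[str]:
    """Return names listed in bundleDeploymentSpecs that are missing
    from bundleSpecs. The "name" key is a hypothetical field."""
    declared = {spec["name"] for spec in bundles.get("bundleSpecs", [])}
    return [
        dep["name"]
        for dep in bundles.get("bundleDeploymentSpecs", [])
        if dep["name"] not in declared
    ]


# Placeholder names: two bundles deployed, only one declared.
section = {
    "bundleSpecs": [{"name": "bundle-a"}],
    "bundleDeploymentSpecs": [{"name": "bundle-a"}, {"name": "bundle-b"}],
}

if __name__ == "__main__":
    print(undeclared_deployments(section))
```

An empty result means every deployed bundle is also declared. The reverse case, a bundle declared but not deployed, is not flagged, since bundleSpecs only registers bundles without deploying them.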
Verify deployment status
Check that pods reflect the updated bundle configuration:
Deploy custom checkpoints
Custom checkpoints use the same deployment process as SambaNova-provided checkpoints but require conversion first.
Prerequisites:
- A converted checkpoint (any team member with appropriate access can perform the conversion)
- The Google Cloud Storage path to the converted checkpoint
Create a bundle configuration
Create a bundle configuration that references your custom checkpoint’s storage location:
Replace LLAMA3D2_1B_CKPT with the checkpoint key expected by your bundle template, and update the source path to your checkpoint location.
Deploy with speculative decoding
Speculative decoding uses a smaller draft model to predict tokens that a larger target model then verifies, potentially improving throughput.
Requirements
- The draft checkpoint must be compatible with your target checkpoint
- For custom models, use a fine-tuned draft checkpoint that matches your target model’s domain
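The draft-then-verify idea described above can be sketched with toy models. The following Python example shows a simplified greedy variant of speculative decoding, not SambaStack's implementation: `draft_model` and `target_model` are hypothetical next-token functions, and the target is called once per position for readability, whereas real systems typically verify all drafted positions in a single batched pass.

```python
from typing import Callable, List

NextToken = Callable[[List[int]], int]


def speculative_step(prefix: List[int], draft: NextToken,
                     target: NextToken, k: int) -> List[int]:
    """Run one draft-and-verify round and return the tokens to append."""
    # 1. The draft model proposes k tokens autoregressively.
    proposed: List[int] = []
    ctx = list(prefix)
    for _ in range(k):
        token = draft(ctx)
        proposed.append(token)
        ctx.append(token)

    # 2. The target model checks each proposal in order.
    accepted: List[int] = []
    ctx = list(prefix)
    for token in proposed:
        expected = target(ctx)
        if expected != token:
            # First mismatch: keep the target's token and stop.
            accepted.append(expected)
            return accepted
        accepted.append(token)
        ctx.append(token)

    # 3. Every proposal matched, so the target adds one more token.
    accepted.append(target(ctx))
    return accepted


def target_model(ctx: List[int]) -> int:
    """Toy target: counts upward modulo 10."""
    return (ctx[-1] + 1) % 10


def draft_model(ctx: List[int]) -> int:
    """Toy draft: agrees with the target except after token 3."""
    return 0 if ctx[-1] == 3 else (ctx[-1] + 1) % 10


if __name__ == "__main__":
    print(speculative_step([0], draft_model, target_model, k=4))
```

The more often the draft agrees with the target, the more tokens each round accepts, which is why a compatible draft checkpoint that matches the target's domain matters for throughput.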
