`Llama-3.3-70B` with a batch size of 4 and a sequence length of 16k is one specific configuration. A bundle may contain multiple such configurations, potentially spanning different models.
SambaNova’s RDU technology supports loading multiple models and configurations in a single deployment, allowing instant switching among configurations. Unlike traditional GPU systems that deploy a single static model, SambaStack’s multi-model, multi-configuration bundles increase efficiency, flexibility, and throughput without sacrificing latency. Bundles facilitate flexible deployments by combining models and their configurations.
In SambaStack, a configuration defines the specific runtime settings for a model deployment, such as batch size, sequence length, and precision. Deployments can include multiple configurations, enabling instant switching between different model setups within the same deployment for optimized performance and flexibility.
Bundle template
A Bundle Template defines which models and configurations can be deployed together on a single node. It contains one or more Model Templates. Each Model Template specifies supported configurations: combinations of sequence lengths and batch sizes.
Example - Model template
Model Template | Supported Sequence Lengths | Supported Batch Sizes |
---|---|---|
DeepSeek-R1-0528-Template | 8k, 4k | 1, 2, 4, 8 |
DeepSeek-V3-0324-Template | 8k, 4k | 1, 2, 4, 8 |
Example - Bundle template
`deepseek-r1-v3-fp8-32k-Template` bundles both `DeepSeek-R1-0528-Template` and `DeepSeek-V3-0324-Template` into one deployment package.
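
The relationship between a Bundle Template, its Model Templates, and the configurations they allow can be sketched in Python. This is an illustrative model only: the class and field names (`ModelTemplate`, `BundleTemplate`, `configurations`) are hypothetical and not part of SambaStack, and the sketch assumes that every listed sequence length can be paired with every listed batch size. The values are taken from the example table.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class ModelTemplate:
    # Hypothetical stand-in for a Model Template; not a SambaStack type.
    name: str
    sequence_lengths: tuple[str, ...]
    batch_sizes: tuple[int, ...]

    def configurations(self) -> list[tuple[str, int]]:
        # Assumption: every sequence length pairs with every batch size.
        return list(product(self.sequence_lengths, self.batch_sizes))


@dataclass(frozen=True)
class BundleTemplate:
    # Hypothetical stand-in for a Bundle Template: a named group of Model Templates.
    name: str
    model_templates: tuple[ModelTemplate, ...]


r1 = ModelTemplate("DeepSeek-R1-0528-Template", ("8k", "4k"), (1, 2, 4, 8))
v3 = ModelTemplate("DeepSeek-V3-0324-Template", ("8k", "4k"), (1, 2, 4, 8))
bundle_template = BundleTemplate("deepseek-r1-v3-fp8-32k-Template", (r1, v3))

# Count the configurations available across the whole Bundle Template.
total = sum(len(m.configurations()) for m in bundle_template.model_templates)
print(f"{bundle_template.name}: {total} configurations")
```

Under these assumptions, each Model Template contributes eight configurations (two sequence lengths times four batch sizes).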
Bundle
A Bundle associates actual checkpoints (trained weights) with model templates defined in a bundle template. For example, the Bundle named `deepseek-r1-v3-fp8-32k` links the R1-0528 checkpoint and the V3-0324 checkpoint to their corresponding Model Templates within the `deepseek-r1-v3-fp8-32k-Template`.
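
As a rough illustration of what "associating checkpoints with Model Templates" means, the sketch below checks that every Model Template in a Bundle Template has a checkpoint assigned. The dictionary layout, the placeholder checkpoint locations, and the helper `missing_checkpoints` are all hypothetical; they do not reflect the actual SambaStack bundle schema.

```python
def missing_checkpoints(template_models, checkpoints):
    """Return the Model Template names that have no checkpoint assigned."""
    return [name for name in template_models if name not in checkpoints]


# Model Templates contained in the Bundle Template (names from the example above).
template_models = ["DeepSeek-R1-0528-Template", "DeepSeek-V3-0324-Template"]

# Hypothetical in-memory view of a Bundle; checkpoint locations are placeholders.
bundle = {
    "name": "deepseek-r1-v3-fp8-32k",
    "template": "deepseek-r1-v3-fp8-32k-Template",
    "checkpoints": {
        "DeepSeek-R1-0528-Template": "<R1-0528 checkpoint location>",
        "DeepSeek-V3-0324-Template": "<V3-0324 checkpoint location>",
    },
}

unassigned = missing_checkpoints(template_models, bundle["checkpoints"])
print("complete" if not unassigned else f"missing: {unassigned}")
```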
Key terms
- Bundle: A ready-to-deploy package combining models and configuration.
- Expert: A model instance tied to a specific sequence length (e.g., `DeepSeek-R1-Distill-Llama-70B-4K`).
- Expert Config: An Expert instance with a specific batch size and configuration (e.g., `DeepSeek-R1-Distill-Llama-70B-4k-BS2`).
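
The two example names suggest a naming pattern of model name, sequence length, and a `BS<n>` batch-size suffix. The sketch below splits an Expert Config name along those lines. The pattern is inferred from the examples only and is an assumption, not a documented SambaStack convention; `parse_expert_config` is a hypothetical helper.

```python
import re

# Assumed pattern: <model>-<sequence length, e.g. 4k or 4K>-BS<batch size>.
_EXPERT_CONFIG = re.compile(r"^(?P<model>.+)-(?P<seq>\d+[kK])-BS(?P<bs>\d+)$")


def parse_expert_config(name):
    """Split an Expert Config name into (model, sequence length, batch size)."""
    match = _EXPERT_CONFIG.match(name)
    if match is None:
        raise ValueError(f"not an Expert Config name under the assumed pattern: {name!r}")
    # Normalize the sequence length to lowercase, since the examples mix 4K and 4k.
    return match["model"], match["seq"].lower(), int(match["bs"])


model, seq_len, batch_size = parse_expert_config("DeepSeek-R1-Distill-Llama-70B-4k-BS2")
print(model, seq_len, batch_size)
```

An Expert name such as `DeepSeek-R1-Distill-Llama-70B-4K` has no batch-size suffix, so the helper rejects it with a `ValueError`.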
Integrating Bundles into SambaStack deployment
Inside your SambaStack configuration file, the bundles section serves two key purposes:
- Defining Available Bundles: The `bundleSpecs` subsection declares which Bundle Templates and Bundles are available for deployment. This registers bundles with the system but does not deploy them.
- Deploying Bundles to Nodes: The `bundleDeploymentSpecs` subsection defines how and where bundles are deployed within the cluster, including the number of replicas and QoS levels they serve.
Example - Bundle configuration
Switching bundles
To switch deployments to a different bundle, update the `bundleSpecs` and `bundleDeploymentSpecs` sections with the new bundle name.
Deploying multiple bundles
You may deploy more than one bundle simultaneously by listing them in both `bundleSpecs` and `bundleDeploymentSpecs`.
Example:
SambaStack currently supports deploying only one bundle per node. When deploying multiple bundles, ensure each bundle is assigned to separate nodes to avoid resource conflicts.
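
Before applying a multi-bundle configuration, the one-bundle-per-node rule can be checked with a few lines of Python. This sketch assumes you already have a list of (bundle, node) assignments; how those assignments are expressed in the SambaStack configuration is not shown here, and the bundle and node names below are made up for illustration.

```python
from collections import defaultdict


def find_node_conflicts(assignments):
    """Return {node: [bundles]} for every node assigned more than one bundle."""
    bundles_by_node = defaultdict(set)
    for bundle, node in assignments:
        bundles_by_node[node].add(bundle)
    return {
        node: sorted(bundles)
        for node, bundles in bundles_by_node.items()
        if len(bundles) > 1
    }


# Hypothetical assignments: two bundles land on node-0, which violates the rule.
assignments = [
    ("bundle-a", "node-0"),
    ("bundle-b", "node-0"),
    ("bundle-b", "node-1"),
]

conflicts = find_node_conflicts(assignments)
print(conflicts)
```

An empty result means each node hosts at most one bundle.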
Querying deployment status
To verify that pods have been updated with the new bundles, run:
Deploying custom checkpoints
Custom checkpoints follow the same deployment process as standard SambaNova checkpoints but require prior conversion.
- Anyone in your organization (developers or admins) can convert a checkpoint.
- Obtain a converted checkpoint pointer or perform the conversion yourself.
- See the Custom Checkpoint Deployment Guide for full instructions.
Creating bundle config for custom checkpoints
Copy an existing Bundle template and modify the source field to point to your checkpoint’s Google Storage location.
Example - Sample bundle instance
Speculative decoding deployment guidelines
To deploy speculative decoding effectively and maintain stable performance in your SambaStack environment:
- Ensure that the draft checkpoint matches your custom target checkpoint to avoid performance degradation.
- Use a Bundle Template without speculative decoding if you lack a fine-tuned draft checkpoint for your custom model.
- Validate draft–target checkpoint compatibility with the SN Conversion Library to optimize acceleration.
- Refer to the Speculative Decoding Background and Custom Checkpoint Deployment guides for detailed instructions.
Deploying custom draft checkpoints
Custom draft checkpoints can be specified alongside target checkpoints within a Bundle to enable speculative decoding.
Example - Bundle with draft and target checkpoints
Best practices
- Speculative decoding does not affect the accuracy of response outputs but can impact performance based on acceptance rates.
- Avoid using speculative decoding bundles for custom models unless you have a properly tuned draft checkpoint to prevent degraded performance.
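
To see why acceptance rates matter, consider a common simplified model of speculative decoding in which each drafted token is accepted independently with the same probability. Under that assumption, the expected number of tokens produced per target-model pass is (1 - a^(k+1)) / (1 - a) for acceptance rate a and draft length k. The sketch below evaluates this formula; it is a generic approximation, not a SambaStack measurement or API, and `draft_length` is a hypothetical parameter.

```python
def expected_tokens_per_target_pass(acceptance_rate, draft_length):
    """Expected tokens emitted per target pass under an i.i.d. acceptance model."""
    if not 0.0 <= acceptance_rate <= 1.0:
        raise ValueError("acceptance_rate must be between 0 and 1")
    if draft_length < 0:
        raise ValueError("draft_length must be non-negative")
    if acceptance_rate == 1.0:
        # Every drafted token is accepted, plus one token from the target model.
        return float(draft_length + 1)
    return (1 - acceptance_rate ** (draft_length + 1)) / (1 - acceptance_rate)


# Compare a well-matched draft checkpoint with a poorly matched one.
well_matched = expected_tokens_per_target_pass(0.8, 4)
poorly_matched = expected_tokens_per_target_pass(0.3, 4)
print(f"well matched: {well_matched:.2f}, poorly matched: {poorly_matched:.2f}")
```

With a low acceptance rate the result approaches one token per pass, so the draft model's extra work buys little, which is consistent with the advice to avoid speculative decoding bundles without a properly tuned draft checkpoint.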