This guide covers creating custom bundles. For deploying pre-configured bundles provided by SambaNova, see Model Deployment.
Prerequisites
Before creating custom bundles, complete the following:
- Installation Prerequisites - Required system and environment setup
- SambaStack Setup - Core cluster installation and configuration
- Optional Configuration - Additional configuration as needed
Review the following related documentation:
- Model Deployment - Bundle deployment concepts and workflows
- SambaStack Models - Available model checkpoints
- Speculative Decoding Deployment Guidelines - Required if configuring speculative decoding pairs
See the SambaStack installation section for the minimum Helm version required to install SambaStack.
Terminology
| Term | Definition |
|---|---|
| RDU | Reconfigurable Dataflow Unit - SambaNova’s proprietary processor architecture |
| PEF | Processor Executable Format - Compiled model binaries that run on RDUs |
| Expert | A sequence length profile configuration (for example, 8k, 16k, 32k) within a model |
| Speculative Decoding | An optimization technique using a smaller draft model to accelerate inference from a larger target model |
| Legalizer | A validation process that verifies a bundle fits within RDU memory constraints |
Concepts
Bundle Architecture
SambaStack uses a three-tier architecture for model deployment:
- BundleTemplate - Defines how models can be run: available sequence length profiles, batch sizes, and PEF mappings. Also defines target and draft model relationships for speculative decoding pairs.
- Bundle - Binds a template to specific checkpoints in storage, making it ready for deployment
- BundleDeployment - Instantiates one or more replicas of a bundle on the cluster
This separation allows you to:
- Reuse templates across multiple checkpoints, including custom checkpoints for fine-tuned models
- Bundle different checkpoints of the same underlying model while sharing the same template
- Deploy the same bundle configuration with different replica counts
- Update checkpoints without modifying deployment configurations
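
The relationship between the three tiers can be pictured with a small Python model. This is only a conceptual sketch: the class fields, the names `example-template` and `example-model`, and the `gs://example-bucket/...` paths are hypothetical and do not reflect the actual manifest schema.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class BundleTemplate:
    name: str
    models: List[str]  # model names the template can run


@dataclass
class Bundle:
    name: str
    template: BundleTemplate
    checkpoints: Dict[str, str]  # model name -> checkpoint location


@dataclass
class BundleDeployment:
    bundle: Bundle
    min_replicas: int


# One template reused by two bundles that point at different checkpoints.
template = BundleTemplate("example-template", ["example-model"])
base = Bundle("base-bundle", template, {"example-model": "gs://example-bucket/base"})
tuned = Bundle("tuned-bundle", template, {"example-model": "gs://example-bucket/fine-tuned"})

# The same bundle deployed twice with different replica counts.
small = BundleDeployment(base, min_replicas=1)
large = BundleDeployment(base, min_replicas=4)

print(base.template is tuned.template)
print(small.bundle is large.bundle)
```

Because each tier only references the one below it, swapping a checkpoint touches the Bundle alone, and changing the replica count touches the BundleDeployment alone.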
BundleTemplate Structure
A BundleTemplate defines the deployment capabilities for one or more models. For example, a multi-model template with a speculative decoding configuration might define the following models:
- gpt-oss-120b: 2 configs
- Meta-Llama-3.3-70B-Instruct: 5 configs, 3 of which use speculative decoding with Meta-Llama-3.1-8B-Instruct as the draft model
- Meta-Llama-3.1-8B-Instruct: 3 configs
BundleTemplate Top-Level Fields
| Field | Required | Description |
|---|---|---|
| `spec.models` | Yes | Defines the models and their expert configurations. See Models and Experts. |
| `spec.owner` | Yes | Email address of the bundle template owner for tracking and notifications. |
| `spec.secretNames` | Yes | List of Kubernetes secrets used to access artifacts. Must match secrets configured in your environment. |
| `spec.usePefCRs` | Yes | Set to `true` to use PEF custom resources for deployment. |
Models and Experts
Each model in a BundleTemplate contains one or more experts, which represent sequence length profiles. Common profiles include:
- `8k`, `16k`, `32k`, `64k`, `128k` - Fixed sequence length configurations
- `default` - Standard configuration when no specific length is required
Expert Configuration Parameters
Each expert contains one or more configurations with the following parameters:
| Parameter | Required | Description |
|---|---|---|
| `pef` | Yes | Reference to a PEF custom resource in the format `<pef-name>:<version>`. Use version `1` unless a higher version is confirmed via `kubectl describe pef`. The `<pef-name>` includes the batch size after the `bs` characters. |
| `spec_decoding` | No | Speculative decoding configuration. Only specify for target models, not draft models. |
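
The `<pef-name>:<version>` format can be checked mechanically before a template is applied. The following is a minimal sketch, assuming the batch size is the first run of digits immediately after `bs` in the name. The helper `parse_pef_ref` and the PEF name in the example are hypothetical and not part of SambaStack.

```python
import re
from typing import NamedTuple, Optional


class PefRef(NamedTuple):
    name: str
    version: int
    batch_size: Optional[int]


def parse_pef_ref(ref: str) -> PefRef:
    """Split '<pef-name>:<version>' and read the batch size that follows 'bs'."""
    name, sep, version = ref.rpartition(":")
    if not sep or not name or not version.isdigit():
        raise ValueError(f"expected '<pef-name>:<version>', got {ref!r}")
    # Assumption: the batch size is the first run of digits right after 'bs'.
    match = re.search(r"bs(\d+)", name)
    batch_size = int(match.group(1)) if match else None
    return PefRef(name, int(version), batch_size)


# Hypothetical PEF name, used only to exercise the parser.
print(parse_pef_ref("example-model-ss8192-bs4:1"))
```

A name that happens to contain `bs` followed by digits for another reason would be misread, so treat the extracted batch size as a hint and confirm it against the PEF details.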
Speculative decoding parameters
Parameters within `spec_decoding` (target models only):
| Parameter | Description |
|---|---|
| `draft_model` | Name of the draft model in the same BundleTemplate |
| `draft_expert` | Expert profile of the draft model to use. Should match the sequence length of the target model expert (for example, use a 16k draft expert with a 16k target expert). |
For detailed guidance, see Speculative Decoding Deployment Guidelines.
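
The rules for `draft_model` and `draft_expert` lend themselves to a quick consistency check. The sketch below assumes the template's models have been loaded into a nested dictionary of the form `{model: {expert: [config, ...]}}`. That layout, the function `check_spec_decoding`, and the PEF names are assumptions made for illustration, not the actual BundleTemplate schema.

```python
from typing import Dict, List

# Assumed in-memory layout: {model_name: {expert_name: [config, ...]}}.
Template = Dict[str, Dict[str, List[dict]]]


def check_spec_decoding(models: Template) -> List[str]:
    """Return human-readable problems found in spec_decoding references."""
    problems = []
    for model, experts in models.items():
        for expert, configs in experts.items():
            for cfg in configs:
                sd = cfg.get("spec_decoding")
                if sd is None:
                    continue  # no speculative decoding for this config
                where = f"{model}/{expert}"
                draft_model = sd.get("draft_model")
                draft_expert = sd.get("draft_expert")
                if draft_model not in models:
                    problems.append(f"{where}: draft model {draft_model!r} is not in the template")
                elif draft_expert not in models[draft_model]:
                    problems.append(f"{where}: draft expert {draft_expert!r} is not defined for {draft_model}")
                elif draft_expert != expert:
                    problems.append(f"{where}: draft expert {draft_expert!r} does not match target expert {expert!r}")
    return problems


# Hypothetical PEF names; the model names come from the example template.
models = {
    "Meta-Llama-3.3-70B-Instruct": {
        "16k": [{
            "pef": "target-pef-bs2:1",
            "spec_decoding": {"draft_model": "Meta-Llama-3.1-8B-Instruct", "draft_expert": "8k"},
        }],
    },
    "Meta-Llama-3.1-8B-Instruct": {
        "8k": [{"pef": "draft-pef-8k-bs2:1"}],
        "16k": [{"pef": "draft-pef-16k-bs2:1"}],
    },
}
for problem in check_spec_decoding(models):
    print(problem)
```

The example deliberately pairs a 16k target expert with an 8k draft expert, so the check reports a mismatch. It does not verify that draft models omit `spec_decoding`.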
Bundle Structure
A Bundle binds a BundleTemplate to specific checkpoints. The following sections explain each part of the Bundle manifest.
Resource Identity
Define the Bundle name and resource type:
- `metadata.name` - The Bundle name used to reference this bundle in deployments
- `apiVersion` and `kind` - Keep these values the same for all bundles
Checkpoints
Define the model checkpoints to use:
| Field | Description |
|---|---|
| `source` | GCS path pointing to the model checkpoint. Find available checkpoints in SambaStack Models. |
| `toolSupport` | Boolean flag indicating whether this checkpoint is compatible with tools and function-calling (if supported by the product). |
Models
Map model names to checkpoints and templates:
| Field | Description |
|---|---|
| `<model-key>` (for example, `Meta-Llama-3.3-70B-Instruct`) | The API model name that users will send inference requests to. Must match a name in the BundleTemplate’s `spec.models` section. |
| `checkpoint` | The checkpoint alias (from `spec.checkpoints`) this model should use. |
| `template` | The model template in the BundleTemplate’s `spec.models` to use. This value must exactly match a model name defined under `spec.models` in the BundleTemplate. |
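
The cross-references in the table above (checkpoint alias to `spec.checkpoints`, template name to the BundleTemplate's `spec.models`) can be verified before applying a manifest. This is a minimal sketch that assumes the Bundle `spec` has been loaded into a plain dictionary with `checkpoints` and `models` keys. The function `check_bundle`, the alias `llama-70b-ckpt`, and the `gs://example-bucket/...` path are hypothetical.

```python
from typing import List, Set


def check_bundle(bundle_spec: dict, template_models: Set[str]) -> List[str]:
    """Report model entries whose checkpoint alias or template name does not resolve."""
    problems = []
    checkpoints = bundle_spec.get("checkpoints", {})
    for model_key, entry in bundle_spec.get("models", {}).items():
        alias = entry.get("checkpoint")
        if alias not in checkpoints:
            problems.append(f"{model_key}: checkpoint alias {alias!r} is not defined in spec.checkpoints")
        template = entry.get("template")
        if template not in template_models:
            problems.append(f"{model_key}: template {template!r} is not a model in the BundleTemplate")
    return problems


# Assumed layout of a Bundle spec; the path and alias are placeholders.
bundle_spec = {
    "checkpoints": {
        "llama-70b-ckpt": {"source": "gs://example-bucket/path/to/checkpoint", "toolSupport": True},
    },
    "models": {
        "Meta-Llama-3.3-70B-Instruct": {
            "checkpoint": "llama-70b-ckpt",
            "template": "Meta-Llama-3.3-70B-Instruct",
        },
    },
}
template_models = {"Meta-Llama-3.3-70B-Instruct", "Meta-Llama-3.1-8B-Instruct"}
print(check_bundle(bundle_spec, template_models))
```

An empty list means every model entry resolves. A typo in either the alias or the template name shows up as one message per broken reference.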
Template and Secrets
Connect the Bundle to its BundleTemplate and credentials:
| Field | Description |
|---|---|
| `template` | References the BundleTemplate by its `metadata.name`. This connects the Bundle to the deployment configurations defined in that template. |
| `secretNames` | Credentials used to read checkpoints from GCS. Must match the secrets configured in your environment. |
Complete Bundle Example
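A complete multi-model Bundle manifest combines the sections described above: resource identity, checkpoints (including their `source` fields), models, and the template and secrets references.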
BundleDeployment Structure
A BundleDeployment instantiates a bundle on the cluster. For detailed deployment information, see SambaStack Setup.
| Field | Description |
|---|---|
| `spec.bundle` | Name of the Bundle to deploy |
| `spec.groups[].name` | Name identifier for the deployment group |
| `spec.groups[].minReplicas` | Minimum number of bundle replicas to maintain |
| `spec.groups[].qosList` | Quality of service classes for request prioritization |
| `spec.owner` | Email address of the deployment owner for tracking and notifications |
| `spec.secretNames` | Credentials used to access artifacts. Must match secrets configured in your environment. |
Procedures
Identify Available PEFs
Before creating a BundleTemplate, identify the PEF resources available for your model.
List available PEFs
List PEFs matching your model and sequence length requirements, for example with `kubectl get pefs`. Steps may differ between Hosted and On Premise environments.
View PEF details
View PEF details to understand supported configurations and check for higher versions, for example with `kubectl describe pef <pef-name>`. Steps may differ between Hosted and On Premise environments.
Review the Spec.Metadata section for:
- `batch_size` - Supported batch size
- `max_seq_length` - Maximum sequence length
- `num_rdus` - Required RDU count
- `rdu_arch` - Required RDU architecture
- `seq_lengths` - Supported sequence lengths
Review the Versions section to determine if a higher PEF version is available.
Create a BundleTemplate
Create the YAML file
Create a YAML file for your BundleTemplate. A single-model template defines one model under `spec.models`. For multi-model templates with speculative decoding, see the BundleTemplate Structure example.
Create a Bundle
Create the YAML file
Deploy the Bundle
Update or Remove a Bundle/BundleTemplate
- Update a bundle
- Remove a bundle
Troubleshooting
Legalizer Validation Failures
| Error Pattern | Cause | Resolution |
|---|---|---|
| `PEF pef1 and pef2 are not checkpoint compatible (checkpoint #0)` | PEFs with the same `ckpt_sharing_uuid` cannot share checkpoint memory | Assign different `ckpt_sharing_uuid` values to the incompatible PEFs |
| `Bundle exceeds memory constraints` | Combined PEF and checkpoint size exceeds RDU memory | Reduce the number of experts or batch sizes in the template |
| `PEF not found: <pef-name>` | Referenced PEF does not exist | Verify PEF name with `kubectl get pefs` |
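
When scanning logs, the error patterns in the table above can be matched programmatically. The sketch below transcribes the three patterns into regular expressions and returns the corresponding resolution. The helper `suggest_resolution` is hypothetical, and real legalizer messages may be worded differently from the patterns shown in the table.

```python
import re
from typing import Optional

# Patterns transcribed from the troubleshooting table; real messages may carry extra text.
_PATTERNS = [
    (re.compile(r"PEF (\S+) and (\S+) are not checkpoint compatible \(checkpoint #(\d+)\)"),
     "Assign different ckpt_sharing_uuid values to the incompatible PEFs"),
    (re.compile(r"Bundle exceeds memory constraints"),
     "Reduce the number of experts or batch sizes in the template"),
    (re.compile(r"PEF not found: (\S+)"),
     "Verify PEF name with kubectl get pefs"),
]


def suggest_resolution(message: str) -> Optional[str]:
    """Return the documented resolution for a legalizer message, or None if unrecognized."""
    for pattern, resolution in _PATTERNS:
        if pattern.search(message):
            return resolution
    return None


print(suggest_resolution("PEF pef1 and pef2 are not checkpoint compatible (checkpoint #0)"))
```

Unrecognized messages return `None`, which keeps the helper from guessing at causes the table does not document.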
Deployment Failures
| Symptom | Possible Cause | Resolution |
|---|---|---|
| Deployment stuck in pending | Insufficient RDU resources | Check cluster capacity; reduce minReplicas |
| Checkpoint download fails | Invalid GCS path or missing credentials | Verify source path; confirm sambanova-artifact-reader secret exists |
| Model not accessible via API | Model name mismatch | Verify spec.models.<name> matches expected API endpoint |
