This guide describes how to create custom BundleTemplates and Bundles to deploy models with specific configurations on SambaStack.
In SambaStack, bundles are the fundamental deployment unit. Rather than deploying individual models, you deploy bundles that group one or more models together with their deployment configurations, including batch sizes and sequence lengths. This approach uses the SambaNova Reconfigurable Dataflow Unit (RDU) to support multiple models and configurations in a single deployment, enabling instant switching between configurations for improved efficiency and flexibility.
This guide covers creating custom bundles. For deploying pre-configured bundles provided by SambaNova, see Deploying Bundles.
Prerequisites
Before creating custom bundles, complete whichever of the following applies to your environment:
Quickstart - Hosted - System setup for hosted SambaStack
Quickstart - On-prem - System setup for on-prem SambaStack
Additionally, review the following documentation:
Terminology
Term | Definition
--- | ---
RDU | Reconfigurable Dataflow Unit - SambaNova’s proprietary processor architecture
PEF | Processor Executable Format - Compiled model binaries that run on RDUs
Expert | A sequence length profile configuration (for example, 8k, 16k, 32k) within a model
Speculative Decoding | An optimization technique using a smaller draft model to accelerate inference from a larger target model
Legalizer | A validation process that verifies a bundle fits within RDU memory constraints
Concepts
Bundle architecture
For an overview of core bundle concepts, including what bundles and bundle templates are, see Deploying Model Bundles. This section covers the three-tier structure used when creating custom bundles:
BundleTemplate - Defines how models can be run: available sequence length profiles, batch sizes, and PEF mappings. Also defines target and draft model relationships for speculative decoding pairs.
Bundle - Binds a template to specific checkpoints in storage, making it ready for deployment
BundleDeployment - Instantiates one or more replicas of a bundle on the cluster
This separation allows you to:
Reuse templates across multiple checkpoints, including custom checkpoints for fine-tuned models
Bundle different checkpoints of the same underlying model while sharing the same template
Deploy the same bundle configuration with different replica counts
Update checkpoints without modifying deployment configurations
BundleTemplate structure
A BundleTemplate defines the deployment capabilities for one or more models. The following example shows a multi-model template with speculative decoding configuration:
apiVersion: sambanova.ai/v1alpha1
kind: BundleTemplate
metadata:
  name: bt-gpt120-llama70sd8-llama8
spec:
  models:
    gpt-oss-120b:
      experts:
        8k:
          configs:
            - pef: gpt-oss-fp8-ss8192-bs2:1
        32k:
          configs:
            - pef: gpt-oss-fp8-ss32768-bs2:1
    Meta-Llama-3.3-70B-Instruct:
      experts:
        4k:
          configs:
            - pef: llama-3p1-70b-ss4096-bs4-sd5:3
              spec_decoding:
                draft_model: Meta-Llama-3.1-8B-Instruct
            - pef: llama-3p1-70b-ss4096-bs32-sd5:2
        8k:
          configs:
            - pef: llama-3p1-70b-ss8192-bs1-sd5:1
            - pef: llama-3p1-70b-ss8192-bs8-sd5:2
          default_config_values:
            spec_decoding:
              draft_model: Meta-Llama-3.1-8B-Instruct
        128k:
          configs:
            - pef: llama-3p1-70b-ss131072-bs1-sd5:2
    Meta-Llama-3.1-8B-Instruct:
      experts:
        4k:
          configs:
            - pef: llama-3p1-8b-ss4096-bs4:1
        8k:
          configs:
            - pef: llama-3p1-8b-ss8192-bs1:1
            - pef: llama-3p1-8b-ss8192-bs8:1
  owner: no-reply@sambanova.ai
  secretNames:
    - sambanova-artifact-reader
  usePefCRs: true
The example above includes configurations for three models:
gpt-oss-120b: 2 configs
Meta-Llama-3.3-70B-Instruct: 5 configs, 3 of which use speculative decoding with Meta-Llama-3.1-8B-Instruct as the draft model
Meta-Llama-3.1-8B-Instruct: 3 configs
For more details on the speculative decoding fields, see the Speculative Decoding Deployment Guidelines.
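The config counts above can be reproduced with a short Python sketch. It mirrors the Meta-Llama-3.3-70B-Instruct entry of the example as a plain dictionary and assumes that default_config_values applies to every config in the same expert, which is how the "3 of 5" figure works out. The helper name count_configs is hypothetical and not part of SambaStack.

```python
# Mirrors the Meta-Llama-3.3-70B-Instruct entry of the example template.
experts = {
    "4k": {"configs": [
        {"pef": "llama-3p1-70b-ss4096-bs4-sd5:3",
         "spec_decoding": {"draft_model": "Meta-Llama-3.1-8B-Instruct"}},
        {"pef": "llama-3p1-70b-ss4096-bs32-sd5:2"},
    ]},
    "8k": {
        "configs": [
            {"pef": "llama-3p1-70b-ss8192-bs1-sd5:1"},
            {"pef": "llama-3p1-70b-ss8192-bs8-sd5:2"},
        ],
        # Assumption: these defaults apply to every config of the 8k expert.
        "default_config_values": {
            "spec_decoding": {"draft_model": "Meta-Llama-3.1-8B-Instruct"},
        },
    },
    "128k": {"configs": [{"pef": "llama-3p1-70b-ss131072-bs1-sd5:2"}]},
}


def count_configs(experts):
    """Return (total configs, configs that use speculative decoding)."""
    total = with_sd = 0
    for expert in experts.values():
        defaults = expert.get("default_config_values", {})
        for config in expert["configs"]:
            total += 1
            if "spec_decoding" in config or "spec_decoding" in defaults:
                with_sd += 1
    return total, with_sd


print(count_configs(experts))  # (5, 3)
```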
BundleTemplate Top-Level Fields
Field | Required | Description
--- | --- | ---
spec.models | Yes | Defines the models and their expert configurations. See Models and Experts.
spec.owner | Yes | Email address of the bundle template owner for tracking and notifications.
spec.secretNames | Yes | List of Kubernetes secrets used to access artifacts. Must match secrets configured in your environment.
spec.usePefCRs | Yes | Set to true to use PEF custom resources for deployment.
Models and Experts
Each model in a BundleTemplate contains one or more experts, which represent sequence length profiles. Common profiles include:
8k, 16k, 32k, 64k, 128k - Fixed sequence length configurations
default - Standard configuration when no specific length is required
spec:
  models:
    Meta-Llama-3.1-8B-Instruct:
      experts:
        128k:
          configs:
            - <config>
        64k:
          configs:
            - <config>
        32k:
          configs:
            - <config>
        16k:
          configs:
            - <config>
        8k:
          configs:
            - <config>
        default:
          configs:
            - <config>
Expert Configuration Parameters
Each expert contains one or more configurations with the following parameters:
Parameter | Required | Description
--- | --- | ---
pef | Yes | Reference to a PEF custom resource in the format <pef-name>:<version>. Use version 1 unless a higher version is confirmed via kubectl describe pef. The <pef-name> includes the batch size after the bs characters.
spec_decoding | No | Speculative decoding configuration. Only specify for target models, not draft models.
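As an illustration of the <pef-name>:<version> format, the following Python sketch splits a reference into its parts. Reading the batch size from the number after bs follows the description above. Reading the sequence length from the number after ss is an assumption inferred from the example names in this guide. The PefRef type and parse_pef_ref function are hypothetical helpers.

```python
import re
from typing import NamedTuple, Optional


class PefRef(NamedTuple):
    name: str
    version: int
    seq_len: Optional[int]     # assumption: taken from the "ss<N>" token
    batch_size: Optional[int]  # taken from the "bs<N>" token


def parse_pef_ref(ref: str) -> PefRef:
    """Split '<pef-name>:<version>' and extract the ss/bs numbers from the name."""
    name, sep, version = ref.rpartition(":")
    if not sep or not name or not version.isdigit():
        raise ValueError(f"expected <pef-name>:<version>, got {ref!r}")

    def number_after(prefix: str) -> Optional[int]:
        match = re.search(rf"-{prefix}(\d+)(?=-|$)", name)
        return int(match.group(1)) if match else None

    return PefRef(name, int(version), number_after("ss"), number_after("bs"))


print(parse_pef_ref("llama-3p1-70b-ss4096-bs4-sd5:3"))
```

A reference without a version suffix raises ValueError, which makes a missing `:1` easy to spot before applying a template.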
Speculative decoding parameters
Parameters within spec_decoding (target models only):
Parameter | Description
--- | ---
draft_model | Name of the draft model in the same BundleTemplate
draft_expert | Expert profile of the draft model to use. Should match the sequence length of the target model expert (for example, use a 16k draft expert with a 16k target expert).
For detailed guidance, see Speculative Decoding Deployment Guidelines.
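The two rules above (the draft model lives in the same BundleTemplate, and the draft expert should exist for the matching sequence length) can be checked before applying a template. The Python sketch below works on spec.models loaded as a dictionary. It assumes that an omitted draft_expert falls back to the expert with the same name as the target expert, which this guide does not state. The check_spec_decoding helper is hypothetical.

```python
def check_spec_decoding(models):
    """Return a list of problems found in the spec_decoding blocks of a template."""
    problems = []
    for model_name, model in models.items():
        for expert_name, expert in model["experts"].items():
            defaults = expert.get("default_config_values", {})
            for config in expert["configs"]:
                sd = config.get("spec_decoding", defaults.get("spec_decoding"))
                if sd is None:
                    continue  # this config does not use speculative decoding
                draft = sd.get("draft_model")
                if draft not in models:
                    problems.append(
                        f"{model_name}/{expert_name}: draft model {draft!r} not in template")
                    continue
                # Assumption: a missing draft_expert means "same expert name as the target".
                draft_expert = sd.get("draft_expert", expert_name)
                if draft_expert not in models[draft]["experts"]:
                    problems.append(
                        f"{model_name}/{expert_name}: draft expert {draft_expert!r} missing on {draft}")
    return problems


models = {
    "Meta-Llama-3.3-70B-Instruct": {"experts": {
        "16k": {"configs": [{
            "pef": "llama-3p1-70b-ss16384-bs4-sd5:1",
            "spec_decoding": {"draft_model": "Meta-Llama-3.1-8B-Instruct",
                              "draft_expert": "16k"},
        }]},
    }},
    "Meta-Llama-3.1-8B-Instruct": {"experts": {
        "8k": {"configs": [{"pef": "llama-3p1-8b-ss8192-bs8:1"}]},
    }},
}
# The draft model only defines an 8k expert, so one problem is reported.
print(check_spec_decoding(models))
```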
Bundle structure
A Bundle binds a BundleTemplate to specific checkpoints. The following sections explain each part of the Bundle manifest.
Resource Identity
Define the Bundle name and resource type:
apiVersion: sambanova.ai/v1alpha1
kind: Bundle
metadata:
  name: b-70b-3dot3-ss-16-32k-bs-4
metadata.name - The Bundle name used to reference this bundle in deployments
apiVersion and kind - Keep these values the same for all bundles
Checkpoints
Define the model checkpoints to use:
spec:
  checkpoints:
    GPT_OSS_120B_CKPT:
      source: gs://<SAMBASTACK_ARTIFACTS_BUCKET>/path/to/checkpoint
      toolSupport: true
    META_LLAMA_3_3_70B_INSTRUCT_CKPT:
      source: gs://<SAMBASTACK_ARTIFACTS_BUCKET>/path/to/checkpoint
      toolSupport: true
    META_LLAMA_3_1_8B_INSTRUCT_CKPT:
      source: gs://<SAMBASTACK_ARTIFACTS_BUCKET>/path/to/checkpoint
      toolSupport: true
Field | Description
--- | ---
source | GCS path pointing to the model checkpoint. Find available checkpoints in Model & Bundle Directory.
toolSupport | Boolean flag indicating whether this checkpoint is compatible with tools and function-calling (if supported by the product).
Models
Map model names to checkpoints and templates:
spec:
  models:
    gpt-oss-120b:
      checkpoint: GPT_OSS_120B_CKPT
      template: gpt-oss-120b
    Meta-Llama-3.3-70B-Instruct:
      checkpoint: META_LLAMA_3_3_70B_INSTRUCT_CKPT
      template: Meta-Llama-3.3-70B-Instruct
    Meta-Llama-3.1-8B-Instruct:
      checkpoint: META_LLAMA_3_1_8B_INSTRUCT_CKPT
      template: Meta-Llama-3.1-8B-Instruct
Field | Description
--- | ---
<model-key> (for example, Meta-Llama-3.3-70B-Instruct) | The API model name that users will send inference requests to. Must match a name in the BundleTemplate’s spec.models section.
checkpoint | The checkpoint alias (from spec.checkpoints) this model should use.
template | The model template in the BundleTemplate’s spec.models to use. This value must exactly match a model name defined under spec.models in the BundleTemplate.
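Because both the checkpoint alias and the template name must match exactly, a quick cross-check can catch typos before the manifest is applied. The Python sketch below compares a Bundle's spec (as a dictionary) against the set of model names in the BundleTemplate. The check_bundle helper is hypothetical and does not replace the legalizer.

```python
def check_bundle(bundle_spec, template_models):
    """Return problems with checkpoint aliases and template names in a Bundle spec."""
    problems = []
    checkpoints = bundle_spec.get("checkpoints", {})
    for api_name, model in bundle_spec.get("models", {}).items():
        alias = model.get("checkpoint")
        template = model.get("template")
        if alias not in checkpoints:
            problems.append(f"{api_name}: checkpoint alias {alias!r} is not defined")
        if template not in template_models:
            problems.append(f"{api_name}: template model {template!r} is not in the BundleTemplate")
    return problems


bundle_spec = {
    "checkpoints": {
        "GPT_OSS_120B_CKPT": {
            "source": "gs://<SAMBASTACK_ARTIFACTS_BUCKET>/path/to/checkpoint",
        },
    },
    "models": {
        # Deliberate typo: upper-case "B" does not exactly match the template model name.
        "gpt-oss-120b": {"checkpoint": "GPT_OSS_120B_CKPT", "template": "gpt-oss-120B"},
    },
}
print(check_bundle(bundle_spec, {"gpt-oss-120b"}))
```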
Template and Secrets
Connect the Bundle to its BundleTemplate and credentials:
spec:
  template: bt-gpt120-llama70sd8-llama8
  secretNames:
    - sambanova-artifact-reader
Field | Description
--- | ---
template | References the BundleTemplate by its metadata.name. This connects the Bundle to the deployment configurations defined in that template.
secretNames | Credentials used to read checkpoints from GCS. Must match the secrets configured in your environment.
Complete Bundle Example
The following example shows a complete multi-model Bundle:
apiVersion: sambanova.ai/v1alpha1
kind: Bundle
metadata:
  name: b-gpt120-llama70sd8-llama8
spec:
  checkpoints:
    GPT_OSS_120B_CKPT:
      source: gs://<SAMBASTACK_ARTIFACTS_BUCKET>/path/to/checkpoint
      toolSupport: true
    META_LLAMA_3_3_70B_INSTRUCT_CKPT:
      source: gs://<SAMBASTACK_ARTIFACTS_BUCKET>/path/to/checkpoint
      toolSupport: true
    META_LLAMA_3_1_8B_INSTRUCT_CKPT:
      source: gs://<SAMBASTACK_ARTIFACTS_BUCKET>/path/to/checkpoint
      toolSupport: true
  models:
    gpt-oss-120b:
      checkpoint: GPT_OSS_120B_CKPT
      template: gpt-oss-120b
    Meta-Llama-3.3-70B-Instruct:
      checkpoint: META_LLAMA_3_3_70B_INSTRUCT_CKPT
      template: Meta-Llama-3.3-70B-Instruct
    Meta-Llama-3.1-8B-Instruct:
      checkpoint: META_LLAMA_3_1_8B_INSTRUCT_CKPT
      template: Meta-Llama-3.1-8B-Instruct
  secretNames:
    - sambanova-artifact-reader
  template: bt-gpt120-llama70sd8-llama8
The paths to checkpoints hosted by SambaNova will be provided to you by your SambaNova contact. If you have hosted your own checkpoints, you can include those paths in the source fields above.
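To illustrate how one BundleTemplate can serve several checkpoints, the following Python sketch builds two Bundle manifests as dictionaries that share a template and differ only in name and checkpoint source. The make_bundle helper, the b-gpt120-finetuned name, and the <YOUR_BUCKET> path are hypothetical. The checkpoint alias is derived from the model name only to mirror the naming used in the examples above.

```python
def make_bundle(name, template, model, checkpoint_source,
                secret="sambanova-artifact-reader"):
    """Return a single-model Bundle manifest binding a template model to a checkpoint."""
    # e.g. "gpt-oss-120b" -> "GPT_OSS_120B_CKPT" (naming convention copied from the examples)
    alias = model.upper().replace("-", "_").replace(".", "_") + "_CKPT"
    return {
        "apiVersion": "sambanova.ai/v1alpha1",
        "kind": "Bundle",
        "metadata": {"name": name},
        "spec": {
            "checkpoints": {alias: {"source": checkpoint_source, "toolSupport": True}},
            "models": {model: {"checkpoint": alias, "template": model}},
            "secretNames": [secret],
            "template": template,
        },
    }


base = make_bundle("b-gpt120", "bt-gpt120", "gpt-oss-120b",
                   "gs://<SAMBASTACK_ARTIFACTS_BUCKET>/path/to/checkpoint")
# Hypothetical fine-tuned checkpoint hosted in your own bucket.
tuned = make_bundle("b-gpt120-finetuned", "bt-gpt120", "gpt-oss-120b",
                    "gs://<YOUR_BUCKET>/path/to/finetuned-checkpoint")
print(base["spec"]["template"] == tuned["spec"]["template"])
```

Each dictionary can be serialized to YAML with the tool of your choice and applied like any other Bundle manifest.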
BundleDeployment structure
A BundleDeployment instantiates a bundle on the cluster. For detailed deployment information, see Quickstart - Hosted or Quickstart - On-prem.
apiVersion: sambanova.ai/v1alpha1
kind: BundleDeployment
metadata:
  name: bd-gpt120-llama70sd8-llama8
spec:
  bundle: b-gpt120-llama70sd8-llama8
  groups:
    - minReplicas: 1
      name: default
      qosList:
        - free
  owner: no-reply@sambanova.ai
  secretNames:
    - sambanova-artifact-reader
Field | Description
--- | ---
spec.bundle | Name of the Bundle to deploy
spec.groups[].name | Name identifier for the deployment group
spec.groups[].minReplicas | Minimum number of bundle replicas to maintain
spec.groups[].qosList | Quality of service classes for request prioritization
spec.owner | Email address of the deployment owner for tracking and notifications
spec.secretNames | Credentials used to access artifacts. Must match secrets configured in your environment.
The following procedures describe the step-by-step workflow for creating and deploying custom bundles using the concepts and structures described above.
Procedures
Identify available PEFs
Before creating a BundleTemplate, identify the PEF resources available for your model.
List available PEFs
List PEFs matching your model and sequence length requirements:
kubectl get pefs | grep <model-pattern>
Example:
kubectl get pefs | grep llama-3p1-70b-ss4096
With an explicit namespace:
kubectl -n <namespace> get pefs.sambanova.ai | grep <model-pattern>
Example:
kubectl -n <namespace> get pefs.sambanova.ai | grep llama-3p1-70b-ss4096
Output:
llama-3p1-70b-ss4096-bs1-sd9    17h
llama-3p1-70b-ss4096-bs16-sd5   17h
llama-3p1-70b-ss4096-bs2-sd5    17h
llama-3p1-70b-ss4096-bs32-sd5   17h
llama-3p1-70b-ss4096-bs4-sd5    17h
llama-3p1-70b-ss4096-bs8-sd5    17h
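A listing like the one above can be turned into candidate configs entries for an expert. The Python sketch below reads the first column of each line, sorts the names by the number after bs, and appends version 1 by default. It is only a starting point: the configs_from_listing helper is hypothetical, and the version of each PEF should still be confirmed with kubectl describe pef.

```python
import re


def configs_from_listing(listing, version=1):
    """Build configs entries from 'kubectl get pefs' lines, ordered by batch size."""
    names = [line.split()[0] for line in listing.splitlines() if line.strip()]

    def batch_size(name):
        match = re.search(r"-bs(\d+)(?=-|$)", name)
        return int(match.group(1)) if match else 0

    return [{"pef": f"{name}:{version}"} for name in sorted(names, key=batch_size)]


listing = """\
llama-3p1-70b-ss4096-bs1-sd9    17h
llama-3p1-70b-ss4096-bs16-sd5   17h
llama-3p1-70b-ss4096-bs2-sd5    17h
llama-3p1-70b-ss4096-bs32-sd5   17h
llama-3p1-70b-ss4096-bs4-sd5    17h
llama-3p1-70b-ss4096-bs8-sd5    17h
"""
for config in configs_from_listing(listing):
    print("- pef:", config["pef"])
```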
View PEF details
View PEF details to understand supported configurations and check for higher versions:
kubectl describe pef <pef-name>
Example output:
$ kubectl describe pef deepseek-ss131072-bs1
Name: deepseek-ss131072-bs1
...
Spec:
  copy_pef: gs://<SAMBASTACK_ARTIFACTS_BUCKET>/path/to/copy_pef
  copy_pef_name_override: COPY_PEF_DEEPSEEK_R1_128K_PEF_BS1
  Metadata:
    batch_size: 1
    is_prompt_caching: false
    job_type: infer
    max_completion_tokens: 131072
    max_seq_length: 131072
    num_rdus: 16
    rdu_arch: SN40L-16
    seq_lengths:
      65536
      131072
  model_arch: deepseek
  pef_name_override: DEEPSEEK_R1_128K_PEF_BS1
  Versions:
    ...
With an explicit namespace:
kubectl -n <namespace> describe pef.sambanova.ai <pef-name>
Example output:
$ kubectl -n <namespace> describe pef.sambanova.ai deepseek-ss131072-bs1
Name: deepseek-ss131072-bs1
...
Spec:
  copy_pef: gs://<SAMBASTACK_ARTIFACTS_BUCKET>/path/to/copy_pef
  copy_pef_name_override: COPY_PEF_DEEPSEEK_R1_128K_PEF_BS1
  Metadata:
    batch_size: 1
    is_prompt_caching: false
    job_type: infer
    max_completion_tokens: 131072
    max_seq_length: 131072
    num_rdus: 16
    rdu_arch: SN40L-16
    seq_lengths:
      65536
      131072
  model_arch: deepseek
  pef_name_override: DEEPSEEK_R1_128K_PEF_BS1
  Versions:
    ...
Review the Spec.Metadata section for:
batch_size - Supported batch size
max_seq_length - Maximum sequence length
num_rdus - Required RDU count
rdu_arch - Required RDU architecture
seq_lengths - Supported sequence lengths
Check the Versions section to determine if a higher PEF version is available.
Use version 1 in your PEF references (for example, llama-3p1-70b-ss16384-bs4-sd5:1) unless kubectl describe pef confirms a higher version is available.
Create a BundleTemplate
Create the YAML file
Create a YAML file for your BundleTemplate. For a single-model template:
apiVersion: sambanova.ai/v1alpha1
kind: BundleTemplate
metadata:
  name: bt-gpt120
spec:
  models:
    gpt-oss-120b:
      experts:
        8k:
          configs:
            - pef: gpt-oss-fp8-ss8192-bs2:1
        32k:
          configs:
            - pef: gpt-oss-fp8-ss32768-bs2:1
        64k:
          configs:
            - pef: gpt-oss-fp8-ss65536-bs2:1
        128k:
          configs:
            - pef: gpt-oss-fp8-ss131072-bs2:1
  owner: no-reply@sambanova.ai
  secretNames:
    - sambanova-artifact-reader
  usePefCRs: true
For multi-model templates with speculative decoding, see the BundleTemplate Structure example.
Apply the BundleTemplate
kubectl apply -f <bundletemplate-file>.yaml
With an explicit namespace:
kubectl -n <namespace> apply -f <bundletemplate-file>.yaml
Verify creation
kubectl get bundletemplates
With an explicit namespace:
kubectl -n <namespace> get bundletemplates.sambanova.ai <template-name>
Including multiple batch sizes for each expert allows the inference engine to select the smallest and fastest configuration based on current workload.
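The selection policy of the inference engine is not documented in this guide. As a rough mental model only, the Python sketch below assumes the engine picks the smallest configured batch size that can hold the pending requests and falls back to the largest one otherwise. The pick_batch_size function is hypothetical.

```python
def pick_batch_size(available, pending_requests):
    """Smallest configured batch size that fits the load, else the largest one."""
    fitting = [bs for bs in sorted(available) if bs >= pending_requests]
    return fitting[0] if fitting else max(available)


# With bs1 and bs8 configured for an expert, a single request can use the
# bs1 config, while three concurrent requests need the bs8 config.
print(pick_batch_size([1, 8], 1))
print(pick_batch_size([1, 8], 3))
```

Under this model, an expert that only offers a large batch size would serve single requests with that larger configuration, which is the case the note above advises against.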
Create a bundle
Create the YAML file
Create a YAML file for your Bundle:
apiVersion: sambanova.ai/v1alpha1
kind: Bundle
metadata:
  name: b-gpt120
spec:
  checkpoints:
    GPT_OSS_120B_CKPT:
      source: gs://<SAMBASTACK_ARTIFACTS_BUCKET>/path/to/checkpoint
      toolSupport: true
  models:
    gpt-oss-120b:
      checkpoint: GPT_OSS_120B_CKPT
      template: gpt-oss-120b
  secretNames:
    - sambanova-artifact-reader
  template: bt-gpt120
For multi-model bundles, see the Complete Bundle Example.
For multi-model bundles, see the Complete Bundle Example .
Apply the Bundle
kubectl apply -f <bundle-file>.yaml
With an explicit namespace:
kubectl -n <namespace> apply -f <bundle-file>.yaml
Verify legalizer validation
The legalizer runs automatically when you apply the bundle and validates whether the bundle fits in RDU memory.
kubectl describe bundle <bundle-name>
With an explicit namespace:
kubectl -n <namespace> describe bundle.sambanova.ai <bundle-name>
Successful validation
Status:
  Conditions:
    Last Transition Time: 2025-12-22T21:11:05.689262+00:00
    Message: Bundle is Valid
    Observed Generation: 1
    Reason: ValidationSucceeded
    Status: True
    Type: Valid
Failed validation
Status:
  Conditions:
    Last Transition Time: 2025-12-22T21:11:54.975311+00:00
    Message: <error-details>
    Reason: ValidationFailed
    Status: False
    Type: Valid
The Message field contains error details, including legalizer errors if any.
Do not proceed to deployment until the bundle shows ValidationSucceeded.
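The gate described above can be scripted. The Python sketch below inspects a Bundle object loaded as a dictionary, for example from kubectl get bundle <bundle-name> -o json. It assumes the JSON field names are the usual lower-camel-case Kubernetes condition keys (type, status, reason, message) and that status is the string "True" or "False". Neither detail is confirmed by this guide, and the bundle_is_valid helper is hypothetical.

```python
def bundle_is_valid(bundle):
    """Return (ok, message) based on the bundle's 'Valid' status condition."""
    for condition in bundle.get("status", {}).get("conditions", []):
        if condition.get("type") == "Valid":
            ok = (condition.get("status") == "True"
                  and condition.get("reason") == "ValidationSucceeded")
            return ok, condition.get("message", "")
    return False, "no Valid condition reported yet"


# Assumed JSON shape, mirroring the successful describe output above.
bundle = {
    "status": {
        "conditions": [{
            "type": "Valid",
            "status": "True",
            "reason": "ValidationSucceeded",
            "message": "Bundle is Valid",
        }],
    },
}
print(bundle_is_valid(bundle))
```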
Deploy the bundle
Create a BundleDeployment
apiVersion: sambanova.ai/v1alpha1
kind: BundleDeployment
metadata:
  name: bd-gpt120
spec:
  bundle: b-gpt120
  groups:
    - minReplicas: 1
      name: default
      qosList:
        - free
  owner: no-reply@sambanova.ai
  secretNames:
    - sambanova-artifact-reader
Apply the BundleDeployment
kubectl apply -f <bundledeployment-file>.yaml
With an explicit namespace:
kubectl -n <namespace> apply -f <bundledeployment-file>.yaml
Monitor deployment status
kubectl get bundledeployments
kubectl describe bundledeployment <deployment-name>
With an explicit namespace:
kubectl -n <namespace> get bundledeployments.sambanova.ai
kubectl -n <namespace> describe bundledeployment.sambanova.ai <deployment-name>
Update or remove a bundle/BundleTemplate
Update a bundle
Modify the YAML file
Edit the Bundle or BundleTemplate YAML file with your changes.
Reapply the configuration
kubectl apply -f <modified-file>.yaml
With an explicit namespace:
kubectl -n <namespace> apply -f <modified-file>.yaml
The legalizer automatically revalidates the changes.
Remove a bundle
Delete the BundleDeployment
kubectl delete bundledeployment <deployment-name>
With an explicit namespace:
kubectl -n <namespace> delete bundledeployment <deployment-name>
Delete the Bundle
kubectl delete bundle <bundle-name>
With an explicit namespace:
kubectl -n <namespace> delete bundle <bundle-name>
Delete the BundleTemplate (optional)
kubectl delete bundletemplate <template-name>
With an explicit namespace:
kubectl -n <namespace> delete bundletemplate <template-name>
Troubleshooting
Legalizer validation failures
Error Pattern | Cause | Resolution
--- | --- | ---
PEF pef1 and pef2 are not checkpoint compatible (checkpoint #0) | PEFs with the same ckpt_sharing_uuid cannot share checkpoint memory | Assign different ckpt_sharing_uuid values to the incompatible PEFs
Bundle exceeds memory constraints | Combined PEF and checkpoint size exceeds RDU memory | Reduce the number of experts or batch sizes in the template
PEF not found: <pef-name> | Referenced PEF does not exist | Verify PEF name with kubectl get pefs
Deployment failures
Symptom | Possible Cause | Resolution
--- | --- | ---
Deployment stuck in pending | Insufficient RDU resources | Check cluster capacity; reduce minReplicas
Checkpoint download fails | Invalid GCS path or missing credentials | Verify source path; confirm sambanova-artifact-reader secret exists
Model not accessible via API | Model name mismatch | Verify spec.models.<name> matches expected API endpoint
Model Deployment - Bundle deployment concepts and workflows
Supported Models & Bundles - Catalogue of models and bundles available for deployment
Deploying Custom Checkpoints - Deploy your own custom or fine-tuned checkpoints
Checkpoint Conversion Tool - Convert checkpoints to compatible formats