In SambaStack, you deploy bundles rather than individual models. A bundle groups one or more models together with their deployment configurations. A configuration defines the runtime settings for a model, such as batch size, sequence length, and precision. For example, Llama-3.3-70B with a batch size of 4 and a sequence length of 16k is one specific configuration. A bundle may contain multiple such configurations, potentially spanning different models.
SambaNova’s RDU technology supports loading multiple models and configurations in a single deployment, allowing instant switching among configurations. Unlike traditional GPU systems that deploy a single static model, SambaStack’s multi-model, multi-configuration bundles increase efficiency, flexibility, and throughput without sacrificing latency.

Bundle template

A Bundle Template defines which models and configurations can be deployed together on a single node. It contains one or more Model Templates. Each Model Template specifies supported configurations: combinations of sequence lengths and batch sizes.

Example - Model template

| Model Template | Supported Sequence Lengths | Supported Batch Sizes |
|---|---|---|
| DeepSeek-R1-0528-Template | 8k, 4k | 1, 2, 4, 8 |
| DeepSeek-V3-0324-Template | 8k, 4k | 1, 2, 4, 8 |

Example - Bundle template

  • deepseek-r1-v3-fp8-32k-Template bundles both DeepSeek-R1-0528-Template and DeepSeek-V3-0324-Template into one deployment package.

Bundle

A Bundle associates actual checkpoints (trained weights) with model templates defined in a bundle template. For example, the Bundle named deepseek-r1-v3-fp8-32k links the R1-0528 checkpoint and the V3-0324 checkpoint to their corresponding Model Templates within the deepseek-r1-v3-fp8-32k-Template.

Key terms

  • Bundle: A ready-to-deploy package combining models and configuration.
  • Expert: A model instance tied to a specific sequence length (e.g., DeepSeek-R1-Distill-Llama-70B-4K).
  • Expert Config: An Expert instance with a specific batch size and configuration (e.g., DeepSeek-R1-Distill-Llama-70B-4k-BS2).
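
The relationship between a Model Template, its Experts, and its Expert Configs can be sketched as a simple enumeration. This is only an illustration: the helper `expand_model_template` is hypothetical, and the naming pattern (`<model>-<sequence length>` and `<model>-<sequence length>-BS<batch size>`) is an assumption inferred from the example names above, not a documented SambaStack convention.

```python
from itertools import product


def expand_model_template(model, sequence_lengths, batch_sizes):
    """Enumerate Experts and Expert Configs for one model template.

    Hypothetical helper: the name format is assumed from the examples
    in this page and is not taken from SambaStack itself.
    """
    # One Expert per supported sequence length.
    experts = [f"{model}-{seq}" for seq in sequence_lengths]
    # One Expert Config per (sequence length, batch size) combination.
    expert_configs = [
        f"{model}-{seq}-BS{bs}"
        for seq, bs in product(sequence_lengths, batch_sizes)
    ]
    return experts, expert_configs


experts, expert_configs = expand_model_template(
    "DeepSeek-R1-0528", ["4k", "8k"], [1, 2, 4, 8]
)
print(experts)              # ['DeepSeek-R1-0528-4k', 'DeepSeek-R1-0528-8k']
print(len(expert_configs))  # 8
```

With two sequence lengths and four batch sizes, a single model template yields two Experts and eight Expert Configs.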

Integrating bundles into a SambaStack deployment

Inside your SambaStack configuration file, the bundles section serves two key purposes:
  1. Defining Available Bundles: The bundleSpecs subsection declares which Bundle Templates and Bundles are available for deployment. This registers bundles with the system but does not deploy them.
  2. Deploying Bundles to Nodes: The bundleDeploymentSpecs subsection defines how and where bundles are deployed within the cluster, including the number of replicas and QoS levels they serve.

Example - Bundle configuration


bundles:
  bundleSpecs:
    - name: llama-4-medium

  bundleDeploymentSpecs:
    - name: llama-4-medium
      groups:
        - name: "default"
          minReplicas: 1
          qosList:
            - "web"
            - "free"
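
Because a bundle is registered in bundleSpecs and deployed through bundleDeploymentSpecs, it can help to check that the two subsections agree before applying a change. The following is a minimal sketch, assuming the configuration has already been loaded into a Python dictionary that mirrors the YAML above. The function `undeclared_deployments` is hypothetical, and whether SambaStack itself rejects such a mismatch is not stated here.

```python
def undeclared_deployments(config):
    """Return bundle names that are deployed but not declared.

    Hypothetical pre-flight check; it only compares names and does not
    contact the cluster.
    """
    bundles = config.get("bundles", {})
    declared = {spec["name"] for spec in bundles.get("bundleSpecs", [])}
    deployed = [spec["name"] for spec in bundles.get("bundleDeploymentSpecs", [])]
    return [name for name in deployed if name not in declared]


# Dictionary form of the example bundle configuration.
config = {
    "bundles": {
        "bundleSpecs": [{"name": "llama-4-medium"}],
        "bundleDeploymentSpecs": [
            {
                "name": "llama-4-medium",
                "groups": [
                    {"name": "default", "minReplicas": 1, "qosList": ["web", "free"]}
                ],
            }
        ],
    }
}

print(undeclared_deployments(config))  # []
```

An empty list means every deployed bundle is also declared in bundleSpecs.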

Switching bundles

To switch a deployment to a different bundle, update the bundleSpecs and bundleDeploymentSpecs sections with the new bundle name.
Available bundles are listed on the Models page. To request new bundle templates, contact the SambaNova team. Example:
bundles:
  bundleSpecs:
    - name: qwen3-32b-whisper

  bundleDeploymentSpecs:
    - name: qwen3-32b-whisper
      groups:
        - name: "default"
          minReplicas: 1
          qosList:
            - "web"
            - "free"
Apply your changes by running:
kubectl apply -f sambastack.yaml
Success is confirmed by the message:
configmap/sambastack configured

Deploying multiple bundles

You may deploy more than one bundle simultaneously by listing them in both bundleSpecs and bundleDeploymentSpecs. Example:
bundles:
  bundleSpecs:
    - name: llama-4-medium
    - name: qwen3-32b-whisper

  bundleDeploymentSpecs:
    - name: llama-4-medium
      groups:
        - name: "default"
          minReplicas: 1
          qosList:
            - "web"
            - "free"

    - name: qwen3-32b-whisper
      groups:
        - name: "default"
          minReplicas: 1
          qosList:
            - "web"
            - "free"
SambaStack currently supports only one bundle per node. When deploying multiple bundles, ensure that each bundle is assigned to a separate node to avoid resource conflicts.
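
A rough way to size the cluster for a multi-bundle deployment is to add up the minReplicas values. This sketch assumes that each replica of a bundle occupies one whole node, which is an interpretation of the one-bundle-per-node limit rather than something stated explicitly here. The function `min_nodes_required` is hypothetical.

```python
def min_nodes_required(deployment_specs):
    """Sum minReplicas across all bundles and groups.

    Assumption: one replica occupies one node, so this sum is the
    minimum node count. Treat the result as an estimate only.
    """
    total = 0
    for spec in deployment_specs:
        for group in spec.get("groups", []):
            total += group.get("minReplicas", 0)
    return total


# Dictionary form of the bundleDeploymentSpecs from the example above.
deployment_specs = [
    {"name": "llama-4-medium", "groups": [{"name": "default", "minReplicas": 1}]},
    {"name": "qwen3-32b-whisper", "groups": [{"name": "default", "minReplicas": 1}]},
]

print(min_nodes_required(deployment_specs))  # 2
```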

Querying deployment status

To verify that pods have been updated with the new bundles, run:
kubectl get pods

Deploying custom checkpoints

Custom checkpoints follow the same deployment process as standard SambaNova checkpoints but require prior conversion.
  • Anyone in your organization (developers or admins) can convert a checkpoint.
  • Obtain a converted checkpoint pointer or perform the conversion yourself.
  • See the Custom Checkpoint Deployment Guide for full instructions.

Creating bundle config for custom checkpoints

Copy an existing Bundle definition and modify the source field of the relevant checkpoint key to point to your converted checkpoint’s Google Cloud Storage location. Example:
kind: Bundle
spec:
  checkpoints:
    <CKPT_KEY>:
      source: gs://pointer/to/your/converted/checkpoint

Sample bundle instance

apiVersion: sambanova.ai/v1alpha1
kind: Bundle
metadata:
  name: 70b-3dot3-ss-4-8-64-128k
spec:
  checkpoints:
    LLAMA3D2_1B_CKPT:
      source: gs://pointer/to/your/converted/checkpoint
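
The copy-and-modify step can also be sketched programmatically. This is an illustration under stated assumptions: the helper `with_custom_checkpoint` is hypothetical, the bundle is represented as a Python dictionary mirroring the YAML above, the `gs://` prefix check is inferred from the examples, and the rule that the checkpoint key must already exist in the copied bundle is an assumption. The storage paths are placeholders.

```python
import copy


def with_custom_checkpoint(bundle, ckpt_key, source):
    """Return a copy of `bundle` whose checkpoint `ckpt_key` points at `source`.

    Hypothetical helper. The original bundle dictionary is left unchanged.
    """
    # Assumption: converted checkpoints live in Google Cloud Storage.
    if not source.startswith("gs://"):
        raise ValueError(f"expected a gs:// location, got {source!r}")
    # Assumption: only keys already present in the copied bundle are valid.
    if ckpt_key not in bundle["spec"]["checkpoints"]:
        raise KeyError(f"unknown checkpoint key: {ckpt_key}")
    updated = copy.deepcopy(bundle)
    updated["spec"]["checkpoints"][ckpt_key]["source"] = source
    return updated


# Existing bundle to copy; the source path is a placeholder.
base_bundle = {
    "apiVersion": "sambanova.ai/v1alpha1",
    "kind": "Bundle",
    "metadata": {"name": "70b-3dot3-ss-4-8-64-128k"},
    "spec": {
        "checkpoints": {
            "LLAMA3D2_1B_CKPT": {"source": "gs://pointer/to/existing/checkpoint"}
        }
    },
}

custom = with_custom_checkpoint(
    base_bundle, "LLAMA3D2_1B_CKPT", "gs://pointer/to/your/converted/checkpoint"
)
print(custom["spec"]["checkpoints"]["LLAMA3D2_1B_CKPT"]["source"])
```

Working on a deep copy keeps the original definition intact, so the standard bundle can still be deployed alongside or instead of the custom one.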

Speculative decoding deployment guidelines

To deploy speculative decoding effectively and maintain stable performance in your SambaStack environment:
  • Ensure that the draft checkpoint matches your custom target checkpoint to avoid performance degradation.
  • Use a Bundle Template without speculative decoding if you lack a fine-tuned draft checkpoint for your custom model.
  • Validate draft–target checkpoint compatibility with the SN Conversion Library to optimize acceleration.
  • Refer to the Speculative Decoding Background and Custom Checkpoint Deployment guides for detailed instructions.

Deploying custom draft checkpoints

Custom draft checkpoints can be specified alongside target checkpoints within a Bundle to enable speculative decoding.

Example - Bundle with draft and target checkpoints

apiVersion: sambanova.ai/v1alpha1
kind: Bundle
metadata:
  name: 70b-3dot3-ss-4-8-64-128k
spec:
  checkpoints:
    LLAMA3D2_1B_CKPT:
      source: gs://pointer/to/converted/draft/checkpoint
    LLAMA3_70B_3_3_CKPT:
      source: gs://pointer/to/converted/target/checkpoint

Best practices

  • Speculative decoding does not affect the accuracy of response outputs but can impact performance based on acceptance rates.
  • To prevent degraded performance, avoid using speculative decoding bundles for custom models unless you have a properly tuned draft checkpoint.