SambaStack v0.5.14 Release
Release Date: April 1, 2026

This release introduces simplified checkpoint discovery, new model support (MiniMax-M2.5, Agentic RAG bundle), enhanced installation verification tools, and multiple API enhancements for improved OpenAI compatibility.

For full deployment details, bundle configurations, and context length options for all models and bundles mentioned below, see Supported models and bundles.
New Features and Enhancements
Checkpoint Path Discovery via Model CRs
Checkpoint paths are now discoverable through the Model Custom Resource (CR), eliminating the need to manually locate checkpoint paths in configuration files. Model CRs, including their checkpoint paths, can be viewed with kubectl.
- Model CRs now include checkpoint path information for all supported models.
- Supports multiple checkpoints for different model configurations.
- Backwards compatible with existing bundle configurations: checkpoint paths in bundle CRs override Model CR paths if specified.
- Works with on-prem and air-gapped deployments.
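Model CRs can be listed and inspected with standard kubectl commands. A sketch, where the namespace and model name are placeholders; the exact location of the checkpoint path within the CR is not specified in these notes, so inspect the full YAML:

```shell
# List all Model CRs in the SambaStack namespace
kubectl -n <namespace> get models

# Dump a single Model CR as YAML to locate its checkpoint path information
kubectl -n <namespace> get model <model-name> -o yaml
```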
Improved Discovery of Available Models and Bundles
All available models and bundles are now applied automatically by the SambaStack Helm chart. No additional configuration is required.
- Models can be discovered using kubectl -n <namespace> get models.
- Bundles can be discovered using kubectl -n <namespace> get bundles.
Agentic RAG Bundle
Added the us-agentic-rag-1-1 bundle, a multi-model bundle optimized for retrieval-augmented generation (RAG) workflows. It contains the following model configs:

gpt-oss-120b
- Seq Length: 32K, BS: 4
- Seq Length: 64K, BS: 2
- Seq Length: 128K, BS: 2

Llama-4-Maverick-17B-128E-Instruct
- Seq Length: 8K, BS: 1
- Seq Length: 16K, BS: 1

Meta-Llama-3.3-70B (Target) / Meta-Llama-3.2-1B (Draft)
- Seq Length: 4K, BS: 1, 4, 8, 16, 32
- Seq Length: 8K, BS: 1, 4, 8
- Seq Length: 16K, BS: 1, 4
- Seq Length: 32K, BS: 1, 4
- Seq Length: 64K, BS: 1
- Seq Length: 128K, BS: 1

Meta-Llama-3.1-8B-Instruct
- Seq Length: 4K, BS: 1, 4, 16, 32
- Seq Length: 8K, BS: 1, 4, 16, 32
- Seq Length: 16K, BS: 1, 4, 8

E5-Mistral-7B-Instruct
- Seq Length: 4K, BS: 1, 4, 8, 16, 32
MiniMax-M2.5 Model Support
Added support for the MiniMax-M2.5 model on SambaStack.
- Checkpoint accessible via your artifact reader service account.
- Customers can include MiniMax-M2.5 in bundles that pass bundle validation and deploy successfully.
- Includes reasoning support.
Pre/Post-Install Verification Scripts
New verification scripts help customers validate their SambaStack environment before and after installation.
- Pre-install script: Validates all hardware, connectivity, and software prerequisites.
- Post-install script: Confirms all SambaStack components are installed and running correctly.
- Clear pass/fail reporting with actionable guidance on failures.
- Scripts maintained and validated against the current SambaStack release.
- Distributed via the sambastack-tools public GitHub repository with README instructions.
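Typical usage might look like the following. The repository URL and script names are hypothetical placeholders; see the sambastack-tools README for the actual instructions:

```shell
# Clone the public tools repository (URL is an assumption)
git clone https://github.com/sambanova/sambastack-tools.git
cd sambastack-tools

# Hypothetical script names; consult the README for the real ones
./preinstall_check.sh    # run before installing SambaStack
./postinstall_check.sh   # run after installation completes
```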
Per-Model Queue Depth Configuration
Added support for configuring different queue depths based on context length, helping maintain uptime for high-context-length requests.
- Queue depths can now be configured per context length group using contextGroups in the bundle configuration.
- Queue depth controls how many concurrent requests can be queued for a model configuration.
- Lower queue depths for higher context lengths help prevent memory exhaustion and improve overall service stability.
- SambaStack now validates queue depth configuration at request time. Misconfigured models with missing queue depth definitions surface a clear error instead of failing silently.
- The empty string "" in contextLengths matches requests to the base model name without a context length suffix (e.g., DeepSeek-R1-0528). Requests with explicit suffixes like -8k or -128k match their corresponding contextLengths values.

Context length suffixes (8k, 16k, 32k, etc.) are case-sensitive. Use lowercase k in all configurations. This applies to all models supported by SambaNova.

The contextGroups field is a sub-component of a model grouping within a service tier. Example configuration:
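A sketch of what such a configuration might look like. Only the contextGroups and contextLengths field names come from this release; the surrounding service-tier and model-grouping structure and the queue depth key name are assumptions:

```yaml
# Hypothetical bundle configuration fragment. Field names other than
# contextGroups and contextLengths are illustrative assumptions.
serviceTiers:
  - name: default
    models:
      - name: DeepSeek-R1-0528
        contextGroups:
          - contextLengths: ["", "-8k", "-16k"]  # "" matches the bare model name
            queueDepth: 32
          - contextLengths: ["-32k", "-64k"]
            queueDepth: 8
          - contextLengths: ["-128k"]            # lowest depth for highest context
            queueDepth: 2
```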
Helm Chart Configuration Update
The substitutions field has moved from bundles to global in the SambaStack Helm chart.
This is a breaking change that affects air-gapped and NFS customers. Update your Helm values file before upgrading.
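Assuming a standard Helm values layout, the move looks roughly like this; the substitutions content itself is a placeholder:

```yaml
# Before (v0.4.x): substitutions nested under bundles
bundles:
  substitutions:
    registry: my-registry.example.com   # placeholder value

# After (v0.5.14): substitutions at the global level
global:
  substitutions:
    registry: my-registry.example.com
```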
API Improvements and Fixes
Enhancements to improve OpenAI compatibility across the chat/completions endpoint and a new, non-standard feature to track usage in streaming chunks.
Text Object Support in User Message Content
- Expanded support for text objects in content arrays, matching the OpenAI ChatCompletionsContentPartText specification.
- Enabled for: DeepSeek-R1-0528, Llama-3.3-Swallow-70B-Instruct-v0.4, and MiniMax-M2.5.
Log Probabilities
- Added a logprobs field that, when set to true, returns the log probabilities for each generated token.
- Added a top_logprobs field that, when set to an integer n, returns the top n log probabilities for each generated token.
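A sketch of reading the new fields on the client side, assuming the response follows the OpenAI chat/completions logprobs shape (choices → logprobs → content → top_logprobs), which these notes describe as the compatibility target:

```python
# Extract per-token alternatives from a chat/completions response dict,
# assuming the OpenAI-style logprobs response shape.
def top_alternatives(response: dict) -> list[list[tuple[str, float]]]:
    """For each generated token, return its top-n (token, logprob) pairs."""
    content = response["choices"][0]["logprobs"]["content"]
    return [
        [(alt["token"], alt["logprob"]) for alt in tok["top_logprobs"]]
        for tok in content
    ]

# Request side: enable the new fields (payload shown for illustration only).
payload = {
    "model": "DeepSeek-R1-0528",
    "messages": [{"role": "user", "content": "Hi"}],
    "logprobs": True,
    "top_logprobs": 3,
}
```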
Usage in Streaming Chunks
- Added a non-standard feature to allow users to obtain partial usage statistics in chunks returned in streaming responses.
- This feature is enabled by setting STREAM_USAGE_IN_CHUNKS: true in the replica group section of your custom bundle deployment.
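A sketch of where the flag might sit; only the STREAM_USAGE_IN_CHUNKS setting is named by these notes, and the surrounding replica group structure is an assumption:

```yaml
# Hypothetical replica group fragment of a custom bundle deployment.
replicaGroups:
  - name: my-replica-group
    env:
      STREAM_USAGE_IN_CHUNKS: true   # emit partial usage stats in stream chunks
```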
Tool Choices
Supported only for models that support function calling. Setting tool_choice: none ensures that the model will not see available tools.
Invalid Message Role Validation
- The chat/completions endpoint now rejects invalid message roles. Only user, assistant, system, and tool are accepted.
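A client-side guard mirroring the new server-side check can catch bad roles before a request is sent; this is a sketch, the server enforces the rule itself:

```python
# The four roles chat/completions now accepts.
VALID_ROLES = {"user", "assistant", "system", "tool"}

def validate_messages(messages: list[dict]) -> None:
    """Raise ValueError for any message role the endpoint would reject."""
    for i, msg in enumerate(messages):
        if msg.get("role") not in VALID_ROLES:
            raise ValueError(f"message {i}: invalid role {msg.get('role')!r}")

# Valid conversation: passes silently.
validate_messages([
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "Hi"},
])
```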
Whisper Audio Error Response
- The Whisper transcription endpoint now returns descriptive error messages when audio file processing fails, instead of a bare HTTP 400 status code.
Bug Fixes
Function Calling Routing Fix
- Fixed an issue where function calling routing did not apply the model name prefix check correctly, causing some models to skip tool routing.
Air-Gap Inventory Fix
- Fixed the air-gap inventory to include the correct cloudnative-pg image configuration, preventing missing image errors during offline installation.
Known Issues
SambaRack Manager Does Not Support 2 PDU Configurations
SambaRack Manager does not currently support configurations with 2 PDUs. Customers using 2 PDU setups should contact SambaNova Support for guidance on alternative configurations.

SambaStack v0.4.8 Release
Release Date: March 10, 2026

This release introduces air-gapped deployment support, custom checkpoint management with NFS storage, swappable model configurations, and multiple API enhancements for improved OpenAI compatibility.

New Features and Enhancements
SambaStack Air-gapped Support
Added support for air-gapped mode of operation, enabling secure, isolated deployments.
- Install, upgrade, and setup for air-gapped configurations are performed in conjunction with SambaNova Support.
- Ongoing administration (Auth, User Management, Custom DB) is designed for self-service and follows the same workflows as on-prem deployments.
Install, setup, port forwarding to access Keycloak UI, and upgrade steps are not documented for air-gapped deployments due to varying customer network configurations. Please work with SambaNova support for these workflows.
Custom Checkpoints with NFS Storage
Added the ability to reference custom checkpoints from customer-provided NFS storage in deployments.

Swappable Models in Bundles
Added configurable model swapping behavior to optimize high-bandwidth memory (HBM) utilization.
- By default, all models in bundles can be swapped out of HBM and replaced with other models in DDR memory.
- Use the swappable: <boolean> field in the bundle YAML definition to enable or disable this behavior.
- The default value is true. When set to false, the model remains in HBM and cannot be swapped out, ensuring zero switching time for requests to that model.
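In a bundle YAML definition, pinning one model in HBM might look like this; only the swappable field is specified by these notes, and the surrounding structure is illustrative:

```yaml
models:
  - name: Meta-Llama-3.1-8B-Instruct
    swappable: false   # pin in HBM: zero switching time for this model
  - name: E5-Mistral-7B-Instruct
    swappable: true    # default: may be swapped out to make room
```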
API Improvements and Fixes
Enhancements to improve OpenAI compatibility across the chat/completions endpoint.
Text Object Support in User Message Content
- Added support for text objects in content arrays, matching the OpenAI ChatCompletionsContentPartText specification.
- Enabled for: gpt-oss-120b, DeepSeek-V3.1, DeepSeek-V3.1-Terminus, DeepSeek-V3.2, DeepSeek-V3-0324, Qwen3-32B, Qwen3-235B.
Response Format Text Option
- Fixed an issue where response_format=text would throw an error.
- The endpoint now supports all OpenAI formats: text, json_object, json_schema.
Extended Temperature Range
- Expanded the temperature range from 0.0–1.0 to 0.0–2.0, matching the OpenAI specification.
Tool Calling Number Type Fix
- Tools with number-type arguments were always returned as floats.
- Now integers are preserved as integers, matching JSON Schema number specification.
Parallel Tool Calls Support
- Added parallel_tool_calls parameter support.
- When set to false, the model will make at most one tool call per response, matching the OpenAI specification.
Streaming Token Usage Reporting
- Added support for token usage reporting in each chunk of the stream.
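A sketch of how a client might consume per-chunk usage, assuming each streamed SSE chunk can carry an OpenAI-style usage object; the exact chunk schema is not specified in these notes:

```python
import json

def latest_usage(sse_lines):
    """Track the most recent usage snapshot seen in a stream of SSE data lines."""
    usage = None
    for line in sse_lines:
        # Skip non-data lines and the stream terminator.
        if not line.startswith("data: ") or line == "data: [DONE]":
            continue
        chunk = json.loads(line[len("data: "):])
        if chunk.get("usage"):      # assumption: partial usage rides on chunks
            usage = chunk["usage"]
    return usage
```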
Known Issues
- Parallel Tool Calls with Constrained Decoding.
- The following models return null for logprobs even when logprobs=true or top_logprobs is set. The parameters are accepted without error but have no effect:
  - Llama-4-Maverick-17B-128E-Instruct
  - Whisper-Large-v3
SambaStack Initial Release
Release Date: September 19, 2025

This release introduces the comprehensive SambaStack documentation suite.

New Features and Enhancements
SambaStack Guide
Added the SambaStack Guide, providing step-by-step instructions for deploying, configuring, and managing SambaStack.
- Setup, installation, and environment configuration.
- User and authentication management (Keycloak, OIDC).
- Monitoring, logging, and artifact management.
- Bundle and model deployment workflows.
- Common command reference.
SambaStack Models
Added the SambaStack models and bundles page to help customers understand which models are available on SambaStack and how to configure them.
- Lists all supported models (e.g., Llama 3.3, Llama 4 Maverick, DeepSeek).
- Shows context length, batch size options, and supported features.
- Instructions for using the Model list API to check availability in your environment.
