Release notes for SambaStack, including new features, enhancements, and fixes.Documentation Index
Fetch the complete documentation index at: https://sambanova-systems.mintlify.dev/docs/llms.txt
Use this file to discover all available pages before exploring further.
SambaStack v1.0.57 release
Release Date: April 30, 2026 This release introduces new model support (Gemma 3 12B, Gemma 3 27B, DeepSeek V3.2), high-throughput configurations for DeepSeek models, constrained decoding support for GPT-OSS, more accurate and informative bundle legalizer responses, and major API additions including the OpenAI Responses API, then and seed parameters, improved TTFT measurement, and OpenAI-conformant error responses.
For full deployment details, bundle configurations, and context length options for all models and bundles mentioned below, see Supported models and bundles.
New features and enhancements
New models Added support for the following models in SambaStack 1.0.57:- Gemma 3 12B (
gemma-3-12b-it) - image understandinggemma3-v3: 128K context, BS 2/4/6/8
- Gemma 3 27B (
gemma-3-27b-it) - image understandinggemma3-27b-32-128k: 32K and 128K context, BS 2/4/6/8
- Qwen3 235B (Qwen3-235B-A22B-Instruct-2507) - the legacy
Qwen3-235Bmodel name is preserved for backward compatibilitydyt-qwen3-235b-32-128k: 32K context (BS 2/4/6/8), 128K context (BS 2)qwen3-235b-16-32-64k: 16K, 32K, 64K contextqwen3-235b-128k: 128K context
- DeepSeek V3.2 - available in high-interactivity configurations (up to 128k context) and high-throughput configurations (up to 32k context)
- GPT-OSS 120B - adds constrained decoding capability; the previous standard DYT bundle is replaced by two bundles with constrained decoding support enabled
cd-dyt-gpt-oss-120b-32-64-128k: 32K, 64K, 128K contextcd-dyt-gpt-oss-120b-8-32-64-128k: 8K, 32K, 64K, 128K context
High-throughput configurations SambaStack 1.0.57 introduces high-throughput configuration options for running on SambaRack nodes.
- Optimized for large-scale serving of a single model, prioritizing total system throughput over per-user latency to support high volumes of concurrent users
- Suited for use cases that do not require low end-to-end latency or interactivity
- Requires a minimum of 4 SambaRack nodes dedicated to this configuration
- Cannot be bundled with other models
- Transparent to users - no API or client code changes required
- Uses a disaggregated prefill-decode architecture with a configurable node ratio - for example, 3 nodes running prefill and 1 running decode
decode_queue_time- time requests wait in the decode queue before processing begins (new in 1.0.57)time_to_first_token- latency from request receipt to first output tokencompletion_tokens_per_sec- decode throughput
Constrained decoding (structured output) Added constrained decode mask sampling on a per-token schedule. Models that declare
supports_constrained_decoding in their PEF CRs can now use structured output generation with JSON schema enforcement.
Set
constrained_decoding: true in the BundleTemplate to enable this feature for the bundle.TTFT measurement improvement
process_request() processing time is now included in TTFT and end-to-end latency measurements for more accurate reporting.
Bundle legalizer: accuracy and validation improvements Improved legalizer accuracy for memory accounting; bundles that exceed available memory or host segment size are now rejected at validation time rather than failing at runtime.
Bundle memory utilization in bundle CR status The Bundle CR status now includes a
legalizerInfo block with memory utilization data from the bundle legalizer. Use kubectl get bundle <bundle-name> -o yaml to inspect the block.
| Field | Description |
|---|---|
status | Legalizer validation result: passed, failed, or skipped |
errors | List of validation errors (present when status: failed) |
warnings | Non-fatal warnings (present when status: passed with warnings) |
ddr | DDR memory utilization ratio |
hbm_resident | HBM resident memory utilization; can exceed 1.0 under over-allocation |
host | Host memory utilization ratio |
status field is absent if the legalizer output could not be processed. skipped only appears when skip_legalizer: true is set on a PEF CR.
PEF and checkpoint lifecycle status PEF CRs now include a
pef_status field, and checkpoint CR version entries include a checkpoint_status field, giving operators visibility into artifact lifecycle state without needing to inspect pod logs.
Status values for both fields:
preview- newly available configurations which may not be fully tested and / or may not have full feature support; not recommended for production use casesstable- well-tested, production-ready configurationsdeprecated- still functional; scheduled for removal in a future releaseremoved- no longer available; must be replaced before deployment
pef_status when reviewing PEF CR versions before deploying a bundle to confirm the version is stable.
API improvements and fixes
Enhancements to improve OpenAI compatibility and new API capabilities across thechat/completions endpoint and a new responses endpoint.
Responses API
The OpenAI-compatible Responses API (POST /v1/responses) is now supported.
- Currently supported model:
gpt-oss-120b - Supported capabilities: text generation (streaming and non-streaming), function calling (2-step), structured output (JSON schema), multi-turn conversations (client-managed state), and reasoning output
- Uses stateless request semantics - conversation history is passed by the client on each request
reasoning_tokens in usage response
Reasoning models now include a reasoning_tokens field in the usage object, reporting the number of tokens consumed by the model’s internal reasoning step.
tool_choice support for GPT-OSS 120B
tool_choice is now supported for gpt-oss-120b. Accepted values: auto, none, required, and {"type": "function", "function": {"name": "..."}}.
- Only
gpt-oss-120bsupportstool_choicein Release 1.0.57 - The forced function call format follows the Chat Completions API structure - the inner
"function"key is required; this differs from the Responses API format which omits it allowed_toolsis not supported
tool_choice is supported only for gpt-oss-120b bundles with constrained decoding enabled: cd-dyt-gpt-oss-120b-32-64-128k and cd-dyt-gpt-oss-120b-8-32-64-128k.n parameter - multiple completions
The n parameter is now supported in chat completions.
- Valid range: 1–16 (default: 1)
- Implemented via API-level decomposition -
nparallel single-completion requests are issued and combined before returning to the client - Not supported when
toolsorfunctionsare present in the request
seed parameter
The seed parameter is now supported for reproducible outputs.
- Accepts any integer, including negative values
- Applies to text generation models only - not supported for multi-modal or continuous batching models
system_fingerprintis not returned (unlike OpenAI)
OpenAI-conformant error responses All API error responses now use the OpenAI-standard error format. A new top-level
request_id field is included in every error response.
request_id to SambaNova support when reporting an issue.
Structured output parser fixes Fixed bugs in structured output that produced incorrect output for certain schema patterns.
Better error message for tool call truncation Improved the error message when a tool call exceeds the maximum token length.
Empty streaming chunk removed Removed a spurious empty chunk emitted during streaming responses, improving conformance with OpenAI streaming behavior.
Whisper rate limit error code fix Fixed the Whisper transcription endpoint to return the correct 429 status code when the rate limit is exceeded.
Bug fixes
PEF cache fix Fixed a bug with PEF cache when migrating Bundles to use PEF CRsAuth provider validation fix Fixed a Helm auth provider validation issue that incorrectly rejected custom secrets when Keycloak was also enabled.
CVE dependency updates Updated multiple dependency versions across the inference operator, global model router, and supporting libraries to address known CVEs.
Known issues
Gemma 3: function calling not supported Gemma 3 12B and 27B do not support native function calling. ThetoolSupport: true flag in bundles using Gemma models indicates support for JSON output schema, rather than general native function calling support.
- Impact: Requests using the
toolsparameter withgemma-3-12b-itwill not produce function call outputs. - Workaround: Function calling behavior can be approximated by implementing tool-use logic via user prompts.
gemma-3-27b-it may experience intermittent errors.
- Impact: Vision requests to
gemma-3-27b-itat 104k–112k token context lengths may experience intermittent HTTP 524 timeout errors with elevated latency. Text and function calling modes are not affected. - Workaround: Avoid vision requests in the 104k–112k context window range for now. We are looking in to a fix.
SambaStack v0.5.17 release
Release Date: April 8, 2026 This release introduces support for SambaRack SN40L-16 configuration with 4TB of DDR memory.New features and enhancements
Cluster-level memory management Adds support for declaring cluster-wide DDR memory limits for SambaRack SN40L-16 via an environment variable insambastack.yaml, enforced by the bundle validation tool.
- The default memory limit is 12TB. Update this value to support SambaRack SN40L-16 with 4TB of DDR.
- Set the
DDR_PER_RDU_GBenvironment variable: default is768(12TB per-node), set to256for 4TB per-node. For more details, see the SambaStack.yaml reference - The memory limit applies to all SambaRack nodes in the cluster.
- The bundle validation tool enforces the limit at runtime. Configurations exceeding the limit fail with an informative error and must be refactored by removing model configurations.
Known issues
Inventory check shows degraded status for 4TB memory configurations Runningsnfadm inventory shows “degraded” status for RDUs that are in nodes with 4TB memory configurations.
- Impact: Cosmetic only. Does not affect operation.
- Resolution: Expected to be resolved in a future release.
SambaStack v0.5.14 release
Release Date: April 1, 2026 This release introduces simplified checkpoint discovery, new model support (MiniMax-M2.5, Agentic RAG bundle), enhanced installation verification tools, and multiple API enhancements for improved OpenAI compatibility.For full deployment details, bundle configurations, and context length options for all models and bundles mentioned below, see Supported models and bundles.
New features and enhancements
Checkpoint path discovery via model CRs Checkpoint paths are now discoverable through the Model Custom Resource (CR), eliminating the need for customers to manually locate checkpoint paths in configuration files. The following Kubernetes command can be used to view Model CRs, which now contain checkpoint paths:- Model CRs now include checkpoint path information for all supported models.
- Supports multiple checkpoints for different model configurations.
- Backwards compatible with existing bundle configurations - checkpoint paths in bundle CRs override Model CR paths if specified.
- Works with on-prem and air-gapped deployments.
- Models can be discovered using
kubectl -n <namespace> get models. - Bundles can be discovered using
kubectl -n <namespace> get bundles.
us-agentic-rag-1-1 bundle, a multi-model bundle optimized for retrieval-augmented generation (RAG) workflows. It contains the following model configs:
gpt-oss-120b- Seq Length: 32K, BS: 4
- Seq Length: 64K, BS: 2
- Seq Length: 128K, BS: 2
Llama-4-Maverick-17B-128E-Instruct- Seq Length: 8K, BS: 1
- Seq Length: 16K, BS: 1
Meta-Llama-3.3-70B(Target) /Meta-Llama-3.2-1B(Draft)- Seq Length: 4K, BS: 1, 4, 8, 16, 32
- Seq Length: 8K, BS: 1, 4, 8
- Seq Length: 16K, BS: 1, 4
- Seq Length: 32K, BS: 1, 4
- Seq Length: 64K, BS: 1
- Seq Length: 128K, BS: 1
Meta-Llama-3.1-8B-Instruct- Seq Length: 4K, BS: 1, 4, 16, 32
- Seq Length: 8K, BS: 1, 4, 16, 32
- Seq Length: 16K, BS: 1, 4, 8
E5-Mistral-7B-Instruct- Seq Length: 4K, BS: 1, 4, 8, 16, 32
- Checkpoint accessible via your artifact reader service account.
- Customers can include MiniMax-M2.5 in bundles that pass bundle validation and deploy successfully.
- Includes reasoning support.
- Pre-install script: Validates all hardware, connectivity, and software prerequisites.
- Post-install script: Confirms all SambaStack components are installed and running correctly.
- Clear pass/fail reporting with actionable guidance on failures.
- Scripts maintained and validated against the current SambaStack release.
- Distributed via the sambastack-tools public GitHub repository with README instructions.
- Queue depths can now be configured per context length group using
contextGroupsin the Service Tier configuration. - Queue depth controls how many concurrent requests can be queued for a model configuration.
- Lower queue depths for higher context lengths help prevent memory exhaustion and improve overall service stability.
- SambaStack now validates queue depth configuration at request time. Misconfigured models with missing queue depth definitions will surface a clear error instead of failing silently.
- The empty string
""incontextLengthsmatches requests to the base model name without a context length suffix (e.g.,DeepSeek-R1-0528). Requests with explicit suffixes like-8kor-128kmatch their correspondingcontextLengthsvalues.
Context length suffixes (8k, 16k, 32k, etc.) are case-sensitive. Use lowercase
k in all configurations. This applies to all models supported by SambaNova.contextGroups field is a sub-component of a model grouping within a service tier. Example configuration:
substitutions field has moved from bundles to global in the SambaStack Helm chart.
This is a breaking change that affects air-gapped and NFS customers. Update your Helm values file before upgrading.
API improvements and fixes
Enhancements to improve OpenAI compatibility across thechat/completions endpoint and a new, non-standard feature to track usage in streaming chunks.
Text object support in user message content
- Expanded support for text objects in content arrays, matching OpenAI
ChatCompletionsContentPartTextspecification. - Enabled for: DeepSeek-R1-0528, Llama-3.3-Swallow-70B-Instruct-v0.4, and MiniMax-M2.5.
- Added
logprobsfield that, when set totrue, returns the log probabilities for each generated token. - Added
top_logprobsfield that, when set to an integern, returns the topnlog probabilities for each generation.
- Added a non-standard feature to allow users to obtain partial usage statistics in chunks returned in streaming responses.
- This feature is enabled by setting
STREAM_USAGE_IN_CHUNKS: truein the replica group section of your custom bundle deployment.
tool_choice: noneensures that the model will not see available tools.
- The
chat/completionsendpoint now rejects invalid message roles. Onlyuser,assistant,system, andtoolare accepted.
- The Whisper transcription endpoint now returns descriptive error messages when audio file processing fails, instead of a bare HTTP 400 status code.
Bug fixes
Function calling routing fix- Fixed an issue where function calling routing did not apply the model name prefix check correctly, causing some models to skip tool routing.
- Fixed the air-gap inventory to include the correct
cloudnative-pgimage configuration, preventing missing image errors during offline installation.
Known issues
SambaRack Manager does not support 2 PDU configurations SambaRack Manager does not currently support configurations with 2 PDUs. Customers using 2 PDU setups should contact SambaNova Support for guidance on alternative configurations.SambaStack v0.4.8 release
Release Date: March 10, 2026 This release introduces air-gapped deployment support, custom checkpoint management with NFS storage, swappable model configurations, and multiple API enhancements for improved OpenAI compatibility.New features and enhancements
SambaStack air-gapped support Added support for air-gapped mode of operation, enabling secure, isolated deployments.- Install, upgrade, and setup for air-gapped configurations is performed in conjunction with SambaNova support.
- Ongoing administration (Auth, User Management, Custom DB) is designed for self-service and follows the same workflows as on-prem deployments.
Install, setup, port forwarding to access Keycloak UI, and upgrade steps are not documented for air-gapped deployments due to varying customer network configurations. Please work with SambaNova support for these workflows.
- By default, all models in bundles can be swapped out of HBM and replaced with other models in DDR memory.
- Use the
swappable: <boolean>field in the bundle YAML definition to enable or disable this behavior. - Default value is
true. When set tofalse, the model remains in HBM and cannot be swapped out, ensuring zero switching time for requests to that model.
API improvements and fixes
Enhancements to improve OpenAI compatibility across thechat/completions endpoint.
Text object support in user message content
- Added support for text objects in content arrays, matching OpenAI
ChatCompletionsContentPartTextspecification. - Enabled for: gpt-oss-120b, DeepSeek-V3.1, DeepSeek-V3.1-Terminus, DeepSeek-V3.2, DeepSeek-V3-0324, Qwen3-32B, Qwen3-235B.
- Fixed an issue where
response_format=textwould throw an error. - The endpoint now supports all OpenAI formats:
text,json_object,json_schema.
- Expanded
temperaturerange from 0.0–1.0 to 0.0–2.0, matching OpenAI specification.
- Tools with number-type arguments were always returned as floats.
- Now integers are preserved as integers, matching JSON Schema number specification.
- Added
parallel_tool_callsparameter support. - When set to
false, the model will make at most one tool call per response, matching OpenAI specification.
- Added support for token usage reporting in each chunk of stream.
Known issues
- Parallel Tool Calls with Constrained Decoding.
- The following models return
nullforlogprobseven whenlogprobs=trueortop_logprobsis set. The parameters are accepted without error but have no effect:- Llama-4-Maverick-17B-128E-Instruct
- Whisper-Large-v3
SambaStack initial release
Release Date: September 19, 2025 This release introduces the comprehensive SambaStack documentation suite.New features and enhancements
SambaStack guide Added the SambaStack Guide, providing step-by-step instructions for deploying, configuring, and managing SambaStack.- Setup, installation, and environment configuration.
- User and authentication management (Keycloak, OIDC).
- Monitoring, logging, and artifact management.
- Bundle and model deployment workflows.
- Common command reference.
- Lists all supported models (e.g., Llama 3.3, Llama 4 Maverick, DeepSeek).
- Shows context length, batch size options, and supported features.
- Instructions for using the Model list API to check availability in your environment.

