Release notes for SambaCloud, including new features, enhancements, and fixes.

October 2025

Release Date: October 7, 2025

New Features and Enhancements

API Reference Documentation Upgrade

Upgraded the API Reference documentation with a fully automated, Swagger-style interface that dynamically syncs with the OpenAPI specification.
  • Interactive, auto-generated API documentation with full model definitions.
  • Real-time updates to stay aligned with the latest OpenAPI spec.
  • Up-to-date usage examples in Python and TypeScript, powered by the official SambaNova SDKs.

June 2025

Release Date: June 17, 2025

New Features and Enhancements

Japanese Language Support

Added Japanese language support for SambaNova documentation.

May 2025

Release Date: May 28, 2025

New Features and Enhancements

AWS Marketplace Integration

SambaCloud is now available as a SaaS offering on AWS Marketplace.
  • Subscribe using your AWS account and streamline billing through AWS.
  • PrivateLink support for secure, low-latency connections between your AWS VPC and SambaCloud—no public internet exposure.
  • Fast onboarding with models like Llama 4, DeepSeek-R1 671B, and Whisper.
  • Up to 10x faster inference vs. GPUs with SambaNova’s RDU architecture.
  • Privacy-first architecture: SambaNova never stores your data.
  • Support for fine-tuned model deployment without code changes.
See the AWS Marketplace integration documentation for more information.

DeepSeek-V3-0324 Function Calling Support

Release Date: May 6, 2025

Added function calling support for DeepSeek-V3-0324, enabling more dynamic and programmable interactions by allowing you to invoke external functions directly through the model’s outputs. See Function calling for more information.
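As a minimal sketch, a function-calling request could be assembled as below. The `get_weather` tool is hypothetical, and the model id, base URL, and OpenAI-style schema are assumptions for illustration — check the API reference for the exact format.

```python
import json

# A tool definition in the OpenAI-style function-calling schema.
# "get_weather" is a hypothetical function used only for illustration.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name."},
            },
            "required": ["city"],
        },
    },
}]

def build_request(user_prompt: str) -> dict:
    """Assemble the body of a chat completions request that offers tools."""
    return {
        "model": "DeepSeek-V3-0324",
        "messages": [{"role": "user", "content": user_prompt}],
        "tools": tools,
        "tool_choice": "auto",  # let the model decide whether to call a tool
    }

body = build_request("What's the weather in Tokyo?")
print(json.dumps(body, indent=2))

# With the official openai client and an API key, the request could be sent as:
# from openai import OpenAI
# client = OpenAI(base_url="https://api.sambanova.ai/v1", api_key="...")
# response = client.chat.completions.create(**body)
# The model's reply may then contain tool_calls for get_weather.
```

If the model elects to call the tool, the response carries the function name and JSON arguments; your application executes the function and returns the result in a follow-up message.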

April 2025

New Features and Enhancements

New Models

  • Qwen3-32B (April 29, 2025)
    • Added as a Preview model. High-capacity, multilingual LLM with strong performance across question answering, summarization, reasoning, and coding.
    • Available via Playground and API.
  • Whisper-Large-V3 (April 18, 2025)
    • OpenAI’s latest large-scale automatic speech recognition (ASR) model with enhanced transcription accuracy, improved multilingual support, and better robustness against noisy audio.
    • Available via API.
  • Llama-4-Maverick-17B-128E-Instruct (April 9, 2025)
    • 400B parameter mixture-of-experts model with 17B active parameters and 128 experts.
    • Added as a Preview model. Competitive with Gemma 3, Gemini 2.0 Flash, and Mistral 3.1.
    • Available via Playground and API.
  • Llama-4-Scout-17B-16E-Instruct (April 7, 2025)
    • 109B parameter mixture-of-experts model with 17B active parameters and 16 experts.
    • Added as a Preview model. Competitive with Gemma 3, Gemini 2.0 Flash-Lite, and Mistral 3.1.
    • Available via Playground and API.

Llama-4-Maverick Image Input Support

Release Date: April 16, 2025

Llama-4-Maverick-17B-128E-Instruct now supports image input functionality with up to two images as context alongside text prompts. Available via Playground and API.
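As a hedged sketch, a mixed text-and-image message could be built as follows. The data-URL content format follows the OpenAI chat convention, and the model id and base URL in the comments are assumptions rather than confirmed details.

```python
import base64

def image_message(prompt: str, image_bytes: bytes, mime: str = "image/png") -> dict:
    """Build one user message mixing a text prompt with one inline image.

    Up to two image parts can be included as context per the release note.
    """
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url",
             "image_url": {"url": f"data:{mime};base64,{b64}"}},
        ],
    }

# Placeholder bytes standing in for a real chart or photo.
fake_image = b"\x89PNG\r\n\x1a\n" + b"\x00" * 16
msg = image_message("Describe this image.", fake_image)

# The request would then be sent through the chat endpoint, e.g.:
# client.chat.completions.create(
#     model="Llama-4-Maverick-17B-128E-Instruct", messages=[msg])
```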

Deprecations

Release Date: April 14, 2025

The following models have been deprecated and are scheduled for removal from active endpoints:
  • Llama-3.1-Swallow-70B-Instruct-v0.3
  • Llama-3.1-Tulu-3-405B
  • Llama-3.2-11B-Vision-Instruct
  • Llama-3.2-90B-Vision-Instruct
  • Meta-Llama-3.1-70B-Instruct
  • Qwen2.5-72B-Instruct
  • Qwen2.5-Coder-32B-Instruct
See Model Deprecations for guidance on alternatives.

March 2025

New Features and Enhancements

New Models

  • DeepSeek-V3-0324 (March 27, 2025)
    • First open-source non-reasoning model that outperforms proprietary non-reasoning models.
    • Major boost in reasoning performance, stronger front-end development skills, and smarter tool-use capabilities.
    • Added as a Preview model. Available via Playground and API.
  • E5-Mistral-7B-Instruct (March 20, 2025)
  • QwQ-32B (March 6, 2025)
    • State-of-the-art reasoning model from Alibaba Qwen team.
    • Delivers performance comparable to the 671B-parameter DeepSeek-R1 with far fewer parameters.
    • Integrates advanced agent-related features for critical thinking and external tool usage.
  • Llama-3.1-Swallow-8B-Instruct-v0.3 and Llama-3.1-Swallow-70B-Instruct-v0.3 (March 5, 2025)
    • Japanese-optimized language models developed through continual pre-training of Meta’s Llama 3.1.
    • Trained on 200B tokens from diverse sources including web corpora, technical content, and multilingual Wikipedia.

DeepSeek-R1 Production Release

Release Date: March 18, 2025

DeepSeek-R1 transitioned from preview to production, with context length increased to 16k tokens. Available via Playground and API.

API Improvements and Fixes

Model List Endpoint

Release Date: March 18, 2025

Added a model list endpoint that provides information about the models currently available in SambaCloud.
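Assuming the endpoint follows the OpenAI convention (GET /v1/models) at the base URL shown — both are assumptions, not confirmed by these notes — the request could be built like this:

```python
import urllib.request

BASE_URL = "https://api.sambanova.ai/v1"  # assumed base URL

def models_request(api_key: str) -> urllib.request.Request:
    """Build the authenticated GET request for the model list endpoint."""
    return urllib.request.Request(
        f"{BASE_URL}/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )

req = models_request("dummy-key")
print(req.full_url)

# With a real key, the list could be fetched and parsed:
# import json
# with urllib.request.urlopen(models_request(key)) as resp:
#     for model in json.load(resp)["data"]:
#         print(model["id"])
```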

Context Length Increases

  • DeepSeek-R1-Distill-Llama-70B (March 13, 2025): Increased to 128k tokens.

February 2025

New Features and Enhancements

New Models

  • DeepSeek-R1 (February 13, 2025)
    • 671B parameter MoE model with performance comparable to OpenAI’s o1 across mathematics, coding, and reasoning.
    • Added as a Preview model. SambaCloud delivers the fastest DeepSeek-R1 deployment in the world.
  • Llama-3.1-Tulu-3-405B (February 4, 2025)
    • Open-source model from Allen Institute for AI (AI2) that performs better than DeepSeek-V3.
    • Trained using Reinforcement Learning with Verifiable Rewards (RLVR).
    • Competitive with or superior to GPT-4o and DeepSeek-V3, with notable advantage in safety benchmarks.
  • DeepSeek-R1-Distill-Llama-70B (January 30, 2025)
    • Fine-tuned from Llama 3.3 70B using samples generated by DeepSeek-R1.
    • Outperforms GPT-4o, o1-mini, and Claude-3.5-Sonnet across AIME, MATH-500, GPQA, and LiveCodeBench.

Context Length Increases

  • Llama-3.3-70B (February 25, 2025): Increased to 128k tokens. Available as production model.
  • DeepSeek-R1 (February 21, 2025): Increased to 8k tokens. Available as preview model.
For API access and higher rate limits for DeepSeek-R1, please complete this form to join the waitlist.

December 2024

New Features and Enhancements

New Models

  • Llama 3.3 70B (December 11, 2024)
    • Delivers comparable performance to Llama 3.1 405B.
    • Competes closely with OpenAI’s GPT-4o and Google’s Gemini Pro 1.5.
  • QwQ 32B Preview (December 11, 2024)
    • Experimental reasoning model from Alibaba’s Qwen team with 32.5B parameters.
    • Excels in mathematics and programming: 65.2% on GPQA, 50.0% on AIME, 90.6% on MATH-500, 50.0% on LiveCodeBench.
  • Qwen2.5 72B (December 5, 2024)
    • 72B-parameter model trained on 18T tokens. Supports context lengths up to 128k tokens.
    • Supports 29+ languages including English, Chinese, French, and Spanish.
  • Qwen2.5 Coder 32B (December 5, 2024)
    • 32B-parameter model tailored for code-related tasks across 92 programming languages.
    • HumanEval score of 92.7%, matching GPT-4o coding capability.
  • Llama Guard 3 8B (December 5, 2024)
    • Fine-tuned for content safety classification, aligned with the MLCommons standardized taxonomy of 14 hazard categories.

Context Length Increases

  • Llama 3.2 1B: Increased from 4k to 16k.
  • Llama 3.1 70B: Increased from 64k to 128k.
  • Llama 3.1 405B: Increased from 8k to 16k.

October 2024

New Features and Enhancements

New Models

  • Llama 3.2 11B and 90B (October 29, 2024)
    • Multimodality support for text and image inputs.
  • Llama 3.2 1B and 3B (October 1, 2024)
    • Available to all tiers at the fastest inference speed.

Function Calling

Release Date: October 29, 2024

Added a function calling API that enables dynamic, agentic workflows by allowing the model to suggest and select function calls based on user input.

Multimodality in API and Playground

Release Date: October 29, 2024

Interact with multimodal models directly through the OpenAI-compatible Inference API and the Playground for seamless text and image processing.

Python and Gradio Code Samples

Release Date: October 29, 2024

New Python and Gradio code samples for faster prototyping and reduced setup time.

API Improvements and Fixes

Automatic Sequence Length Routing

Release Date: October 10, 2024

Requests are now routed automatically based on sequence length; there is no need to change model names to specify different sequence lengths.

Context Length Increases

  • Llama 3.1 8B (October 10, 2024): Increased from 8k to 16k.
  • Llama 3.1 70B (October 10, 2024): Increased from 8k to 64k.

Performance Improvements

Improved performance for Llama 3.2 1B and 3B models.

User Experience Improvements

  • New How to Use the API guide with example curl code for text and image inputs.
  • Streamlined access to updated code snippets.
  • New Clear Chat option in the Playground.
  • New UI components with tooltips.

Updated AI Starter Kits

  • Multimodal Retriever: Chart, image, and figure understanding with advanced retrieval combining visual and textual data.
  • Llama-3.1-Instruct-o1: Enhanced reasoning with Llama-3.1-405B, hosted on Hugging Face Spaces.

September 2024

Release Date: September 10, 2024

New Features and Enhancements

SambaCloud Public Launch