October 2025
Release Date: October 7, 2025
New Features and Enhancements
API Reference Documentation Upgrade
Upgraded the API Reference documentation with a fully automated, Swagger-style interface that dynamically syncs with the OpenAPI specification.
- Interactive, auto-generated API documentation with full model definitions.
- Real-time updates to stay aligned with the latest OpenAPI spec.
- Up-to-date usage examples in Python and TypeScript, powered by the official SambaNova SDKs.
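The usage examples above follow the OpenAI-compatible request format. As a minimal sketch (not an official SDK snippet), the following builds, but does not send, a chat completion request; the base URL and model ID are assumptions to check against the API Reference:

```python
import json

# Assumed OpenAI-compatible endpoint; confirm the current value in the API Reference.
BASE_URL = "https://api.sambanova.ai/v1"

# Illustrative model ID; substitute any model currently listed as available.
payload = {
    "model": "Meta-Llama-3.3-70B-Instruct",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize the latest release notes."},
    ],
    "stream": False,
}

# Serialized request body; POST it to f"{BASE_URL}/chat/completions" with an
# Authorization: Bearer <API key> header to receive a completion.
body = json.dumps(payload).encode("utf-8")
```

The official Python and TypeScript SDKs expose the same OpenAI-compatible fields, so this payload shape should carry over directly to SDK calls.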
June 2025
Release Date: June 17, 2025
New Features and Enhancements
Japanese Language Support
Added Japanese language support for SambaNova documentation.
- Go to the SambaNova documentation homepage.
- Open the language dropdown menu and select Japanese.
May 2025
Release Date: May 28, 2025
New Features and Enhancements
AWS Marketplace Integration
SambaCloud is now available as a SaaS offering on AWS Marketplace.
- Subscribe using your AWS account and streamline billing through AWS.
- PrivateLink support for secure, low-latency connections between your AWS VPC and SambaCloud—no public internet exposure.
- Fast onboarding with models like Llama 4, DeepSeek-R1 671B, and Whisper.
- Up to 10x faster inference vs. GPUs with SambaNova’s RDU architecture.
- Privacy-first architecture: SambaNova never stores your data.
- Support for fine-tuned model deployment without code changes.
DeepSeek-V3-0324 Function Calling Support
Release Date: May 6, 2025
Added function calling support for DeepSeek-V3-0324, enabling more dynamic and programmable interactions by allowing you to invoke external functions directly through the model’s outputs. See Function calling for more information.
April 2025
New Features and Enhancements
New Models
- Qwen3-32B (April 29, 2025)
  - Added as a Preview model. High-capacity, multilingual LLM with strong performance across question answering, summarization, reasoning, and coding.
  - Available via Playground and API.
- Whisper-Large-V3 (April 18, 2025)
  - OpenAI’s latest large-scale automatic speech recognition (ASR) model with enhanced transcription accuracy, improved multilingual support, and better robustness against noisy audio.
  - Available via API.
- Llama-4-Maverick-17B-128E-Instruct (April 9, 2025)
  - 400B parameter mixture-of-experts model with 17B active parameters and 128 experts.
  - Added as a Preview model. Competitive with Gemma 3, Gemini 2.0 Flash, and Mistral 3.1.
  - Available via Playground and API.
- Llama-4-Scout-17B-16E-Instruct (April 7, 2025)
  - 109B parameter mixture-of-experts model with 17B active parameters and 16 experts.
  - Added as a Preview model. Competitive with Gemma 3, Gemini 2.0 Flash-Lite, and Mistral 3.1.
  - Available via Playground and API.
Llama-4-Maverick Image Input Support
Release Date: April 16, 2025
Llama-4-Maverick-17B-128E-Instruct now supports image input functionality with up to two images as context alongside text prompts. Available via Playground and API.
Deprecations
Release Date: April 14, 2025
The following models have been deprecated and are scheduled for removal from active endpoints:
- Llama-3.1-Swallow-70B-Instruct-v0.3
- Llama-3.1-Tulu-3-405B
- Llama-3.2-11B-Vision-Instruct
- Llama-3.2-90B-Vision-Instruct
- Meta-Llama-3.1-70B-Instruct
- Qwen2.5-72B-Instruct
- Qwen2.5-Coder-32B-Instruct
March 2025
New Features and Enhancements
New Models
- DeepSeek-V3-0324 (March 27, 2025)
  - First open-source non-reasoning model that outperforms proprietary non-reasoning models.
  - Major boost in reasoning performance, stronger front-end development skills, and smarter tool-use capabilities.
  - Added as a Preview model. Available via Playground and API.
- E5-Mistral-7B-Instruct (March 20, 2025)
  - Embedding model with a Mistral architecture backbone.
  - Recommended for English use. Available via API.
  - See the Embeddings capabilities and endpoint documentation.
- QwQ-32B (March 6, 2025)
  - State-of-the-art reasoning model from the Alibaba Qwen team.
  - Delivers performance comparable to the 671B-parameter DeepSeek-R1 with far fewer parameters.
  - Integrates advanced agent-related features for critical thinking and external tool use.
- Llama-3.1-Swallow-8B-Instruct-v0.3 and Llama-3.1-Swallow-70B-Instruct-v0.3 (March 5, 2025)
  - Japanese-optimized language models developed through continual pre-training of Meta’s Llama 3.1.
  - Trained on 200B tokens from diverse sources, including web corpora, technical content, and multilingual Wikipedia.
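The E5-Mistral-7B-Instruct entry above points to the embeddings endpoint. A minimal sketch of an embeddings request body, assuming the OpenAI-compatible /embeddings path and this model ID (verify both against the Embeddings documentation):

```python
import json

# Illustrative embeddings request in the OpenAI-compatible shape.
payload = {
    "model": "E5-Mistral-7B-Instruct",
    "input": [
        "SambaCloud release notes",
        "DeepSeek-R1 moved to production in March 2025",
    ],
}

# POST this body to {base_url}/embeddings with your API key; the response
# contains one embedding vector per input string, in order.
request_body = json.dumps(payload)
```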
DeepSeek-R1 Production Release
Release Date: March 18, 2025
DeepSeek-R1 transitioned from preview to production, with context length increased to 16k tokens. Available via Playground and API.
API Improvements and Fixes
Model List Endpoint
Release Date: March 18, 2025
Added the model list endpoint that provides information about currently available models in SambaCloud.
Context Length Increases
- DeepSeek-R1-Distill-Llama-70B (March 13, 2025): Increased to 128k tokens.
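The model list endpoint mentioned above can be queried with a plain GET request. A sketch using only the standard library, where the URL follows the OpenAI-compatible /models convention and the key is a placeholder (confirm the exact path in the API Reference):

```python
import urllib.request

def build_models_request(base_url: str, api_key: str) -> urllib.request.Request:
    """Prepare (but do not send) a GET request for the available models."""
    return urllib.request.Request(
        f"{base_url}/models",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )

# Assumed base URL and a placeholder key; substitute real values before sending.
req = build_models_request("https://api.sambanova.ai/v1", "YOUR_API_KEY")
# urllib.request.urlopen(req) would then return a JSON list of model entries,
# each with an "id" usable in chat completion requests.
```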
February 2025
New Features and Enhancements
New Models
- DeepSeek-R1 (February 13, 2025)
  - 671B parameter MoE model with performance comparable to OpenAI’s o1 across mathematics, coding, and reasoning.
  - Added as a Preview model. SambaCloud delivers the fastest DeepSeek-R1 deployment in the world.
- Llama-3.1-Tulu-3-405B (February 4, 2025)
  - Open-source model from the Allen Institute for AI (AI2) that outperforms DeepSeek-V3.
  - Trained using Reinforcement Learning with Verifiable Rewards (RLVR).
  - Competitive with or superior to GPT-4o and DeepSeek-V3, with a notable advantage in safety benchmarks.
- DeepSeek-R1-Distill-Llama-70B (January 30, 2025)
  - Fine-tuned on Llama 3.3 70B using samples generated by DeepSeek-R1.
  - Outperforms GPT-4o, o1-mini, and Claude-3.5-Sonnet across AIME, MATH-500, GPQA, and LiveCodeBench.
Context Length Increases
- Llama-3.3-70B (February 25, 2025): Increased to 128k tokens. Available as production model.
- DeepSeek-R1 (February 21, 2025): Increased to 8k tokens. Available as preview model.
For API access and higher rate limits for DeepSeek-R1, please complete this form to join the waitlist.
December 2024
New Features and Enhancements
New Models
- Llama 3.3 70B (December 11, 2024)
  - Delivers comparable performance to Llama 3.1 405B.
  - Competes closely with OpenAI’s GPT-4o and Google’s Gemini Pro 1.5.
- QwQ 32B Preview (December 11, 2024)
  - Experimental reasoning model from Alibaba’s Qwen team with 32.5B parameters.
  - Excels in mathematics and programming: 65.2% on GPQA, 50.0% on AIME, 90.6% on MATH-500, 50.0% on LiveCodeBench.
- Qwen2.5 72B (December 5, 2024)
  - 72B-parameter model trained on 18T tokens. Supports context lengths up to 128k tokens.
  - Supports 29+ languages, including English, Chinese, French, and Spanish.
- Qwen2.5 Coder 32B (December 5, 2024)
  - 32B-parameter model tailored for code-related tasks across 92 programming languages.
  - HumanEval score of 92.7%, matching GPT-4o’s coding capability.
- Llama Guard 3 8B (December 5, 2024)
  - Fine-tuned for content safety classification, aligned with the MLCommons standardized taxonomy of 14 hazard categories.
Context Length Increases
- Llama 3.2 1B: Increased from 4k to 16k.
- Llama 3.1 70B: Increased from 64k to 128k.
- Llama 3.1 405B: Increased from 8k to 16k.
October 2024
New Features and Enhancements
New Models
- Llama 3.2 11B and 90B (October 29, 2024)
  - Multimodality support for text and image inputs.
- Llama 3.2 1B and 3B (October 1, 2024)
  - Available to all tiers at the fastest inference speed.
Function Calling
Release Date: October 29, 2024
Added a function calling API that enables dynamic, agentic workflows by allowing the model to suggest and select function calls based on user input.
Multimodality in API and Playground
Release Date: October 29, 2024
Interact with multimodal models directly through the Inference API (OpenAI compatible) and Playground for seamless text and image processing.
Python and Gradio Code Samples
Release Date: October 29, 2024
New code samples for faster prototyping and reduced setup time.
API Improvements and Fixes
Automatic Sequence Length Routing
Release Date: October 10, 2024
Automatic routing based on sequence length, with no need to change model names to specify different sequence lengths.
Context Length Increases
- Llama 3.1 8B (October 10, 2024): Increased from 8k to 16k.
- Llama 3.1 70B (October 10, 2024): Increased from 8k to 64k.
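The function calling API announced earlier in this release accepts tool definitions alongside the chat messages. A sketch of such a request body in the OpenAI-compatible format the Inference API follows; the tool schema and model ID are illustrative, not taken from the official docs:

```python
import json

# Hypothetical tool definition in the OpenAI-compatible "tools" format.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical function name
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

payload = {
    "model": "Meta-Llama-3.3-70B-Instruct",  # illustrative; use a tool-capable model
    "messages": [{"role": "user", "content": "What's the weather in Tokyo?"}],
    "tools": tools,
    "tool_choice": "auto",  # let the model decide whether to call the tool
}

# If the model elects to call a tool, the response message carries tool_calls;
# your code executes the named function and returns the result in a "tool"
# role message on the next turn.
body = json.dumps(payload)
```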
Performance Improvements
Improved performance for Llama 3.2 1B and 3B models.
User Experience Improvements
- How to Use API guide with example curl code for text and image inputs.
- Streamlined access to updated code snippets.
- New Clear Chat option in Playground.
- New UI components with tool tips.
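The How to Use API guide above shows curl examples for text and image inputs. The equivalent request body can be sketched in Python as follows, with an illustrative model ID and a stand-in for real image bytes; images travel as data URLs (or public URLs) inside the message content:

```python
import base64
import json

# Stand-in for real image bytes; in practice, read a PNG or JPEG from disk.
fake_png = base64.b64encode(b"\x89PNG...").decode("ascii")

# Text-plus-image chat request in the OpenAI-compatible multimodal format.
payload = {
    "model": "Llama-3.2-11B-Vision-Instruct",  # illustrative multimodal model ID
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this chart."},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{fake_png}"},
                },
            ],
        }
    ],
}

# POST this body to the chat completions endpoint exactly as in the text-only case.
body = json.dumps(payload)
```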
Updated AI Starter Kits
- Multimodal Retriever: Chart, image, and figure understanding with advanced retrieval combining visual and textual data.
- Llama-3.1-Instruct-o1: Enhanced reasoning with Llama-3.1-405B, hosted on Hugging Face Spaces.
September 2024
Release Date: September 10, 2024
New Features and Enhancements
SambaCloud Public Launch
- Public launch of the SambaCloud portal, API, and community.
- Access to Llama 3.1 8B, 70B, and 405B at full precision, with 10x faster inference compared to GPUs.
- Launched with two tiers: free and enterprise (paid).
