Overview
February 13, 2025
We’re thrilled to announce DeepSeek-R1, a cutting-edge open-source model that rivals OpenAI’s o1 and has taken the world by storm, now available on SambaNova Cloud! Due to high demand, access will be limited during the initial preview phase, but you can experience DeepSeek-R1 at blazing-fast inference speeds on SambaNova Cloud today. This is just the beginning. Please stay tuned for even more exciting improvements coming soon!
DeepSeek-R1
- DeepSeek-R1, a 671B-parameter MoE model, represents a significant advancement in AI. This open-source reasoning model demonstrates performance comparable to OpenAI’s o1 across tasks such as mathematics, coding, and reasoning. While DeepSeek-R1 was developed at a fraction of the cost typically associated with such frontier models, its inference remains costly, making broad capacity and availability a challenge. This is why access to DeepSeek-R1 has been limited—until now. SambaNova Cloud changes the game, delivering the fastest DeepSeek-R1 deployment in the world, making its powerful capabilities more accessible than ever.
For API access and higher rate limits for DeepSeek-R1, please complete this form to join the waitlist.
Please refer to the Supported models page for more details on the supported configurations and their model cards.
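Once you have an API key, a single chat completion request is enough to try the model. The sketch below uses only the Python standard library; the base URL and model identifier are assumptions, so substitute the values shown in your SambaNova Cloud account and the Supported models page.

```python
# Minimal sketch of a chat completion call against an OpenAI-compatible
# endpoint. Base URL and model name are assumptions, not confirmed values.
import json
import os
import urllib.request

def build_request(prompt: str) -> dict:
    """Assemble the JSON payload for one chat completion."""
    return {
        "model": "DeepSeek-R1",  # assumed model identifier
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.6,
        "stream": False,
    }

def chat(prompt: str) -> str:
    """Send the request and return the assistant's reply text."""
    req = urllib.request.Request(
        "https://api.sambanova.ai/v1/chat/completions",  # assumed base URL
        data=json.dumps(build_request(prompt)).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['SAMBANOVA_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Because the endpoint is OpenAI-compatible, the official `openai` Python client also works by pointing its `base_url` at the same address.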
February 4, 2025
We are excited to announce the addition of Tülu 3 405B, an open-source model that outperforms even DeepSeek-V3, to SambaNova Cloud.
Llama-3.1-Tulu-3-405B
- Tülu 3 405B, developed by the Allen Institute for AI (AI2), is the first open-source alternative to DeepSeek-V3. Trained using Reinforcement Learning with Verifiable Rewards (RLVR), it demonstrates performance that is competitive with or superior to leading models like GPT-4o and DeepSeek-V3, with a notable advantage in safety benchmarks.
January 30, 2025
DeepSeek-R1-Distill-Llama-70B is now live on SambaNova Cloud. Experience cutting-edge AI that outshines top closed-source models in math, coding, and beyond—power up your workloads with unmatched performance today.
DeepSeek-R1-Distill-Llama-70B
- DeepSeek-R1-Distill models are fine-tuned from open-source base models using samples generated by DeepSeek-R1. DeepSeek-R1-Distill-Llama-70B, built on Llama 3.3 70B, stands out for its exceptional performance, outperforming leading closed-source models—including GPT-4o, o1-mini, and Claude-3.5-Sonnet—across benchmarks such as AIME, MATH-500, GPQA, and LiveCodeBench, demonstrating its strength in both mathematical reasoning and coding tasks.
December 11, 2024
The latest Llama 3.3 70B model from Meta and the new leading open-source reasoning model QwQ from Alibaba’s Qwen team are now available on the SambaNova Cloud.
Llama 3.3 70B
- The latest Llama 3.3 70B model release from Meta showcases impressive capabilities across multiple domains, including reasoning, mathematical problem-solving, and general knowledge assessment. It delivers comparable performance to the Llama 3.1 405B. Benchmark comparisons suggest that it competes closely with leading proprietary models like OpenAI’s GPT-4o and Google’s Gemini Pro 1.5. This makes it yet another leading example of how open-source models are rapidly catching up to, and even surpassing, proprietary models.
QwQ 32B Preview
- The QwQ-32B-Preview model is an experimental model developed by Alibaba’s Qwen team to enhance reasoning capabilities. With 32.5 billion parameters, it excels in complex tasks such as mathematics and programming. Notably, it achieves scores of 65.2% on Graduate-Level Google-Proof Q&A (GPQA), 50.0% on the American Invitational Mathematics Examination (AIME), 90.6% on MATH-500, and 50.0% on LiveCodeBench, indicating strong analytical proficiency. As a preview release, it has limitations, including potential language mixing, recursive reasoning loops, and areas needing improvement such as common-sense reasoning and nuanced language understanding.
December 5, 2024
Qwen2.5 72B
- The Qwen2.5-72B model is a 72B-parameter language model that excels in coding, mathematics, and multilingual understanding. Trained on an extensive dataset of 18 trillion tokens, it supports context lengths up to 128,000 tokens and can generate outputs exceeding 8,000 tokens. The model offers robust instruction-following capabilities and supports over 29 languages, including English, Chinese, French, Spanish, and more.
Qwen2.5 Coder 32B
- The Qwen2.5-Coder-32B model is a 32B-parameter language model tailored for code-related tasks. It was trained on 5.5 trillion tokens, including source code and synthetic data. The model excels in code generation, reasoning, and debugging across 92 programming languages. Notably, it achieves a HumanEval score of 92.7%, matching the coding capability of GPT-4o and making it one of the best open-source models for coding-assistant applications.
Llama Guard 3 8B
- Llama Guard 3-8B is a fine-tuned version of Meta’s Llama 3.1 model, specifically designed for content safety classification. It can evaluate both inputs (prompts) and outputs (responses) of LLMs for content safety moderation. It functions as an LLM that generates output indicating whether a given prompt or response is safe or unsafe. If unsafe, it also identifies the specific content categories violated, aligning with the 14 hazard categories of the MLCommons standardized hazards taxonomy.
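In practice, moderation is just another chat completion: you send the content to Llama Guard and parse its short verdict. The sketch below assumes an OpenAI-compatible endpoint and an illustrative model identifier; check the Supported models page for the exact name.

```python
# Sketch of prompt moderation with Llama Guard via an OpenAI-compatible chat
# endpoint. Base URL and model name are assumptions.
import json
import os
import urllib.request

API_URL = "https://api.sambanova.ai/v1/chat/completions"  # assumed endpoint
MODEL = "Meta-Llama-Guard-3-8B"  # assumed model identifier

def parse_guard_verdict(text: str):
    """Llama Guard replies 'safe', or 'unsafe' followed by the violated
    category codes (e.g. 'S1') on the next line."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    verdict = lines[0].lower()
    categories = lines[1].split(",") if verdict == "unsafe" and len(lines) > 1 else []
    return verdict, categories

def moderate(prompt: str):
    """Send a prompt to Llama Guard and return (verdict, categories)."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['SAMBANOVA_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return parse_guard_verdict(body["choices"][0]["message"]["content"])
```

A gate in an application would then allow the request through only when the verdict is `"safe"`.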
This release also includes upgrades to the max sequence length for the following models:
- Llama 3.2 1B model: max sequence length increased from 4k to 16k.
- Llama 3.1 70B model: max sequence length increased from 64k to 128k.
- Llama 3.1 405B model: max sequence length increased from 8k to 16k.
October 29, 2024
Llama 3.2 11B and 90B models
- Expanded Llama 3.2 models now include 11B and 90B versions, with multimodality support for text and image inputs, enabling more versatile AI applications and use cases.
Function calling
- The function calling API enables dynamic, agentic workflows by allowing the model to suggest and select function calls based on user input, adapting to varied needs.
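The flow has two halves: the request advertises a tool schema, and any `tool_calls` the model returns are dispatched to local functions. A minimal sketch, assuming the OpenAI-compatible tools format; the tool itself is illustrative:

```python
# Sketch of the function calling round trip: advertise a tool schema, then
# dispatch a returned tool_call to a local Python function.
import json

TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def get_weather(city: str) -> str:
    return f"Sunny in {city}"  # stand-in for a real lookup

LOCAL_FUNCTIONS = {"get_weather": get_weather}

def dispatch(tool_call: dict) -> str:
    """Run the local function a model tool_call refers to."""
    fn = LOCAL_FUNCTIONS[tool_call["function"]["name"]]
    args = json.loads(tool_call["function"]["arguments"])
    return fn(**args)

# The request payload carries the schema alongside the messages, e.g.:
# {"model": ..., "messages": [...], "tools": TOOLS, "tool_choice": "auto"}
```

The dispatched result is then appended to the conversation as a `tool` role message so the model can compose its final answer.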
Multimodality in API and Playground
- Interact with multimodal models directly through the Inference API (OpenAI compatible) and Playground for seamless text and image processing.
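In the OpenAI-compatible format, an image is sent as a base64 data URL inside the message content alongside the text. A sketch of building such a message, assuming the standard vision message shape:

```python
# Sketch of a multimodal user message: text plus an image encoded as a
# base64 data URL, in the OpenAI-compatible content-parts format.
import base64

def image_message(prompt: str, image_bytes: bytes, mime: str = "image/png") -> dict:
    """Build one user message carrying both a text part and an image part."""
    data_url = "data:%s;base64,%s" % (mime, base64.b64encode(image_bytes).decode())
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": data_url}},
        ],
    }
```

The resulting dict slots directly into the `messages` list of a chat completion request to a multimodal model.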
Python and Gradio code samples for faster development
- New Python and Gradio code samples make it easier to build and deploy applications on SambaNova Cloud. These examples simplify integrating AI models, enabling faster prototyping and reducing setup time.
User experience improvements
- A How to Use API guide provides a quick start with example curl code for both text and image inputs.
- Streamlined access to updated code snippets for easier discoverability.
- A new Clear Chat option makes experimentation in the Playground even smoother.
- New UI components with added tooltips for a smoother user experience.
Updated AI Starter Kits
- Multimodal retriever: Chart, Image, and Figure Understanding. Unlock insights from complex PDFs and images with advanced retrieval and answer generation that combines both visual and textual data.
- Llama 3.1 Instruct-o1: Enhanced Reasoning with Llama 3.1 405B. Experience advanced thinking capabilities with Llama 3.1 Instruct-o1, hosted on Hugging Face Spaces.
October 10, 2024
- Llama 3.1 8B model: max sequence length increased from 8k to 16k.
- Llama 3.1 70B model: max sequence length increased from 8k to 64k.
Automatic Routing Based on Sequence Length
- You no longer need to change the model name to specify different sequence lengths. The system will automatically route requests based on sequence length. For example, there is no need to use Meta-Llama-3.1-70B-Instruct-8k for the 8k sequence length anymore. While we still support the existing method for backward compatibility, we recommend switching to the new method for the best experience.
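For code migrating from the suffixed names, a small helper can normalize legacy identifiers to the single routed name. A sketch, with illustrative model names:

```python
# With automatic routing, one model name serves every supported context
# length; this helper strips legacy "-8k"/"-64k" style suffixes from old code.
import re

def canonical_model_name(name: str) -> str:
    """Drop a trailing sequence-length suffix like '-8k' or '-64k'."""
    return re.sub(r"-\d+k$", "", name)
```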
- Improved performance for Llama 3.2 1B and 3B models.
October 1, 2024
- Released Llama 3.2 1B and 3B models.
- Available to all tiers at the fastest inference speed.
September 10, 2024
- Public launch of the SambaNova Cloud portal, API, and the community.
- Access to Llama 3.1 8B, 70B, and 405B at full precision and 10x faster inference compared to GPUs.
- Launched with two tiers: free and enterprise (paid).