Overview
February 13, 2025
We’re thrilled to announce DeepSeek-R1, a cutting-edge open-source model that rivals OpenAI’s o1 and has taken the world by storm, now available on SambaNova Cloud! Due to high demand, access will be limited during the initial preview phase, but you can experience DeepSeek-R1 at blazing-fast inference speeds on SambaNova Cloud today. This is just the beginning. Please stay tuned for even more exciting improvements coming soon!
DeepSeek-R1
- DeepSeek-R1, a 671B-parameter MoE model, represents a significant advancement in AI. This open-source reasoning model demonstrates performance comparable to OpenAI’s o1 across tasks such as mathematics, coding, and reasoning. While DeepSeek-R1 was developed at a fraction of the cost typically associated with such frontier models, its inference remains costly, making broad capacity and availability a challenge. This is why access to DeepSeek-R1 has been limited—until now. SambaNova Cloud changes the game, delivering the fastest DeepSeek-R1 deployment in the world, making its powerful capabilities more accessible than ever.
For API access and higher rate limits for DeepSeek-R1, please complete this form to join the waitlist.
Please refer to the Supported models page for more details on the supported configurations and their model cards.
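Once you have an API key, a single chat completion request is enough to try the model. The sketch below uses only the Python standard library; the base URL and model identifier are assumptions, so substitute the values shown in your SambaNova Cloud account and the Supported models page.

```python
# Minimal sketch of a chat completion call against an OpenAI-compatible
# endpoint. Base URL and model name are assumptions, not confirmed values.
import json
import os
import urllib.request

def build_request(prompt: str) -> dict:
    """Assemble the JSON payload for one chat completion."""
    return {
        "model": "DeepSeek-R1",  # assumed model identifier
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.6,
        "stream": False,
    }

def chat(prompt: str) -> str:
    """Send the request and return the assistant's reply text."""
    req = urllib.request.Request(
        "https://api.sambanova.ai/v1/chat/completions",  # assumed base URL
        data=json.dumps(build_request(prompt)).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['SAMBANOVA_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Because the endpoint is OpenAI-compatible, the official `openai` Python client also works by pointing its `base_url` at the same address.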
February 4, 2025
We are excited to announce the addition of Tülu 3 405B, an open-source model that outperforms even DeepSeek-V3, to SambaNova Cloud.
Llama-3.1-Tulu-3-405B
- Tülu 3 405B, developed by the Allen Institute for AI (AI2), is the first open-source alternative to DeepSeek-V3. Trained using Reinforcement Learning with Verifiable Rewards (RLVR), it demonstrates performance that is competitive with or superior to leading models like GPT-4o and DeepSeek-V3, with a notable advantage in safety benchmarks.
January 30, 2025
DeepSeek-R1-Distill-Llama-70B is now live on SambaNova Cloud. Experience cutting-edge AI that outshines top closed-source models in math, coding, and beyond—power up your workloads with unmatched performance today.
DeepSeek-R1-Distill-Llama-70B
- DeepSeek-R1-Distill models are fine-tuned from open-source base models using samples generated by DeepSeek-R1. DeepSeek-R1-Distill-Llama-70B, built on Llama 3.3 70B, stands out for its exceptional performance, outperforming leading closed-source models—including GPT-4o, o1-mini, and Claude-3.5-Sonnet—across benchmarks such as AIME, MATH-500, GPQA, and LiveCodeBench, demonstrating its strength in both mathematical reasoning and coding tasks.
December 11, 2024
The latest Llama 3.3 70B model from Meta and the new leading open-source reasoning model QwQ from Alibaba’s Qwen team are now available on the SambaNova Cloud.
Llama 3.3 70B
- The latest Llama 3.3 70B model release from Meta showcases impressive capabilities across multiple domains, including reasoning, mathematical problem-solving, and general knowledge assessment. It delivers comparable performance to the Llama 3.1 405B. Benchmark comparisons suggest that it competes closely with leading proprietary models like OpenAI’s GPT-4o and Google’s Gemini Pro 1.5. This makes it yet another leading example of how open-source models are rapidly catching up to, and even surpassing, proprietary models.
QwQ 32B Preview
- The QwQ-32B-Preview model is an experimental model developed by Alibaba’s Qwen team to enhance reasoning capabilities. With 32.5 billion parameters, it excels in complex tasks such as mathematics and programming. Notably, it achieves scores of 65.2% on Graduate-Level Google-Proof Q&A (GPQA), 50.0% on the American Invitational Mathematics Examination (AIME), 90.6% on MATH-500, and 50.0% on LiveCodeBench, indicating strong analytical proficiency. As a preview release, it has limitations, including potential language mixing, recursive reasoning loops, and areas needing improvement such as common-sense reasoning and nuanced language understanding.
December 5, 2024
Qwen2.5 72B
- The Qwen2.5-72B model is a 72B-parameter language model that excels in coding, mathematics, and multilingual understanding. Trained on an extensive dataset of 18 trillion tokens, it supports context lengths up to 128,000 tokens and can generate outputs exceeding 8,000 tokens. The model offers robust instruction-following capabilities and supports over 29 languages, including English, Chinese, French, Spanish, and more.
Qwen2.5 Coder 32B
- The Qwen2.5-Coder-32B model is a 32B-parameter language model tailored for code-related tasks. It was trained on 5.5 trillion tokens, including source code and synthetic data. The model excels in code generation, reasoning, and debugging across 92 programming languages. Notably, it achieves a HumanEval score of 92.7%, matching the coding capability of GPT-4o and making it one of the best open-source models for coding-assistant applications.
Llama Guard 3 8B
- Llama Guard 3-8B is a fine-tuned version of Meta’s Llama 3.1 model, specifically designed for content safety classification. It can evaluate both inputs (prompts) and outputs (responses) of LLMs for content safety moderation. It functions as an LLM that generates output indicating whether a given prompt or response is safe or unsafe. If unsafe, it also identifies the specific content categories violated, aligning with the 14 hazard categories of the MLCommons standardized hazards taxonomy.
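In practice, moderation is just another chat completion: you send the content to Llama Guard and parse its short verdict. The sketch below assumes an OpenAI-compatible endpoint and an illustrative model identifier; check the Supported models page for the exact name.

```python
# Sketch of prompt moderation with Llama Guard via an OpenAI-compatible chat
# endpoint. Base URL and model name are assumptions.
import json
import os
import urllib.request

API_URL = "https://api.sambanova.ai/v1/chat/completions"  # assumed endpoint
MODEL = "Meta-Llama-Guard-3-8B"  # assumed model identifier

def parse_guard_verdict(text: str):
    """Llama Guard replies 'safe', or 'unsafe' followed by the violated
    category codes (e.g. 'S1') on the next line."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    verdict = lines[0].lower()
    categories = lines[1].split(",") if verdict == "unsafe" and len(lines) > 1 else []
    return verdict, categories

def moderate(prompt: str):
    """Send a prompt to Llama Guard and return (verdict, categories)."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['SAMBANOVA_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return parse_guard_verdict(body["choices"][0]["message"]["content"])
```

A gate in an application would then allow the request through only when the verdict is `"safe"`.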
This release also includes upgrades to the max sequence length for the following models:
- Llama 3.2 1B model: max sequence length increased from 4k to 16k.
- Llama 3.1 70B model: max sequence length increased from 64k to 128k.
- Llama 3.1 405B model: max sequence length increased from 8k to 16k.
October 29, 2024
Llama 3.2 11B and 90B models
- Expanded Llama 3.2 models now include 11B and 90B versions, with multimodality support for text and image inputs, enabling more versatile AI applications and use cases.
Function calling
- The function calling API enables dynamic, agentic workflows by allowing the model to suggest and select function calls based on user input, adapting to varied needs.
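The flow has two halves: the request advertises a tool schema, and any `tool_calls` the model returns are dispatched to local functions. A minimal sketch, assuming the OpenAI-compatible tools format; the tool itself is illustrative:

```python
# Sketch of the function calling round trip: advertise a tool schema, then
# dispatch a returned tool_call to a local Python function.
import json

TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def get_weather(city: str) -> str:
    return f"Sunny in {city}"  # stand-in for a real lookup

LOCAL_FUNCTIONS = {"get_weather": get_weather}

def dispatch(tool_call: dict) -> str:
    """Run the local function a model tool_call refers to."""
    fn = LOCAL_FUNCTIONS[tool_call["function"]["name"]]
    args = json.loads(tool_call["function"]["arguments"])
    return fn(**args)

# The request payload carries the schema alongside the messages, e.g.:
# {"model": ..., "messages": [...], "tools": TOOLS, "tool_choice": "auto"}
```

The dispatched result is then appended to the conversation as a `tool` role message so the model can compose its final answer.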
Multimodality in API and Playground
- Interact with multimodal models directly through the Inference API (OpenAI compatible) and Playground for seamless text and image processing.
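In the OpenAI-compatible format, an image is sent as a base64 data URL inside the message content alongside the text. A sketch of building such a message, assuming the standard vision message shape:

```python
# Sketch of a multimodal user message: text plus an image encoded as a
# base64 data URL, in the OpenAI-compatible content-parts format.
import base64

def image_message(prompt: str, image_bytes: bytes, mime: str = "image/png") -> dict:
    """Build one user message carrying both a text part and an image part."""
    data_url = "data:%s;base64,%s" % (mime, base64.b64encode(image_bytes).decode())
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": data_url}},
        ],
    }
```

The resulting dict slots directly into the `messages` list of a chat completion request to a multimodal model.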
Python and Gradio code samples for faster development
- New Python and Gradio code samples make it easier to build and deploy applications on SambaNova Cloud. These examples simplify integrating AI models, enabling faster prototyping and reducing setup time.
User experience improvements
- A How to Use API guide provides a quick start with example curl code for both text and image inputs.
- Streamlined access to updated code snippets for easier discoverability.
- A new Clear Chat option makes experimentation in the Playground even smoother.
- New UI components with added tooltips for a smoother user experience.
Updated AI Starter Kits
- Multimodal retriever: Chart, Image, and Figure Understanding. Unlock insights from complex PDFs and images with advanced retrieval and answer generation that combines both visual and textual data.
- Llama 3.1 Instruct-o1: Enhanced Reasoning with Llama 3.1 405B. Experience advanced thinking capabilities with Llama 3.1 Instruct-o1, hosted on Hugging Face Spaces.
October 10, 2024
- Llama 3.1 8B model: max sequence length increased from 8k to 16k.
- Llama 3.1 70B model: max sequence length increased from 8k to 64k.
Automatic Routing Based on Sequence Length
- You no longer need to change the model name to specify different sequence lengths. The system will automatically route requests based on sequence length. For example, there is no need to use Meta-Llama-3.1-70B-Instruct-8k for the 8k sequence length anymore. While we still support the existing method for backward compatibility, we recommend switching to the new method for the best experience.
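For code migrating from the suffixed names, a small helper can normalize legacy identifiers to the single routed name. A sketch, with illustrative model names:

```python
# With automatic routing, one model name serves every supported context
# length; this helper strips legacy "-8k"/"-64k" style suffixes from old code.
import re

def canonical_model_name(name: str) -> str:
    """Drop a trailing sequence-length suffix like '-8k' or '-64k'."""
    return re.sub(r"-\d+k$", "", name)
```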
- Improved performance for Llama 3.2 1B and 3B models.
October 1, 2024
- Released Llama 3.2 1B and 3B models.
- Available to all tiers at the fastest inference speed.
September 10, 2024
- Public launch of the SambaNova Cloud portal, API, and the community.
- Access to Llama 3.1 8B, 70B, and 405B at full precision and 10x faster inference compared to GPUs.
- Launched with two tiers: free and enterprise (paid).