Samba-1 Turbo 25.3.1-MP2

Release version: 25.3.1-MP2 | Release date: 04/21/2025


The Samba-1 Turbo 25.3.1-MP2 (Model Pack 2) release adds new models, provides updated versions of existing models, and delivers quality improvements in performance and inference efficiency.

New and updated model versions

New models

This release adds the following new models:

  • Meta-Llama-3-70B

  • Meta-Llama-3-8B

  • Meta-Llama-3.2-1B-HF

  • Meta-Llama-3.2-1B-Instruct-HF

  • Meta-Llama-3.2-3B-HF

  • Meta-Llama-3.2-3B-Instruct-HF

  • Mistral-7B-v0.3

  • Mistral-7B-Instruct-v0.3

Updated models

This release includes updated versions of the following models:

  • DeepSeek-V3

  • DeepSeek-R1

  • DeepSeek-R1-Distill-Llama-70B

  • Meta-Llama-3.1-70B-Instruct

  • Meta-Llama-3.2-1B-Instruct

Quality improvements

The following updates improve model performance, inference efficiency, and overall system throughput across multiple models and components. A brief client-side usage sketch follows the list.

  • DeepSeek-V3

    • Extended context length support to 8K and 16K tokens

    • Added support for multi-token prediction in IE for improved performance

    • Introduced faster PEF for enhanced inference speed

  • DeepSeek-R1

    • Extended context length support to 8K and 16K tokens

    • Added support for multi-token prediction in IE

    • Introduced faster PEF

  • DeepSeek-R1-Distill-Llama-70B: Increased batch size to 32 for improved throughput

  • Meta-Llama-3.1-70B-Instruct: Increased batch size to 32 for enhanced decoding performance (Spec Decoding PEF)

  • Meta-Llama-3.2-1B-Instruct: Now supports both TP8 and TP16 configurations (replaces deprecated TP16 variant)

  • Tokenizer performance in IE: Improved performance during detokenization, enabling better end-to-end throughput
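As a concrete illustration of the extended context lengths, the sketch below sends a long prompt to DeepSeek-V3 through an OpenAI-compatible chat endpoint. This is a minimal sketch, not a documented workflow from this release: the base URL, API key environment variable, and client library choice are assumptions; substitute the values for your own deployment.

    # Minimal sketch: exercising the extended 16K context window on DeepSeek-V3.
    # Assumes an OpenAI-compatible endpoint; the URL and env var below are
    # hypothetical placeholders, not values confirmed by this release.
    import os

    from openai import OpenAI

    client = OpenAI(
        base_url="https://your-samba-deployment.example.com/v1",  # hypothetical endpoint
        api_key=os.environ["SAMBA_API_KEY"],                      # hypothetical env var
    )

    long_document = "..."  # e.g. a report that tokenizes to well over 8K tokens

    response = client.chat.completions.create(
        model="DeepSeek-V3",  # verify the exact model identifier in your deployment
        messages=[
            {"role": "system", "content": "Summarize the document in five bullet points."},
            {"role": "user", "content": long_document},
        ],
        max_tokens=512,
    )
    print(response.choices[0].message.content)

The same request shape applies to DeepSeek-R1; only the model identifier changes.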

Samba-1 Turbo 25.3.1-MP2 model information

The entries below describe the new and updated model options in this release. Each entry lists the model name, its release status, a description, its key attributes, and any usage notes.

DeepSeek-R1

Release status: Existing

DeepSeek-R1 is a high-performance Mixture-of-Experts (MoE) language model with 671B total parameters (37B active per token), trained on 14.8T tokens. It combines Multi-head Latent Attention (MLA) and DeepSeekMoE architectures for efficient, stable training and inference. DeepSeek-R1 leverages supervised fine-tuning and reinforcement learning and delivers performance comparable to top closed-source models across language understanding and generation tasks.

  • Mathematics

  • Coding

  • Reasoning

DeepSeek-R1-Distill-Llama-70B

Release status: Existing

DeepSeek-R1-Distill-Llama-70B is a compact, fine-tuned language model distilled from DeepSeek-R1, based on the Llama-70B architecture. It delivers strong performance on math, code, and reasoning tasks while improving readability and reducing repetition. As part of DeepSeek’s first-generation open-source models, it offers efficient, high-quality reasoning in a dense format.

  • Mathematics

  • Coding

  • Reasoning

DeepSeek-V3

Release status: Existing

DeepSeek-V3 is a large Mixture-of-Experts (MoE) language model with 671B total parameters and 37B active per token. It uses advanced techniques like Multi-head Latent Attention (MLA) and an improved MoE setup for faster, more efficient training and inference. Trained on 14.8 trillion tokens and fine-tuned with both supervised learning and reinforcement learning, DeepSeek-V3 shows strong performance across tasks—comparable to top closed-source models—while staying stable and efficient throughout training.

  • Mathematics

  • Coding

  • Reasoning

Meta-Llama-3-70B

Release status: New

Meta’s Llama 3 series includes powerful language models with 8B and 70B parameters. The 70B model is built for generating natural language and was trained on over 15 trillion tokens of public online data. It uses a transformer-based architecture and performs best in English.

  • Language understanding

  • Knowledge recall

  • Reasoning

Meta-Llama-3-8B

Release status: New

Meta-Llama-3-8B is a powerful base language model designed for general-purpose text generation in English. It was trained on over 15 trillion tokens from publicly available data sources and uses an optimized transformer architecture for efficient performance.

  • Language understanding

  • Knowledge recall

  • Coding

Meta-Llama-3.1-70B-Instruct

Release status: Existing

Meta Llama 3.1-70B Instruct is a large, multilingual AI model designed for high-quality chat and conversation in many languages. It performs better than many other open or closed models on standard benchmarks. Trained on publicly available internet data and built with an optimized transformer architecture, it’s well-suited for chat assistants and other natural language generation tasks.

  • Multilingual understanding and generation

  • Chat applications

  • Natural language generation

Meta-Llama-3.2-1B-HF

Release status: New

Meta Llama 3.2 is a collection of multilingual language models available in two sizes: 1B and 3B parameters. These models are designed to take in text and generate text in return. They’re built to handle tasks like chatting in multiple languages, summarizing content, and retrieving useful information—making them great for assistant-like applications. The Llama 3.2 models have been trained on large amounts of public online data using a powerful transformer architecture.

  • Multilingual understanding and generation

  • Chat applications

Meta-Llama-3.2-1B-Instruct

Release status: Existing

The Llama 3.2 series offers a selection of multilingual large language models (LLMs), available in 1B and 3B parameter sizes, designed for text-based input and output. The Instruct variants are instruction-tuned models that are specifically optimized for multilingual conversational applications, excelling in tasks like agent-driven retrieval and summarization. They surpass many other open-source and proprietary chat models on widely recognized industry benchmarks. A minimal chat sketch follows the usage notes below.

  • Multilingual

  • General purpose

  • Document analysis

Usage notes:

  • Supported on SN40L-8 and SN40L-16 hardware generations, and capable of running online inference jobs.

  • Does not support function calling.
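As an illustration of the conversational use described above, the sketch below sends a short chat request to this model through an OpenAI-compatible endpoint. As with the earlier sketch, the base URL, API key variable, and exact model identifier are illustrative assumptions rather than values confirmed by this release.

    # Minimal chat sketch for Meta-Llama-3.2-1B-Instruct.
    # Assumes an OpenAI-compatible endpoint; URL and env var are hypothetical.
    import os

    from openai import OpenAI

    client = OpenAI(
        base_url="https://your-samba-deployment.example.com/v1",  # hypothetical
        api_key=os.environ["SAMBA_API_KEY"],                      # hypothetical
    )

    response = client.chat.completions.create(
        model="Meta-Llama-3.2-1B-Instruct",  # verify the name in your deployment
        messages=[
            {"role": "system", "content": "You are a concise multilingual assistant."},
            {"role": "user", "content": "In one sentence, answer in Spanish: what is a language model?"},
        ],
        max_tokens=128,
    )
    print(response.choices[0].message.content)

Note that this model does not support function calling, so requests should omit tool definitions.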

Meta-Llama-3.2-1B-Instruct-HF

Release status: New

Meta-Llama-3.2-1B-Instruct-HF is a small but capable language model designed to handle tasks like chatting, summarizing information, agentic retrieval, and working with multiple languages. It’s been fine-tuned to follow instructions better, making it more useful for practical applications like building smart assistants or language tools. Even though it’s just 1 billion parameters, it performs impressively well compared to many bigger models out there. It’s lightweight, multilingual, and great for both research and commercial use.

  • Multilingual understanding

  • Instruction following

  • Text generation and summarization

Meta-Llama-3.2-3B-HF

Release status: New

Meta-Llama-3.2-3B-HF is a language model designed to understand and generate text in multiple languages. It’s part of the Llama 3.2 family and is great for tasks like chatting, summarizing, and helping retrieve information.

  • Text generation and summarization

  • Multilingual support

Meta-Llama-3.2-3B-Instruct-HF

Release status: New

Meta-Llama-3.2-3B-Instruct-HF is a fine-tuned language model designed for tasks like multilingual conversation, summarization, and retrieving information. It is built to perform well across many open and closed-source benchmarks. The model is auto-regressive, uses an optimized transformer architecture, and has been trained on publicly available online data. It’s great for building chat assistants and other natural language tools that work across multiple languages.

  • Multilingual conversation

  • Text generation and summarization

Mistral-7B-Instruct-v0.2

Release status: Existing

Mistral-7B-Instruct-v0.2 is an instruction fine-tuned version of the Mistral-7B-v0.2 language model, tailored for tasks requiring precise instruction-following capabilities. This model is particularly well-suited for a variety of applications, including content generation, text analysis, and problem-solving. It excels in creating coherent and contextually relevant text, making it ideal for tasks like report writing, code generation, and answering questions. The enhancements in this version enable it to handle more sophisticated tasks with higher accuracy and efficiency.

  • Instruction following

  • Text generation and summarization

  • General purpose tasks

Mistral-7B-Instruct-v0.3

Release status: New

Mistral-7B-Instruct-v0.3 is a fine-tuned version of the Mistral-7B-v0.3 model, built to better follow instructions. Compared to the earlier v0.2 version, it brings a few important upgrades: an expanded vocabulary of 32,768 tokens, a new v3 tokenizer, and support for function calling. This makes it even better for tasks like answering questions, generating text, and handling more structured outputs. A brief function-calling sketch follows the attribute list below.

  • Instruction following

  • Text generation and summarization

  • General purpose tasks
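Since function calling is new in this version, the sketch below shows how a tool definition might be passed through an OpenAI-compatible chat endpoint. The endpoint URL, API key variable, and the get_weather tool are hypothetical illustrations; whether tool definitions are forwarded this way depends on your deployment.

    # Minimal function-calling sketch for Mistral-7B-Instruct-v0.3.
    # The endpoint, env var, and tool below are hypothetical placeholders.
    import os

    from openai import OpenAI

    client = OpenAI(
        base_url="https://your-samba-deployment.example.com/v1",  # hypothetical
        api_key=os.environ["SAMBA_API_KEY"],                      # hypothetical
    )

    tools = [
        {
            "type": "function",
            "function": {
                "name": "get_weather",  # example tool, not part of this release
                "description": "Look up the current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ]

    response = client.chat.completions.create(
        model="Mistral-7B-Instruct-v0.3",
        messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
        tools=tools,
    )

    # If the model elects to call the tool, the structured call appears here.
    print(response.choices[0].message.tool_calls)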

Mistral-7B-v0.1

Release status: Existing

Mistral-7B-v0.1 is a powerful text generation model with 7 billion parameters. It’s a base model, meaning it’s pre-trained but not fine-tuned for specific tasks. It supports long input lengths, up to 32,000 tokens. In SambaStudio, it can run with different setups depending on input length: 8K tokens with one RDU or 32K tokens with up to eight RDUs.

  • Text generation

  • Long context handling

Mistral-7B-v0.3

Release status: New

Mistral-7B-v0.3 is a pre-trained large language model, built for text generation but not fine-tuned for following instructions. Compared to the earlier v0.2 version, it has a bigger vocabulary (32,768 tokens), supports the new v3 tokenizer, and has function calling capabilities. Since it’s a base model, it doesn’t respond like a chatbot or follow instructions directly. It also doesn’t have built-in safety features like moderation. A short tokenizer check follows the attribute list below.

  • Text generation

  • Long context handling
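To see the expanded v3 vocabulary in practice, the short check below loads the tokenizer from the public Hugging Face checkpoint. It assumes the transformers package is installed and the mistralai/Mistral-7B-v0.3 repository is accessible; it is an illustration, not a step required by this release.

    # Sketch: inspecting the v3 tokenizer that ships with Mistral-7B-v0.3.
    # Assumes the Hugging Face `transformers` package is installed and the
    # public checkpoint is accessible (a gated repo may require a token).
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.3")

    print(tokenizer.vocab_size)                 # expected: 32768 with the v3 tokenizer
    print(tokenizer.tokenize("Hello, world!"))  # see how the new vocabulary splits text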

Zephyr-7B-Beta

Release status: Existing

Zephyr-7B-Beta is a 7 billion parameter language model, fine-tuned using Direct Preference Optimization (DPO) on a mix of public and synthetic datasets. Built on top of the Mistral-7B-v0.1 model, Zephyr-7B-Beta is designed to be more helpful and user-friendly. This model can handle long conversations with up to a 32K token context and is flexible to deploy, from an 8K sequence length on a single RDU to 32K using up to eight RDUs.

  • Text generation and summarization

  • Instruction following

  • Long context handling

  • Chat