Samba-1 Turbo 25.3.1-MP2

Release version: 25.3.1-MP2 | Release date: 04/21/2025


The Samba-1 Turbo 25.3.1-MP2 (Model Pack 2) release adds new models, provides updated versions of existing models, and delivers quality improvements in performance and inference efficiency.

New and updated model versions

New models

This release adds the following new models:

  • Meta-Llama-3-70B

  • Meta-Llama-3-8B

  • Meta-Llama-3.2-1B-HF

  • Meta-Llama-3.2-1B-Instruct-HF

  • Meta-Llama-3.2-3B-HF

  • Meta-Llama-3.2-3B-Instruct-HF

  • Mistral-7B-v0.3

  • Mistral-7B-Instruct-v0.3

Updated models

This release includes updated versions of the following models:

  • DeepSeek-V3

  • DeepSeek-R1

  • DeepSeek-R1-Distill-Llama-70B

  • Meta-Llama-3.1-70B-Instruct

  • Meta-Llama-3.2-1B-Instruct

Quality improvements

The following updates improve model performance, inference efficiency, and overall system throughput across multiple models and components. A brief client-side usage sketch follows the list.

  • DeepSeek-V3

    • Extended context length support to 8K and 16K tokens

    • Added support for multi-token prediction in IE for improved performance

    • Introduced faster PEF for enhanced inference speed

  • DeepSeek-R1

    • Extended context length support to 8K and 16K tokens

    • Added support for multi-token prediction in IE

    • Introduced faster PEF

  • DeepSeek-R1-Distill-Llama-70B: Increased batch size to 32 for improved throughput

  • Meta-Llama-3.1-70B-Instruct: Increased batch size to 32 for enhanced decoding performance (Spec Decoding PEF)

  • Meta-Llama-3.2-1B-Instruct: Now supports both TP8 and TP16 configurations (replaces deprecated TP16 variant)

  • Tokenizer performance in IE: Improved performance during detokenization, enabling better end-to-end throughput
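As a concrete illustration of the extended context lengths, the sketch below sends a long prompt to DeepSeek-V3 through an OpenAI-compatible chat endpoint. This is a minimal sketch, not a documented workflow from this release: the base URL, API key environment variable, and client library choice are assumptions; substitute the values for your own deployment.

    # Minimal sketch: exercising the extended 16K context window on DeepSeek-V3.
    # Assumes an OpenAI-compatible endpoint; the URL and env var below are
    # hypothetical placeholders, not values confirmed by this release.
    import os

    from openai import OpenAI

    client = OpenAI(
        base_url="https://your-samba-deployment.example.com/v1",  # hypothetical endpoint
        api_key=os.environ["SAMBA_API_KEY"],                      # hypothetical env var
    )

    long_document = "..."  # e.g. a report that tokenizes to well over 8K tokens

    response = client.chat.completions.create(
        model="DeepSeek-V3",  # verify the exact model identifier in your deployment
        messages=[
            {"role": "system", "content": "Summarize the document in five bullet points."},
            {"role": "user", "content": long_document},
        ],
        max_tokens=512,
    )
    print(response.choices[0].message.content)

The same request shape applies to DeepSeek-R1; only the model identifier changes.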

Samba-1 Turbo 25.3.1-MP2 model information

The entries below describe the new and updated model options in this release. Each entry lists the model name, its release status, a description, its key attributes, and any usage notes.

DeepSeek-R1

Release status: Existing

DeepSeek-R1 is a high-performance Mixture-of-Experts (MoE) language model with 671B total parameters (37B active per token), trained on 14.8T tokens. It combines Multi-head Latent Attention (MLA) and DeepSeekMoE architectures for efficient, stable training and inference. DeepSeek-R1 leverages supervised fine-tuning and reinforcement learning and delivers performance comparable to top closed-source models across language understanding and generation tasks.

  • Mathematics

  • Coding

  • Reasoning

DeepSeek-R1-Distill-Llama-70B

Release status: Existing

DeepSeek-R1-Distill-Llama-70B is a compact, fine-tuned language model distilled from DeepSeek-R1, based on the Llama-70B architecture. It delivers strong performance on math, code, and reasoning tasks while improving readability and reducing repetition. As part of DeepSeek’s first-generation open-source models, it offers efficient, high-quality reasoning in a dense format.

  • Mathematics

  • Coding

  • Reasoning

DeepSeek-V3

Release status: Existing

DeepSeek-V3 is a large Mixture-of-Experts (MoE) language model with 671B total parameters and 37B active per token. It uses advanced techniques like Multi-head Latent Attention (MLA) and an improved MoE setup for faster, more efficient training and inference. Trained on 14.8 trillion tokens and fine-tuned with both supervised learning and reinforcement learning, DeepSeek-V3 shows strong performance across tasks—comparable to top closed-source models—while staying stable and efficient throughout training.

  • Mathematics

  • Coding

  • Reasoning

Meta-Llama-3-70B

Release status: New

Meta’s Llama 3 series includes powerful language models with 8B and 70B parameters. The 70B model is built for generating natural language and was trained on over 15 trillion tokens of public online data. It uses a transformer-based architecture and performs best in English.

  • Language understanding

  • Knowledge recall

  • Reasoning

Meta-Llama-3-8B

Release status: New

Meta-Llama-3-8B is a powerful base language model designed for general-purpose text generation in English. It was trained on over 15 trillion tokens from publicly available data sources and uses an optimized transformer architecture for efficient performance.

  • Language understanding

  • Knowledge recall

  • Coding

Meta-Llama-3.1-70B-Instruct

Release status: Existing

Meta Llama 3.1-70B Instruct is a large, multilingual AI model designed for high-quality chat and conversation in many languages. It performs better than many other open or closed models on standard benchmarks. Trained on publicly available internet data and built with an optimized transformer architecture, it’s well-suited for chat assistants and other natural language generation tasks.

  • Multilingual understanding and generation

  • Chat applications

  • Natural language generation

Meta-Llama-3.2-1B-HF

Release status: New

Meta Llama 3.2 is a collection of multilingual language models available in two sizes: 1B and 3B parameters. These models are designed to take in text and generate text in return. They’re built to handle tasks like chatting in multiple languages, summarizing content, and retrieving useful information—making them great for assistant-like applications. The Llama 3.2 models have been trained on large amounts of public online data using a powerful transformer architecture.

  • Multilingual understanding and generation

  • Chat applications

Meta-Llama-3.2-1B-Instruct

Release status: Existing

The Llama 3.2 series offers a selection of multilingual large language models (LLMs), available in 1B and 3B parameter sizes, designed for text-based input and output. The Instruct variants are instruction-tuned models that are specifically optimized for multilingual conversational applications, excelling in tasks like agent-driven retrieval and summarization. They surpass many other open-source and proprietary chat models on widely recognized industry benchmarks. A minimal chat sketch follows the usage notes below.

  • Multilingual

  • General purpose

  • Document analysis

Usage notes:

  • Supported on SN40L-8 and SN40L-16 hardware generations, and capable of running online inference jobs.

  • Does not support function calling.
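As an illustration of the conversational use described above, the sketch below sends a short chat request to this model through an OpenAI-compatible endpoint. As with the earlier sketch, the base URL, API key variable, and exact model identifier are illustrative assumptions rather than values confirmed by this release.

    # Minimal chat sketch for Meta-Llama-3.2-1B-Instruct.
    # Assumes an OpenAI-compatible endpoint; URL and env var are hypothetical.
    import os

    from openai import OpenAI

    client = OpenAI(
        base_url="https://your-samba-deployment.example.com/v1",  # hypothetical
        api_key=os.environ["SAMBA_API_KEY"],                      # hypothetical
    )

    response = client.chat.completions.create(
        model="Meta-Llama-3.2-1B-Instruct",  # verify the name in your deployment
        messages=[
            {"role": "system", "content": "You are a concise multilingual assistant."},
            {"role": "user", "content": "In one sentence, answer in Spanish: what is a language model?"},
        ],
        max_tokens=128,
    )
    print(response.choices[0].message.content)

Note that this model does not support function calling, so requests should omit tool definitions.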

Meta-Llama-3.2-1B-Instruct-HF

Release status: New

Meta-Llama-3.2-1B-Instruct-HF is a small but capable language model designed to handle tasks like chatting, summarizing information, agentic retrieval, and working with multiple languages. It’s been fine-tuned to follow instructions better, making it more useful for practical applications like building smart assistants or language tools. Even though it’s just 1 billion parameters, it performs impressively well compared to many bigger models out there. It’s lightweight, multilingual, and great for both research and commercial use.

  • Multilingual understanding

  • Instruction following

  • Text generation and summarization

Meta-Llama-3.2-3B-HF

Release status: New

Meta-Llama-3.2-3B-HF is a language model designed to understand and generate text in multiple languages. It’s part of the Llama 3.2 family and is great for tasks like chatting, summarizing, and helping retrieve information.

  • Text generation and summarization

  • Multilingual support

Meta-Llama-3.2-3B-Instruct-HF

Release status: New

Meta-Llama-3.2-3B-Instruct-HF is a fine-tuned language model designed for tasks like multilingual conversation, summarization, and retrieving information. It is built to perform well across many open and closed-source benchmarks. The model is auto-regressive, uses an optimized transformer architecture, and has been trained on publicly available online data. It’s great for building chat assistants and other natural language tools that work across multiple languages.

  • Multilingual conversation

  • Text generation and summarization

Mistral-7B-Instruct-v0.2

Release status: Existing

Mistral-7B-Instruct-v0.2 is an instruction fine-tuned version of the Mistral-7B-v0.2 language model, tailored for tasks requiring precise instruction-following capabilities. This model is particularly well-suited for a variety of applications, including content generation, text analysis, and problem-solving. It excels in creating coherent and contextually relevant text, making it ideal for tasks like report writing, code generation, and answering questions. The enhancements in this version enable it to handle more sophisticated tasks with higher accuracy and efficiency.

  • Instruction following

  • Text generation and summarization

  • General purpose tasks

Mistral-7B-Instruct-v0.3

Release status: New

Mistral-7B-Instruct-v0.3 is a fine-tuned version of the Mistral-7B-v0.3 model, built to better follow instructions. Compared to the earlier v0.2 version, it brings a few important upgrades: an expanded vocabulary of 32,768 tokens, a new v3 tokenizer, and support for function calling. This makes it even better for tasks like answering questions, generating text, and handling more structured outputs. A brief function-calling sketch follows the attribute list below.

  • Instruction following

  • Text generation and summarization

  • General purpose tasks
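Since function calling is new in this version, the sketch below shows how a tool definition might be passed through an OpenAI-compatible chat endpoint. The endpoint URL, API key variable, and the get_weather tool are hypothetical illustrations; whether tool definitions are forwarded this way depends on your deployment.

    # Minimal function-calling sketch for Mistral-7B-Instruct-v0.3.
    # The endpoint, env var, and tool below are hypothetical placeholders.
    import os

    from openai import OpenAI

    client = OpenAI(
        base_url="https://your-samba-deployment.example.com/v1",  # hypothetical
        api_key=os.environ["SAMBA_API_KEY"],                      # hypothetical
    )

    tools = [
        {
            "type": "function",
            "function": {
                "name": "get_weather",  # example tool, not part of this release
                "description": "Look up the current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ]

    response = client.chat.completions.create(
        model="Mistral-7B-Instruct-v0.3",
        messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
        tools=tools,
    )

    # If the model elects to call the tool, the structured call appears here.
    print(response.choices[0].message.tool_calls)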

Mistral-7B-v0.1

Release status: Existing

Mistral-7B-v0.1 is a powerful text generation model with 7 billion parameters. It’s a base model, meaning it’s pre-trained but not fine-tuned for specific tasks. It supports long input lengths, up to 32,000 tokens. In SambaStudio, it can run with different setups depending on input length: 8K tokens with one RDU or 32K tokens with up to eight RDUs.

  • Text generation

  • Long context handling

Mistral-7B-v0.3

Release status: New

Mistral-7B-v0.3 is a pre-trained large language model, built for text generation but not fine-tuned for following instructions. Compared to the earlier v0.2 version, it has a bigger vocabulary (32,768 tokens), supports the new v3 tokenizer, and has function calling capabilities. Since it’s a base model, it doesn’t respond like a chatbot or follow instructions directly. It also doesn’t have built-in safety features like moderation. A short tokenizer check follows the attribute list below.

  • Text generation

  • Long context handling
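To see the expanded v3 vocabulary in practice, the short check below loads the tokenizer from the public Hugging Face checkpoint. It assumes the transformers package is installed and the mistralai/Mistral-7B-v0.3 repository is accessible; it is an illustration, not a step required by this release.

    # Sketch: inspecting the v3 tokenizer that ships with Mistral-7B-v0.3.
    # Assumes the Hugging Face `transformers` package is installed and the
    # public checkpoint is accessible (a gated repo may require a token).
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.3")

    print(tokenizer.vocab_size)                 # expected: 32768 with the v3 tokenizer
    print(tokenizer.tokenize("Hello, world!"))  # see how the new vocabulary splits text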

Zephyr-7B-Beta

Release status: Existing

Zephyr-7B-Beta is a 7 billion parameter language model, fine-tuned using Direct Preference Optimization (DPO) on a mix of public and synthetic datasets. Built on top of the Mistral-7B-v0.1 model, Zephyr-7B-Beta is designed to be more helpful and user-friendly. This model can handle long conversations with up to a 32K token context and is flexible to deploy, from an 8K sequence length on a single RDU to 32K using up to eight RDUs.

  • Text generation and summarization

  • Instruction following

  • Long context handling

  • Chat