Samba-1 Turbo 24.9.1-MP1
Release version: 24.9.1-MP1 | Release date: 11/13/2024
The Samba-1 Turbo 24.9.1-MP1 (Model Pack 1) release delivers enhanced performance for existing models, expands deployment options, and introduces several new models to the platform.
In this release, we’ve updated our release naming convention to align with platform versions. Going forward, releases will be named according to the corresponding platform release, using MP (Model Pack) to represent model versions. This change replaces the previous version-based naming convention and provides clearer alignment with platform updates.
Release features
The Samba-1 Turbo 24.9.1-MP1 release features are described below.
Inference configuration improvements
- 8-socket option improvements:
  - Llama 3.1 70B sequence length is increased from 8k to 32k.
  - Llama 3.1 8B sequence length is increased from 8k to 32k.
  - 3x faster throughput for Llama 3.1 70B at sequence lengths up to 8k via the Llama 3.1 70B_1B with e5-mistral CoE (see the Summary of released Composition of Experts (CoE) section for more details).
- 16-socket option improvements:
  - 16-socket deployment configurations are now available for the following four models. All other models in this release support 8-socket deployment only; see the Summary of released Composition of Experts (CoE) section for more details.
    - Meta-Llama-3.1-405B-Instruct
    - Meta-Llama-3.1-70B-Instruct
    - Meta-Llama-3.1-8B-Instruct
    - e5-mistral-7B-instruct (text embedding model)
Model additions
- Meta-Llama-3.1-405B-Instruct
- Meta-Llama-3.2-1B-Instruct [Beta]
- Llama-Guard-3-8B
- Qwen2.5-7B-Instruct
Model deprecations
The following are deprecated in this release. See the Summary of deprecated Composition of Experts (CoE) section for more information.
- Samba-1 and corresponding models.
- Samba-1 Turbo v0.2 (a previous release) and corresponding models.
Samba-1 Turbo 24.9.1-MP1 model options
The table below describes the model options in the Samba-1 Turbo 24.9.1-MP1 release.
Model Name | Release Status | Description | Attributes | Usage Notes |
---|---|---|---|---|
Meta-Llama-3.1-405B-Instruct | New | The Meta Llama 3.1 collection of multilingual large language models (LLMs) comprises pretrained and instruction-tuned generative models in 8B, 70B, and 405B sizes (text in/text out). The Llama 3.1 instruction-tuned text-only models (8B, 70B, 405B) are optimized for multilingual dialogue use cases and outperform many of the available open source and closed chat models on common industry benchmarks. | | This release supports a sequence length of up to 8k for this model. This model requires the SN40L-16 hardware generation option for deployment, utilizing 16 RDUs. |
Meta-Llama-3.2-1B-Instruct [Beta] | New | The Llama 3.2 series offers a selection of multilingual large language models (LLMs), available in 1B and 3B parameter sizes, designed for text-based input and output. The Instruct variants are instruction-tuned models that are specifically optimized for multilingual conversational applications, excelling in tasks like agent-driven retrieval and summarization. They surpass many other open-source and proprietary chat models on widely recognized industry benchmarks. | | This release supports a sequence length of up to 8k for this model. This model is available as a beta release. Importing a checkpoint using the CLI is not supported for this model architecture in this version. |
Llama-Guard-3-8B | New | Llama Guard 3 is a Llama-3.1-8B model fine-tuned for content safety classification of both LLM prompts and responses. It labels content as safe or unsafe, noting violated categories if unsafe. Aligned with MLCommons' hazards taxonomy, it supports content moderation in eight languages and is optimized for safe search and code interpreter use. | | Refer to the Llama Guard model card from Meta for details on the prompt format and related usage instructions. |
Qwen2.5-7B-Instruct | New | Qwen2.5 offers a range of base and instruction-tuned models (0.5B to 72B parameters) with enhanced knowledge, coding, and math capabilities. It supports improved instruction-following, structured data handling, and multilingual use across 29+ languages. | | This release supports a sequence length of up to 8k for this model. |
Mistral-Nemo-Instruct-2407 | Existing | Mistral-Nemo-Instruct-2407 is a specialized model optimized for handling long-range dependencies and structured tasks. It uses a new Tekken tokenizer, based on Tiktoken, designed to compress languages and source code. It supports English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, and Hindi. It is fine-tuned to excel in dialogue systems, answering complex, multi-step queries, and generating content that requires logical flow. Designed with instruction-following capabilities, it is particularly effective where precision and detailed responses are essential, such as in technical writing, long-form content generation, and customer support applications. Its optimization allows for smoother performance in highly specialized use cases. | | Mistral-Nemo-Instruct-2407 supports sequence lengths up to 4k during inference in the first release of this model. Longer sequence length support will be added in future releases. |
Mistral-Large-2 | Existing | Mistral Large 2 is a multi-lingual language model that supports French, German, Spanish, Italian, Portuguese, Arabic, Hindi, Russian, Chinese, Japanese, and Korean. Because it was trained on a large proportion of code, it additionally supports 80+ coding languages, including Python, Java, C, C++, JavaScript, and Bash. The model has been optimized for reasoning and mathematics. Additionally, Mistral Large 2 improves upon earlier versions in handling multi-turn conversations while maintaining conciseness in responses, which suits enterprise applications needing succinct outputs. | | Due to licensing restrictions, SambaNova only provides the configuration setup for running this model, not the model weights. You must obtain the appropriate usage license from Mistral AI to access the weights. Once the license is secured, the model weights can be imported onto the platform using the Add a checkpoint feature in the Model Hub. Mistral-Large-2 supports sequence lengths up to 4k during inference in the first release of this model. Longer sequence length support will be added in future releases. |
gemma-2-9b-it | Existing | Gemma 2-9B-IT is a language model developed by Google. With 9 billion parameters, it focuses on handling a wide range of natural language processing tasks, including text generation, translation, and comprehension. The model was trained on web, code, and math-related datasets, primarily in English. | | Supports sequence lengths up to 4k during inference in the first release of this model. Longer sequence length support will be added in future releases. |
Qwen2-72B-Instruct | Existing | Qwen2-72B-Instruct is a large-scale instruction-following language model designed to handle complex tasks in natural language understanding and generation. It outperforms competitors like Llama 3 and Mixtral, particularly excelling in coding, math, and logical reasoning. Practical tests show that the 72B model handles intricate coding tasks with precision and provides well-structured explanations for complex logic problems. It excels in multilingual capabilities, handling both English and Chinese. | | |
Qwen2-7B-Instruct | Existing | Qwen2-7B-Instruct is a smaller, efficient model with 7 billion parameters that provides reliable performance across various natural language tasks. It balances computational efficiency with robust capabilities, making it suitable for chatbots, content creation, and language translation. While not as powerful as the 72B model, it still outperforms many competitors in head-to-head evaluations, offering a versatile tool for users needing solid performance without the computational demands of larger models. | | |
Sarashina2-7b | Existing | Sarashina2-7B is a Japanese language model developed by SB Intuitions. The model excels in natural language processing tasks and outperforms its predecessor (Sarashina1) in benchmarks such as JCommonsenseQA and JSQuAD. The model benefits from improved pretraining methods, leading to enhanced capabilities in answering complex questions and understanding Japanese text. Sarashina2 aims to provide advanced performance in Japanese language tasks, showcasing significant improvements over earlier iterations. This 7B version of Sarashina2 is compact and efficient compared to its other variants. | | |
Sarashina2-70b | Existing | Sarashina2-70B is a Japanese language model developed by SB Intuitions. The model excels in natural language processing tasks and outperforms its predecessor (Sarashina1) in benchmarks such as JCommonsenseQA and JSQuAD. The model benefits from improved pretraining methods, leading to enhanced capabilities in answering complex questions and understanding Japanese text. Sarashina2 aims to provide advanced performance in Japanese language tasks, showcasing significant improvements over earlier iterations. This 70B version of Sarashina2 can be used for tasks needing high accuracy. | | |
Meta-Llama-3.1-8B-Instruct | Existing | Meta-Llama-3.1-8B-Instruct is an instruction following model offering a larger context window than its predecessor, Llama 3. We support up to 8k context in this release. It has multilingual capability, supporting English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai. | | In this release, the platform supports a maximum context length of 8k for this model. Support for longer context lengths up to 128k is targeted for a subsequent release. |
Meta-Llama-3.1-70B-Instruct | Existing | Meta-Llama-3.1-70B-Instruct is an instruction following model, developed by Meta, that offers a larger context window than its predecessor, Llama 3. We support up to 8k context in this release. It has multilingual capability, supporting English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai. The 70B parameter model performs better in benchmarks such as MATH, GSM8K (grade school math), and MMLU (knowledge acquisition) than its 8B parameter variant. | | In this release, the platform supports a maximum context length of 8k for this model. Support for longer context lengths up to 128k is targeted for a subsequent release. |
Meta-Llama-3-8B-Instruct | Existing | Meta-Llama-3-8B-Instruct is an instruction following model belonging to the Llama 3 family of large language models. It improved on Llama 2 in areas such as false refusal rates, alignment, and diversity in model responses. This family of models also sees improvements in capabilities like reasoning, code generation, instruction following, dialogue use cases, helpfulness, and safety. The Meta-Llama-3-8B-Instruct model, at 8B parameters, suits use cases that prioritize efficiency and lower computational workloads. | | |
Meta-Llama-3-70B-Instruct | Existing | Meta-Llama-3-70B-Instruct is an instruction following model belonging to the Llama 3 family of large language models. It improved on Llama 2 in areas such as false refusal rates, alignment, and diversity in model responses. This family of models also sees improvements in capabilities like reasoning, code generation, instruction following, dialogue use cases, helpfulness, and safety. The Meta-Llama-3-70B-Instruct model, with its 70B parameters, balances performance and resource efficiency. | | |
Llama-2-7b-Chat-hf | Existing | Llama-2-7b-Chat-hf is a large language model, created by Meta, that expanded on the capabilities of the Llama 1 family. The Llama 2 family of models was trained on significantly more data than its predecessor. Llama-2-7b-Chat-hf can be used for use cases valuing performance and efficiency. It is also more compact than its 13B and 70B variants, while still maintaining accuracy. | | |
Llama-2-13b-Chat-hf | Existing | Llama-2-13b-Chat-hf is a large language model, created by Meta, that expanded on the capabilities of the Llama 1 family. The Llama 2 family of models was trained on significantly more data than its predecessor. Llama-2-13b-Chat-hf strikes a balance between performance and accuracy, sitting at 13B parameters. | | |
Llama-2-70b-Chat-hf | Existing | Llama-2-70b-Chat-hf is a large language model, created by Meta, that expanded on the capabilities of the Llama 1 family. The Llama 2 family of models was trained on significantly more data than its predecessor. This chat model is optimized for dialogue use cases. Llama-2-70b-Chat-hf, compared to its 13B and 7B parameter variants, uses Grouped-Query Attention (GQA) for improved inference scalability. | | |
Mistral-7B-Instruct-v0.2 | Existing | Mistral-7B-Instruct-v0.2 is an instruction fine-tuned version of the Mistral-7B-v0.2 language model, tailored for tasks requiring precise instruction-following capabilities. This model is particularly well-suited for a variety of applications, including content generation, text analysis, and problem-solving. It excels in creating coherent and contextually relevant text, making it ideal for tasks like report writing, code generation, and answering questions. The enhancements in this version enable it to handle more sophisticated tasks with higher accuracy and efficiency. | | |
e5-Mistral-7B-Instruct | Existing | e5-Mistral-7B-Instruct is a text embedding model derived from Mistral-7B-v0.1. This model can be used to generate text embeddings and a similarity score based on the inputs passed in. It additionally supports other tasks through task instructions in the chat template (see the model card for detailed information). These tasks include web search query (assuming the web data is passed to the model), semantic text similarity, summarization, or retrieval of parallel text. Although this model has multilingual capabilities, it is recommended that it be used with English text. | | The e5-mistral-7b-Instruct embedding models only support the Predict API, not the Stream API. |
Deepseek-coder-6.7B-Instruct | Existing | Deepseek-coder-6.7B-Instruct is a compact, instruction following code model. This model can support use cases such as code generation, code interpretation, debugging, and code refactoring. The model supports English and Chinese natural languages as well as low-level languages like Assembly, C, C++, and Rust. Additionally, Deepseek-coder-6.7B-Instruct supports a multitude of languages and implementations, including general-purpose languages (C#, Go, Java, Python, Ruby, and TypeScript), web development languages (CSS, HTML, and JavaScript), markup languages (JSON and Markdown), scripting languages (PowerShell and Shell), data and statistical tools (R and SQL), domain-specific languages (SQL and Verilog), and other tools (CMake, Makefile, Dockerfile, and Jupyter Notebook). | | |
Deepseek-coder-33B-Instruct | Existing | Deepseek-coder-33B-Instruct is an instruction following code model. This model can support use cases such as code generation, code interpretation, debugging, and code refactoring. The model supports English and Chinese natural languages as well as low-level languages like Assembly, C, C++, and Rust. Additionally, Deepseek-coder-33B-Instruct supports a multitude of languages and implementations, including general-purpose languages (C#, Go, Java, Python, Ruby, and TypeScript), web development languages (CSS, HTML, and JavaScript), markup languages (JSON and Markdown), scripting languages (PowerShell and Shell), data and statistical tools (R and SQL), domain-specific languages (SQL and Verilog), and other tools (CMake, Makefile, Dockerfile, and Jupyter Notebook). | | |
Solar-10.7B-Instruct-v1.0 | Existing | Solar-10.7B-Instruct-v1.0 is a general-purpose fine-tuned variant of its predecessor, SOLAR-10.7B. This model family uses a methodology called depth up-scaling (DUS), which makes architectural changes to a Llama 2 based model by integrating Mistral 7B weights into upscaled layers and continuously pretraining on the result. With only 10.7 billion parameters, it offers state-of-the-art performance in NLP tasks, even outperforming models with up to 30 billion parameters. | | |
EEVE-Korean-Instruct-10.8B-v1.0 | Existing | The EEVE-Korean-Instruct-10.8B-v1.0 is a Korean and English instruction following model adapted from SOLAR-10.7B and Phi-2 that uses vocabulary expansion (EEVE) techniques, amongst others, to create a model that can transfer its knowledge and understanding into Korean. It can perform traditional NLP tasks in Korean. | | |
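The e5-Mistral-7B-Instruct entry above notes that task instructions are passed through the chat template. A minimal sketch of how a query might be prepared and two embeddings compared: the `Instruct:`/`Query:` layout follows the public e5-mistral model card, while the helper names and the toy vectors are illustrative assumptions, not part of the SambaNova API.

```python
import math

def build_e5_query(task: str, query: str) -> str:
    # Prefix the query with a task instruction, following the
    # "Instruct: {task}\nQuery: {query}" convention described in the
    # public e5-mistral model card. Documents are embedded without a prefix.
    return f"Instruct: {task}\nQuery: {query}"

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # Similarity score between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

prompt = build_e5_query(
    "Given a web search query, retrieve relevant passages that answer the query",
    "how to bake bread",
)
print(prompt.splitlines()[0])

# Toy vectors stand in for real model embeddings.
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0
```

In practice the prompt string would be sent to the embedding endpoint (Predict API only, per the usage note above) and the returned vectors compared with a similarity function like the one sketched here.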
Summary of released Composition of Experts (CoE)
The 24.9.1-MP1 release includes the five CoE configurations described below.
- Samba-1 Turbo: deployable on SN40L-8 and SN40L-16, using 8 RDUs per instance.
- Llama 3.1 70B_1B with e5-mistral: deployable on SN40L-8 and SN40L-16, using 8 RDUs per instance.
- Llama 3.1 405B with e5-mistral: deployable on SN40L-16, using 16 RDUs per instance.
- Llama 3.1 70B with e5-mistral: deployable on SN40L-16, using 16 RDUs per instance.
- Llama 3.1 8B with e5-mistral: deployable on SN40L-16, using 16 RDUs per instance.
8-RDU configurations
Samba-1 Turbo and Llama 3.1 70B_1B with e5-mistral are example compositions of high-performance inference models built by SambaNova. These compositions integrate various LLMs and one text embedding model, enabling RAG applications through a single endpoint. These CoEs are deployable on 8 RDUs and require either the SN40L-8 or SN40L-16 hardware generation. Each model within each CoE is accessible directly via API, with minimal inter-model switching time. Multiple batch sizes are supported, and the platform intelligently selects the optimal batch size based on the specific request combination at inference time, maximizing concurrency.
CoE Name | Expert/Model Name | Dynamic Batch Sizes Supported |
---|---|---|
Samba-1 Turbo | Meta-Llama-3-8B-Instruct | 1, 4, 8, 16, 32 |
Samba-1 Turbo | Meta-Llama-3-70B-Instruct | 1, 4, 8, 16, 32 |
Samba-1 Turbo | Mistral-7B-Instruct-V0.2 | 1, 4, 8, 16, 32 |
Samba-1 Turbo | e5-mistral-7b-Instruct | 1, 4, 8 |
Samba-1 Turbo | Qwen2-7B-Instruct | 1, 4, 8, 16 |
Samba-1 Turbo | Qwen2-72B-Instruct | 1, 4, 8, 16 |
Llama 3.1 70B_1B with e5-mistral | Meta-Llama-3.1-70B-Instruct | 1, 4, 8 |
Llama 3.1 70B_1B with e5-mistral | e5-mistral-7b-Instruct | 1, 4, 8, 16, 32 |
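The dynamic batch selection described above can be sketched in miniature: the platform picks an optimal batch size from those an expert supports, based on the requests pending at inference time. The function below is a simplified illustration of that idea under the assumption that "optimal" means the largest supported size that fits the pending request count; `pick_batch_size` is a hypothetical helper, not the platform's actual scheduler.

```python
def pick_batch_size(supported: list[int], pending: int) -> int:
    # Illustrative only: choose the largest supported batch size that does
    # not exceed the number of pending requests, falling back to the
    # smallest supported size when fewer requests than that are queued.
    eligible = [b for b in sorted(supported) if b <= pending]
    return eligible[-1] if eligible else min(supported)

# Example: an expert that supports dynamic batch sizes 1, 4, and 8.
print(pick_batch_size([1, 4, 8], 6))   # 4
print(pick_batch_size([1, 4, 8], 20))  # 8
```

The real scheduler also weighs the specific combination of requests across experts to maximize concurrency, which this sketch does not model.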
16-RDU configuration
The three CoEs described below each integrate one of the three available sizes of the Llama 3.1 models with the text embedding model e5-mistral-7b-Instruct, enabling RAG applications through a single endpoint. These high-performance CoEs are deployable on 16 RDUs and require the SN40L-16 hardware generation. Each model within each CoE is accessible directly via API, with minimal inter-model switching time. Multiple batch sizes are supported, and the platform intelligently selects the optimal batch size based on the specific request combination at inference time, maximizing concurrency.
CoE Name | Expert/Model Name | Dynamic Batch Sizes Supported |
---|---|---|
Llama 3.1 405B with e5-mistral | Meta-Llama-3.1-405B-Instruct | 1, 4 |
Llama 3.1 405B with e5-mistral | e5-mistral-7b-Instruct | 1, 4, 8, 16, 32 |
Llama 3.1 70B with e5-mistral | Meta-Llama-3.1-70B-Instruct | 1, 2, 4, 8 |
Llama 3.1 70B with e5-mistral | e5-mistral-7b-Instruct | 1, 4, 8, 16, 32 |
Llama 3.1 8B with e5-mistral | Meta-Llama-3.1-8B-Instruct | 1, 2, 4, 8 |
Llama 3.1 8B with e5-mistral | e5-mistral-7b-Instruct | 1, 4, 8, 16, 32 |
Summary of deprecated Composition of Experts (CoE)
The 24.9.1-MP1 release consolidates the finely divided out-of-the-box CoE bundles from the v0.2 release into a more unified, converged form. This streamlined approach simplifies the default setup and reduces disk space usage while preserving flexibility for custom configurations. Users can now access v0.2 models in two ways:
- Via the new CoE bundles listed in the Summary of released Composition of Experts (CoE) section above.
- As expert options when creating custom CoE bundles. See the Samba-1 Turbo 24.9.1-MP1 model options section for those model options.
Deprecated CoE bundles
The table below describes all of the CoE model bundles being deprecated.
CoE Name | Description |
---|---|
Samba-1 Turbo [Beta] | Samba-1 Turbo [Beta] contains a breadth of general purpose LLMs in addition to a coding model. The parameter counts in this CoE vary to provide both highly accurate models as well as more light-weight ones. This CoE can be used for general purpose applications as well as those requiring coding tasks. |
Samba-1 Turbo with embedding - small [Beta] | Samba-1 Turbo with embedding - small [Beta] comprises the small, performant versions of the new LLMs in this Samba-1 Turbo release. In addition to the LLMs, it contains the e5-mistral-7b-instruct text embedding models for tasks needing an embedding output. The two sequence length variants of the embedding models are 8192 and 32768, used for shorter-form and longer-form content respectively. This CoE can be used for use cases requiring embedding models in a light-weight and performant context. |
Samba-1 Turbo Llama 3.1 70B 4096 dynamic batching | The Samba-1 Turbo Llama 3.1 70B 4096 dynamic CoE contains the 4096 sequence length variant of Llama 3.1 70B, making it a CoE suited for general purpose tasks with shorter-form inputs, generally leading to a quicker first token latency. |
Samba-1 Turbo Llama 3.1 70B 8192 dynamic batching | The Samba-1 Turbo Llama 3.1 70B 8192 dynamic CoE contains the 8192 sequence length variant of Llama 3.1 70B, making it a CoE suited for general purpose NLP tasks. |
Samba-1 Turbo Llama 3.1 8B 4096 dynamic batching | The Samba-1 Turbo Llama 3.1 8B 4096 dynamic CoE contains the 4096 sequence length variant of Llama 3.1 8B, making it a CoE suited for general purpose tasks with shorter-form inputs, generally leading to a quicker first token latency. |
Samba-1 Turbo Llama 3.1 8B 8192 dynamic batching | The Samba-1 Turbo Llama 3.1 8B 8192 dynamic CoE contains the 8192 sequence length variant of Llama 3.1 8B, making it a CoE suited for general purpose NLP tasks. |
Samba-1 Turbo Deepseek Coder 6.7B 4096 dynamic batching | The Samba-1 Turbo Deepseek Coder 6.7B 4096 dynamic batching CoE contains the 4096 sequence length variant of deepseek-coder-6.7B-instruct. This CoE can be used for instruction-based coding tasks. |
Samba-1 Turbo Llama 2 13B 4096 dynamic batching | The Samba-1 Turbo Llama 2 13B 4096 dynamic batching CoE contains the 13B parameter variant of Llama 2. This CoE can be used for general-purpose tasks in a conversational or dialogue-based setting. |
Samba-1 Turbo Llama 2 7B 4096 dynamic batching | The Samba-1 Turbo Llama 2 7B 4096 dynamic batching CoE contains the 7B parameter variant of Llama 2. This CoE can be used for general-purpose tasks in a conversational or dialogue-based setting. |
Samba-1 Turbo Llama 3 70B 4096 dynamic batching | The Samba-1 Turbo Llama 3 70B 4096 dynamic batching CoE contains the 4096 sequence length variant of Llama 3 70B, making it a CoE suited for general purpose tasks with shorter-form inputs, generally leading to a quicker first token latency. |
Samba-1 Turbo Llama 3 70B 8192 dynamic batching | The Samba-1 Turbo Llama 3 70B 8192 dynamic batching CoE contains the 8192 sequence length variant of Llama 3 70B, making it a CoE suited for general purpose NLP tasks. |
Samba-1 Turbo Llama 3 8B 4096 dynamic batching | The Samba-1 Turbo Llama 3 8B 4096 dynamic batching CoE contains the 4096 sequence length variant of Llama 3 8B, making it a relatively compact CoE suited for general purpose tasks with shorter-form inputs. |
Samba-1 Turbo Llama 3 8B 8192 dynamic batching | The Samba-1 Turbo Llama 3 8B 8192 dynamic batching CoE contains the 8192 sequence length variant of Llama 3 8B, making it a relatively compact CoE suited for general purpose NLP tasks. |
Samba-1 Turbo Mistral 7B 4096 dynamic batching | The Samba-1 Turbo Mistral 7B 4096 dynamic batching CoE contains the compact, 7B parameter Mistral instruction following model. This CoE can be used for general purpose instruction following or assistant-like tasks with smaller-form inputs. |
Samba-1 Turbo Llama 2 70B 4096 dynamic batching | The Samba-1 Turbo Llama 2 70B 4096 dynamic batching CoE contains the large, 70B parameter Llama 2 model variant. The CoE can be used for general purpose dialogue or chat-based applications with smaller-form inputs. |
Samba-1 Turbo Deepseek Coder 33B 4096 dynamic batching | The Samba-1 Turbo Deepseek Coder 33B 4096 dynamic batching CoE contains the 33B variant of deepseek-coder-instruct at a 4096 sequence length. This CoE can be used for coding tasks with relatively shorter-form inputs. |
Samba-1 Turbo Deepseek Coder 33B 16384 dynamic batching | This CoE contains the 33B variant of deepseek-coder-instruct at a 16384 sequence length. Because of the increased sequence length, this CoE can be used for coding-related tasks such as reading in larger code blocks for interpretation. |
Samba-1.1 | Samba-1.1 was an iteration of Samba-1 and has been superseded by newer options. It contains 94 expert models. |
Deprecated models
The list below describes the specific models being deprecated with this release. Most of the deprecations are older models with lower performance. See the Samba-1 Turbo 24.9.1-MP1 model options section above for the supported models.
- autoj-13b
- BioMistral-7B
- bioMistral-7B-32k
- BioMistral-7B-DARE
- BioMistral-7B-SLERP
- BioMistral-7B-TIES
- BLOOMChatv2-2k
- BLOOMChatv2-8k
- codegemma-7b
- codegemma-7b-it
- CodeLlama-13b-Instruct-hf
- CodeLlama-13b-Python-hf
- CodeLlama-70b-Instruct-hf
- CodeLlama-70b-Python-hf
- CodeLlama-7b-Instruct-hf
- CodeLlama-7b-Python-hf
- deepseek-coder-1.3b-instruct
- deepseek-coder-33B-instruct-4096
- deepseek-coder-6.7b-instruct
- deepseek-coder-6.7B-instruct-4096
- deepseek-llm-67b-chat
- deepseek-llm-7b-chat
- DonutLM-v1
- e5-mistral-7b-instruct-32768
- e5-mistral-7b-instruct-8192
- ELYZA-japanese-codeLlama-7b
- ELYZA-japanese-codeLlama-7b-instruct
- ELYZA-japanese-llama-2-7b
- EmertonMonarch-7B
- Explore-LM-7B-Rewriting
- finance-chat
- gemma-7b
- gemma-7b-it
- Genstruct-7B
- Genstruct-7B-32k
- GOAT-70B-Storytelling
- karakuri-lm-70b-chat
- law-chat
- Lil-c3po
- Lil-c3po-deprecated
- llama-2-13b-hf
- llama-2-70b-hf
- llama-2-7b-hf
- LlamaGuard-7b
- lumos_web_agent_ground_iterative
- lumos_web_agent_plan_iterative
- Magicoder-S-CL-7B
- Magicoder-S-DS-6.7B
- Magicoder-S-DS-6.7B-16k
- medicine-chat
- Meta-Llama-3-70B
- Meta-Llama-3-70B-Instruct-8192
- Meta-Llama-3-8B
- Meta-Llama-3-8B-Instruct-8192
- Meta-Llama-Guard-2-8B
- Mistral-7B-Instruct-v0.2-32k
- Mistral-7B-Instruct-V0.2-4096
- Mistral-7B-OpenOrca
- Mistral-T5-7B-v1
- Mistral-T5-7b-v1
- NexusRaven-V2-13B
- Nous-Hermes-2-Mistral-7B-DPO
- Nous-Hermes-llama-2-7b
- Nous-Hermes-Llama2-13b
- nsql-llama-2-7b
- OpenHermes-2p5-Mistral-7B
- Rabbit-7B-DPO-Chat
- SambaCoder-nsql-llama-2-70b
- SambaLingo-70b-Arabic-Chat
- SambaLingo-70b-Hungarian-Chat
- SambaLingo-70b-Thai-Chat
- SambaLingo-Arabic-Chat
- SambaLingo-Bulgarian-Chat
- SambaLingo-Hungarian-Chat
- SambaLingo-Japanese-Chat
- SambaLingo-Russian-Chat
- SambaLingo-Serbian-Chat
- SambaLingo-Slovenian-Chat
- SambaLingo-Thai-Chat
- SambaLingo-Turkish-Chat
- Saul-Instruct-v1
- Saul-Instruct-v1-32k
- Snorkel-Mistral-PairRM-DPO
- sqlcoder-70b-alpha
- sqlcoder-7b
- sqlcoder-7b-2
- sqlcoder-7b-32k
- Starling-LM-7B-beta
- Swallow-13b-instruct-v0.1
- Swallow-70b-instruct-v0.1
- Swallow-70b-NVE-instruct-hf
- Swallow-7b-instruct-v0.1
- Swallow-7b-NVE-instruct-hf
- TableLlama
- tulu-2-13b
- tulu-2-70b
- tulu-2-7b
- tulu-2-dpo-13b
- tulu-2-dpo-70b
- tulu-2-dpo-7b
- typhoon-7b
- typhoon-7b-32k
- UniNER-7B-all
- v1olet_merged_dpo_7b
- WestLake-7b-v2-laser-truthy-dpo
- WestLake-7B-v2-laser-truthy-dpo
- Xwin-Math-13B-V1.0
- Xwin-Math-70B-V1.0
- Xwin-Math-7B-V1.0
- zephyr-7b-beta
- zephyr-7b-beta-32k