Samba-1 Turbo 24.10.1-MP1
Release version: 24.10.1-MP1 | Release date: 01/09/2025
The Samba-1 Turbo 24.10.1-MP1 (Model Pack 1) release delivers expanded deployment options, resolves bugs, and introduces several new models to the platform.
Release features
The Samba-1 Turbo 24.10.1-MP1 release features are described below.
Inference configuration improvements
Capabilities:

- Supports item values as either `dict` or `str` in API V2 input.
- Upgraded the transformers package to version 4.45.2.
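The dict-or-str flexibility means an item value can be passed either as a plain string or as a structured object. A minimal sketch of the two payload shapes follows; the field names (`items`, `id`, `value`, `role`, `content`) are illustrative assumptions, not the documented API V2 schema.

```python
import json

# The same prompt expressed two ways: as a plain string value and as a
# structured dict value. Field names are illustrative, not the exact schema.
payload_str = {"items": [{"id": "item-1", "value": "Summarize the release notes."}]}
payload_dict = {
    "items": [
        {
            "id": "item-1",
            "value": {"role": "user", "content": "Summarize the release notes."},
        }
    ]
}

for payload in (payload_str, payload_dict):
    body = json.dumps(payload)  # both shapes serialize to valid JSON
    value = json.loads(body)["items"][0]["value"]
    assert isinstance(value, (str, dict))
```

A server accepting both shapes would typically branch on the value's type before tokenization.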
Model additions
- 8-socket
  - Meta-Llama-3.1-405B-Instruct-FP8
- 16-socket
  - The 16-socket deployment configurations are now available for a total of six models. The two Qwen2.5 models below are the latest additions.
  - [New] Qwen2.5-72B-Instruct [Beta]
  - [New] Qwen2.5-0.5B-Instruct [Beta]
  - Meta-Llama-3.1-405B-Instruct
  - Meta-Llama-3.1-70B-Instruct
  - Meta-Llama-3.1-8B-Instruct
  - e5-mistral-7B-instruct (text embedding model)
Bug fixes
- Fixed time to first token (TTFT) monitoring in the Grafana dashboard.
- Fixed an issue where a server would sometimes hang when exceptions occurred during inference.
- Fixed an issue where the tokenizer would sometimes error during loading.
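For context on the TTFT fix above: time to first token is the delay between submitting a request and receiving the first streamed token. The dashboard fix is server-side, but the metric itself can be sampled client-side against any streaming response. A minimal sketch, with a stand-in generator in place of a real streaming endpoint:

```python
import time

def measure_ttft(stream):
    """Return (ttft_seconds, tokens) for an iterable of streamed tokens."""
    start = time.monotonic()
    ttft = None
    tokens = []
    for token in stream:
        if ttft is None:
            ttft = time.monotonic() - start  # first token observed
        tokens.append(token)
    return ttft, tokens

# Stand-in for a real streaming response: yields tokens with a small delay.
def fake_stream():
    for token in ["Hello", ",", " world"]:
        time.sleep(0.01)
        yield token

ttft, tokens = measure_ttft(fake_stream())
assert ttft is not None and ttft >= 0.01
assert tokens == ["Hello", ",", " world"]
```

In practice the same timer pattern wraps the HTTP request itself, so network latency and queueing are included in the measurement.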
Samba-1 Turbo 24.10.1-MP1 model options
The table below describes the model options in the Samba-1 Turbo 24.10.1-MP1 release.
Model Name | Release Status | Description | Attributes | Usage Notes |
---|---|---|---|---|
Meta-Llama-3.1-405B-Instruct-FP8 | New | Not applicable | Not applicable | None |
Qwen2.5-72B-Instruct [Beta] | New | The Qwen2.5 series of large language models offers instruction-tuned capabilities in sizes from 0.5B to 72B parameters. The 72B model excels in multilingual tasks (supporting 29+ languages), coding, mathematics, structured data comprehension (e.g., tables, JSON), and generating long outputs (up to 8K tokens, with a context length of 131K tokens). It includes improvements in instruction following, prompt diversity, and role-play implementation. Built on a transformer architecture with RoPE, SwiGLU, and RMSNorm, the model is optimized for dialogue and advanced reasoning tasks. | | The model is not supported in the Add a checkpoint from storage using the GUI or Import a checkpoint using the CLI workflows. |
Qwen2.5-0.5B-Instruct [Beta] | New | The Qwen2.5-0.5B model is an instruction-tuned, multilingual language model optimized for dialogue, coding, mathematics, and structured data tasks. It supports long-context processing (up to 32K tokens) and generates outputs of up to 8K tokens. Designed for diverse prompts and role-play scenarios, it offers advanced reasoning in 29+ languages. Built with transformers using RoPE, SwiGLU, RMSNorm, and Attention QKV bias with tied word embeddings, the model delivers efficient performance in a lightweight 0.5B parameter architecture. | | The model does not support sampling. |
Meta-Llama-3.1-405B-Instruct | Existing | The Meta Llama 3.1 collection comprises multilingual large language models (LLMs): pretrained and instruction-tuned generative models in 8B, 70B, and 405B sizes (text in/text out). The Llama 3.1 instruction-tuned, text-only models (8B, 70B, 405B) are optimized for multilingual dialogue use cases and outperform many of the available open-source and closed chat models on common industry benchmarks. | | This release supports a sequence length of up to 8k for this model. This model requires the SN40L-16 hardware generation option for deployment, utilizing 16 RDUs. |
Meta-Llama-3.2-1B-Instruct [Beta] | Existing | The Llama 3.2 series offers multilingual large language models (LLMs) in 1B and 3B parameter sizes, designed for text-based input and output. The Instruct variants are instruction-tuned models specifically optimized for multilingual conversational applications, excelling in tasks like agent-driven retrieval and summarization. They surpass many other open-source and proprietary chat models on widely recognized industry benchmarks. | | This release supports a sequence length of up to 8k for this model. This model is available as a beta release. Importing a checkpoint using the CLI is not supported for this model architecture in this version. |
Llama-Guard-3-8B | Existing | Llama Guard 3 is a Llama-3.1-8B model fine-tuned for content safety classification of both LLM prompts and responses. It labels content as safe or unsafe, noting violated categories if unsafe. Aligned with MLCommons' hazards taxonomy, it supports content moderation in eight languages and is optimized for safe search and code interpreter use. | | Refer to the Llama Guard model card from Meta for more details on prompt format and related usage instructions. |
Qwen2.5-7B-Instruct | Existing | Qwen2.5 offers a range of base and instruction-tuned models (0.5B to 72B parameters) with enhanced knowledge, coding, and math capabilities. It offers improved instruction following, structured data handling, and multilingual support in 29+ languages. | | This release supports a sequence length of up to 8k for this model. |
Mistral-Nemo-Instruct-2407 | Existing | Mistral-Nemo-Instruct-2407 is a specialized model optimized for handling long-range dependencies and structured tasks. It uses the new Tekken tokenizer, based on Tiktoken, which is designed to compress natural language and source code and supports English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, and Hindi. The model is fine-tuned to excel in dialogue systems, answering complex multi-step queries, and generating content that requires logical flow. Designed with instruction-following capabilities, it is particularly effective where precision and detailed responses are essential, such as technical writing, long-form content generation, and customer support applications. Its optimization allows for smoother performance in highly specialized use cases. | | Mistral-Nemo-Instruct-2407 supports sequence lengths up to 4k during inference in the first release of this model. Longer sequence length support will be added in future releases. |
Mistral-Large-2 | Existing | Mistral Large 2 is a multilingual language model that supports French, German, Spanish, Italian, Portuguese, Arabic, Hindi, Russian, Chinese, Japanese, and Korean. Since it was trained on a large proportion of code, it additionally supports 80+ coding languages, including Python, Java, C, C++, JavaScript, and Bash. The model has been optimized for reasoning and mathematics. Additionally, Mistral Large 2 improves upon earlier versions in handling multi-turn conversations while maintaining conciseness in responses, which suits enterprise applications needing succinct outputs. | | Due to licensing restrictions, SambaNova only provides the configuration setup for running this model, not the model weights. You must obtain the appropriate usage license from Mistral AI to access the weights. Once the license is secured, the model weights can be imported onto the platform using the Add a checkpoint feature in the Model Hub. Mistral-Large-2 supports sequence lengths up to 4k during inference in the first release of this model. Longer sequence length support will be added in future releases. |
gemma-2-9b-it | Existing | Gemma 2-9B-IT is a language model developed by Google. With 9 billion parameters, it handles a wide range of natural language processing tasks, including text generation, translation, and comprehension. The model was trained on web, code, and math-related datasets, primarily in English. | | Supports sequence lengths up to 4k during inference in the first release of this model. Longer sequence length support will be added in future releases. |
Qwen2-72B-Instruct | Existing | Qwen2-72B-Instruct is a large-scale instruction-following language model designed to handle complex tasks in natural language understanding and generation. It outperforms competitors like Llama 3 and Mixtral, particularly excelling in coding, math, and logical reasoning. Practical tests show that the 72B model handles intricate coding tasks with precision and provides well-structured explanations for complex logic problems. It excels in multilingual capabilities, handling both English and Chinese. | | None |
Qwen2-7B-Instruct | Existing | Qwen2-7B-Instruct is a smaller, efficient model with 7 billion parameters that provides reliable performance across various natural language tasks. It balances computational efficiency with robust capabilities, making it suitable for chatbots, content creation, and language translation. While not as powerful as the 72B model, it still outperforms many competitors in head-to-head evaluations, offering a versatile tool for users needing solid performance without the computational demands of larger models. | | None |
Sarashina2-7b | Existing | Sarashina2-7B is a Japanese language model developed by SB Intuitions. The model excels in natural language processing tasks and outperforms its predecessor (Sarashina1) in benchmarks such as JCommonsenseQA and JSQuAD. The model benefits from improved pretraining methods, leading to enhanced capabilities in answering complex questions and understanding Japanese text. Sarashina2 aims to provide advanced performance in Japanese language tasks, showcasing significant improvements over earlier iterations. This 7B version of Sarashina2 is compact and efficient compared to its other variants. | | None |
Sarashina2-70b | Existing | Sarashina2-70B is a Japanese language model developed by SB Intuitions. The model excels in natural language processing tasks and outperforms its predecessor (Sarashina1) in benchmarks such as JCommonsenseQA and JSQuAD. The model benefits from improved pretraining methods, leading to enhanced capabilities in answering complex questions and understanding Japanese text. Sarashina2 aims to provide advanced performance in Japanese language tasks, showcasing significant improvements over earlier iterations. This 70B version of Sarashina2 can be used for tasks needing high accuracy. | | None |
Meta-Llama-3.1-8B-Instruct | Existing | Meta-Llama-3.1-8B-Instruct is an instruction-following model offering a larger context window than its predecessor, Llama 3. Up to 8K context is supported in this release. It has multilingual capability, supporting English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai. | | In this release, the platform supports a maximum context length of 8k for this model. Support for longer context lengths up to 128k is targeted for a subsequent release. |
Meta-Llama-3.1-70B-Instruct | Existing | Meta-Llama-3.1-70B-Instruct is an instruction-following model, developed by Meta, that offers a larger context window than its predecessor, Llama 3. Up to 8K context is supported in this release. It has multilingual capability, supporting English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai. The 70B parameter model performs better in benchmarks such as MATH, GSM8K (grade school math), and MMLU (knowledge acquisition) than its 8B parameter variant. | | In this release, the platform supports a maximum context length of 8k for this model. Support for longer context lengths up to 128k is targeted for a subsequent release. |
Meta-Llama-3-8B-Instruct | Existing | Meta-Llama-3-8B-Instruct is an instruction-following model belonging to the Llama 3 family of large language models. It introduced improvements over Llama 2 in areas such as false refusal rates, alignment, and diversity in model responses. This family of models also sees improvements in capabilities like reasoning, code generation, instruction following, dialogue use cases, helpfulness, and safety. The Meta-Llama-3-8B-Instruct model, at 8B parameters, suits tasks and use cases that prioritize efficiency and lower computational workloads. | | None |
Meta-Llama-3-70B-Instruct | Existing | Meta-Llama-3-70B-Instruct is an instruction-following model belonging to the Llama 3 family of large language models. It introduced improvements over Llama 2 in areas such as false refusal rates, alignment, and diversity in model responses. This family of models also sees improvements in capabilities like reasoning, code generation, instruction following, dialogue use cases, helpfulness, and safety. The Meta-Llama-3-70B-Instruct model, with its 70B parameters, balances performance and resource efficiency. | | None |
Llama-2-7b-Chat-hf | Existing | Llama-2-7b-Chat-hf is a large language model, created by Meta, that expanded on the capabilities of the Llama 1 family. The Llama 2 family of models was trained on significantly more data than its predecessor. Llama-2-7b-Chat-hf suits use cases valuing performance and efficiency. It is also more compact than its 13B and 70B variants, while still maintaining accuracy. | | None |
Llama-2-13b-Chat-hf | Existing | Llama-2-13b-Chat-hf is a large language model, created by Meta, that expanded on the capabilities of the Llama 1 family. The Llama 2 family of models was trained on significantly more data than its predecessor. At 13B parameters, Llama-2-13b-Chat-hf strikes a balance between performance and accuracy. | | None |
Llama-2-70b-Chat-hf | Existing | Llama-2-70b-Chat-hf is a large language model, created by Meta, that expanded on the capabilities of the Llama 1 family. The Llama 2 family of models was trained on significantly more data than its predecessor. This chat model is optimized for dialogue use cases. Llama-2-70b-Chat-hf, compared to its 13B and 7B parameter variants, uses Grouped-Query Attention (GQA) for improved inference scalability. | | None |
Mistral-7B-Instruct-v0.2 | Existing | Mistral-7B-Instruct-v0.2 is an instruction fine-tuned version of the Mistral-7B-v0.2 language model, tailored for tasks requiring precise instruction-following capabilities. This model is particularly well-suited for a variety of applications, including content generation, text analysis, and problem-solving. It excels in creating coherent and contextually relevant text, making it ideal for tasks like report writing, code generation, and answering questions. The enhancements in this version enable it to handle more sophisticated tasks with higher accuracy and efficiency. | | None |
e5-Mistral-7B-Instruct | Existing | e5-Mistral-7B-Instruct is a text embedding model derived from Mistral-7B-v0.1. This model can be used to generate text embeddings and a similarity score based on the inputs passed in. It additionally supports other tasks through task instructions in the chat template (see the model card for detailed information). These tasks include web search query (assuming the web data is passed to the model), semantic text similarity, summarization, and retrieval of parallel text. Although this model has multilingual capabilities, it is recommended to use it with English text. | | The e5-Mistral-7B-Instruct embedding model only supports the Predict API, not the Stream API. |
Deepseek-coder-6.7B-Instruct | Existing | Deepseek-coder-6.7B-Instruct is a compact, instruction-following code model. This model can support use cases such as code generation, code interpretation, debugging, and code refactoring. The model supports English and Chinese natural languages as well as low-level languages like Assembly, C, C++, and Rust. Additionally, Deepseek-coder-6.7B-Instruct supports a multitude of languages and implementations, including general-purpose languages (C#, Go, Java, Python, Ruby, and TypeScript), web development (CSS, HTML, and JavaScript), markup languages (JSON and Markdown), scripting languages (PowerShell and Shell), data and statistical tools (R and SQL), domain-specific languages (SQL and Verilog), and other tools (CMake, Makefile, Dockerfile, and Jupyter Notebook). | | None |
Deepseek-coder-33B-Instruct | Existing | Deepseek-coder-33B-Instruct is an instruction-following code model. This model can support use cases such as code generation, code interpretation, debugging, and code refactoring. The model supports English and Chinese natural languages as well as low-level languages like Assembly, C, C++, and Rust. Additionally, Deepseek-coder-33B-Instruct supports a multitude of languages and implementations, including general-purpose languages (C#, Go, Java, Python, Ruby, and TypeScript), web development (CSS, HTML, and JavaScript), markup languages (JSON and Markdown), scripting languages (PowerShell and Shell), data and statistical tools (R and SQL), domain-specific languages (SQL and Verilog), and other tools (CMake, Makefile, Dockerfile, and Jupyter Notebook). | | None |
Solar-10.7B-Instruct-v1.0 | Existing | Solar-10.7B-Instruct-v1.0 is a general-purpose fine-tuned variant of its predecessor, SOLAR-10.7B. This model family uses a methodology called depth up-scaling (DUS), which makes architectural changes to a Llama 2-based model by integrating Mistral 7B weights into upscaled layers and continuously pretraining on the result. With only 10.7 billion parameters, it offers state-of-the-art performance in NLP tasks, even outperforming models with up to 30 billion parameters. | | None |
EEVE-Korean-Instruct-10.8B-v1.0 | Existing | The EEVE-Korean-Instruct-10.8B-v1.0 is a Korean and English instruction-following model adapted from SOLAR-10.7B and Phi-2 that uses vocabulary expansion (EEVE) techniques, among others, to create a model that can transfer its knowledge and understanding into Korean. It can perform traditional NLP tasks in Korean. | | None |
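The similarity score mentioned for e5-Mistral-7B-Instruct is typically computed as cosine similarity between embedding vectors; the exact scoring used by the platform is not specified here, so treat this as a generic sketch over hypothetical vectors rather than the platform's implementation.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical 4-dimensional embeddings standing in for real model output
# (the real model emits much higher-dimensional vectors).
query_vec = [0.1, 0.3, -0.2, 0.9]
doc_vec = [0.1, 0.25, -0.1, 0.8]

score = cosine_similarity(query_vec, doc_vec)
assert -1.0 <= score <= 1.0                          # cosine is bounded
assert cosine_similarity(query_vec, query_vec) > 0.999  # self-similarity ~ 1
```

Since the embedding model only supports the Predict API, both texts would be embedded with non-streaming requests before scoring.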