Model Zoo release notes

These release notes summarize Model Zoo features, bugs fixed, and known issues. Release notes for SambaFlow are in SambaFlow release notes.

Model Zoo 0.3.0

These release notes summarize the new features, improvements, and known issues for the 0.3.0 release of SambaNova Model Zoo.

Overview

This release introduces new features in the Model Zoo, expanding its capabilities in terms of model support, workflows, and performance.

New features

Enhanced hardware compatibility

All models are now verified on the SN40 platform, extending compatibility beyond SN30. This ensures broader hardware support for users with newer systems.

Inference and training enhancements

This release introduces key enhancements for Llama-3.1 models, optimizing both inference and training:

TP8 (8-way Tensor Parallelism):
- Enabled for Llama-3.1-8b and Llama-3.1-70b models to improve inference performance and efficiency.
- Includes O1HD inference, available on SN40 (not supported on SN30) and limited to TP8 configurations, delivering additional performance gains for supported setups.
TP4 (4-way Tensor Parallelism):
- Supported for the training versions of Llama-3.1-8b and Llama-3.1-70b models, ensuring efficient training workflows.

TP stands for Tensor Parallelism. See tensor parallel for more information.

Updated Model Cards

All compatible batch sizes, sequence lengths, TP, and DP configurations for each model are listed in the respective Model Cards.

Deprecation

Removed validator feature

The validator feature has been removed from Model Zoo to improve user experience. This change allows you to explore configurations with greater flexibility.

Bugs fixed

There are no bug fixes in this release.

Known issues and limitations

The 16-way Data Parallel (DP) enablement feature works with 4K sequence lengths, but it risks running out of memory at sequence lengths of 8K and higher.

Model Zoo 0.2.0

These release notes summarize the new features, improvements, and known issues for the 0.2.0 release of SambaNova Model Zoo.

Overview

This release introduces new features in the Model Zoo, expanding its capabilities in terms of model support, workflows, and performance.

New features

Data Parallel capabilities

In this release, we highlight the Data Parallel capabilities of various models.

Models with 4-way Data Parallel (DP) enablement (with TP 4)
- Llama-2-70b
Models with 8-way Data Parallel (DP) enablement (with TP 1)
- Gemma-7b
Models with 16-way Data Parallel (DP) enablement
- Llama-2-7b
- Llama-2-13b
- Mistral-7b

See data parallel for more information.
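
As a rough illustration of how the data parallel and tensor parallel degrees listed above combine, the sketch below assumes that each model replica is sharded across TP devices and that DP replicas run side by side, so a run occupies DP × TP devices. The function name and the "device" terminology are illustrative and are not part of Model Zoo. The 16-way entries are left out because the list does not state their TP degree.

```python
def devices_required(dp: int, tp: int) -> int:
    """Return the device count for dp replicas, each sharded tp ways.

    Assumption: one tensor-parallel shard per device, with no sharing.
    """
    if dp < 1 or tp < 1:
        raise ValueError("dp and tp must be positive integers")
    return dp * tp


# DP/TP pairs taken from the list above.
configs = {
    "Llama-2-70b": {"dp": 4, "tp": 4},
    "Gemma-7b": {"dp": 8, "tp": 1},
}

for model, cfg in configs.items():
    print(f"{model}: {devices_required(**cfg)} devices")
```

Check the Model Cards for the combinations that are actually supported on your system.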

Inference summary to output a text file

The inference process has been updated to include a new feature that outputs a summary.txt file. This enhancement provides users with an easily accessible, detailed log of inference results, including accuracy metrics, prediction details, and execution time.

Bugs fixed

There are no bug fixes in this release.

Known issues

The CPU app is intended as an example and is tested only with Llama-2-7b. It is not guaranteed to work outside of the DevBox environment in typical user workflows.
The 16-way Data Parallel (DP) enablement feature works with 4K sequence lengths, but it risks running out of memory at sequence lengths of 8K and higher.

Model Zoo 0.1.0

This is the first official release of SambaNova Model Zoo, which is currently in Beta. Here is a glimpse of Model Zoo features and limitations.

Overview

Model Zoo is available in a new public GitHub repository, SambaNova Model Zoo. The repository contains RDU-compatible model source code for popular open source models, along with libraries and example apps for efficiently compiling and running the models on SambaNova hardware.
SambaNova customers can download a container image (Devbox) that includes the SambaFlow compiler, other SambaNova libraries and all prerequisite software dependencies.
- Existing SambaNova customers can contact their Customer Support representative for access to the Devbox.
- If you’re new to SambaNova and interested in trying out Model Zoo, contact us at [email protected] to get started.

Details

The example apps support and demonstrate training and fine-tuning, evaluation, and text generation (inference) workflows on RDU.
Model Zoo supports NLP models that are based on a transformer architecture. Models available in the GitHub repo include:
- Llama-2 7B, 13B, 70B
- Llama-3 8B
- Gemma 7B
- Mistral 7B
You can use the Model Zoo source code and example apps with Hugging Face checkpoints that use bf16 or fp32 precision. For an example, see the walkthrough in our public GitHub repo.
The repo includes example apps for running training and inference on CPU so you can compare the RDU workflow with the CPU workflow. The CPU apps are primarily meant to illustrate differences and similarities; they have been tested only with Llama-2 7B.
You can customize model parameters and make other changes to the model. See Making changes to Model Zoo models.
You can further customize the RDU-compatible model source code within the constraints of supported PyTorch operators on RDUs. See Making changes to Model Zoo models.
When you run training or inference, the output includes a summary report at the end of each file that logs key metrics.
Model Zoo includes a validator that sends error messages if you are using a configuration that was not previously tested by SambaNova. You can set validate_config=False and proceed at your own discretion. Model Zoo Best Practices explains what SambaNova tested.
Model Zoo supports advanced performance enhancing capabilities such as data parallel and tensor parallel, in preview mode with limited functionality.

Known issues and limitations

This Beta release of SambaNova Model Zoo allows you to run popular open source models on RDU. You can fine-tune any model with your own data, and make other changes to configuration and source code. However, this Beta release doesn’t yet include all the performance knobs required for high-performance production training or inference. We recommend that you use this release primarily for model experimentation.
Running in data parallel or tensor parallel mode is supported in Preview mode with limited functionality.
- Data parallel. Due to the constraint of fixed host memory on each node, data parallel can run only up to 4 sockets for models similar in size to Llama-2 7B without running out of memory. The larger the model, the fewer replicas you can run. The out-of-memory issue happens during checkpoint loading, when each worker loads its own checkpoint simultaneously. We are actively working on a sharded checkpoint loading API that will avoid this issue.
- Tensor parallel. In this release of Model Zoo, we support tensor parallel only with Llama-2 70B. For that model, tensor parallel is required.
  
  See How to use data parallel mode and How to use tensor parallel mode (Beta) for some background.
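
To make the data parallel constraint more concrete, the following back-of-the-envelope sketch estimates how much host memory is needed when every replica loads a full copy of the checkpoint at the same time. It is an approximation under stated assumptions: it counts only the raw parameter bytes, ignores framework and loader overhead, and uses 7e9 parameters as a stand-in for a Llama-2 7B sized model. The function name is hypothetical, and these release notes do not state the host memory available on a node.

```python
def checkpoint_load_memory_gb(num_params: float, bytes_per_param: int, replicas: int) -> float:
    """Estimate host memory in GB when all replicas load a full checkpoint at once.

    Assumption: each worker holds one complete, unsharded copy of the weights.
    """
    return num_params * bytes_per_param * replicas / 1e9


NUM_PARAMS = 7e9  # approximate parameter count for a 7B model

for replicas in (1, 4, 8, 16):
    bf16_gb = checkpoint_load_memory_gb(NUM_PARAMS, 2, replicas)  # bf16: 2 bytes per parameter
    fp32_gb = checkpoint_load_memory_gb(NUM_PARAMS, 4, replicas)  # fp32: 4 bytes per parameter
    print(f"{replicas:>2} replicas: ~{bf16_gb:.0f} GB (bf16), ~{fp32_gb:.0f} GB (fp32)")
```

Because the estimate grows linearly with both the parameter count and the number of replicas, a larger model reaches a fixed host memory limit with fewer replicas.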
We have tested each model included in this release with a set of configuration parameter combinations. If you run with different parameters, the validator signals an error. You can set validate_config=False and continue experimenting at your own discretion. See Model Zoo Best Practices for details.
You can load checkpoints that are pretrained or fine-tuned with the training app into the inference app. Ensure that the inference app has the same model config as the training app that generated the checkpoint. Otherwise, the inference app’s model config takes precedence, and accuracy issues result.
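
One way to catch a config mismatch before running inference is to compare the two model configs key by key. The sketch below is a minimal example that assumes both configs are available as plain Python dictionaries. The helper name and the sample keys and values are hypothetical and do not reflect how Model Zoo stores its configs.

```python
_MISSING = "<missing>"  # placeholder for a key present in only one config


def config_mismatches(train_cfg: dict, infer_cfg: dict) -> dict:
    """Return {key: (train_value, infer_value)} for every key whose values differ."""
    keys = train_cfg.keys() | infer_cfg.keys()
    return {
        key: (train_cfg.get(key, _MISSING), infer_cfg.get(key, _MISSING))
        for key in sorted(keys)
        if train_cfg.get(key, _MISSING) != infer_cfg.get(key, _MISSING)
    }


# Hypothetical config values, for illustration only.
train_cfg = {"hidden_size": 4096, "num_hidden_layers": 32, "vocab_size": 32000}
infer_cfg = {"hidden_size": 4096, "num_hidden_layers": 30, "vocab_size": 32000}

for key, (train_value, infer_value) in config_mismatches(train_cfg, infer_cfg).items():
    print(f"{key}: training={train_value}, inference={infer_value}")
```

An empty result means the two dictionaries agree on every key.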

Caveats

This is a Beta release. You may encounter bugs or limitations. To report issues, open a support case via an email to [email protected] or via the support portal support.sambanova.ai.
We appreciate your patience and your feedback as we work towards a more polished experience.
Your input will help shape the final product. We’re grateful for your participation in this early stage of development.

Documentation

This doc set includes some conceptual background, best practices, and troubleshooting information for Model Zoo. Step-by-step instructions for container setup, running training and inference, etc. are in the GitHub repo.

The SambaNova Model Zoo GitHub repo includes:

A top-level README file that gives an overview and points to other README files.
A document with instructions for setup of the container environment and the Devbox container.
An /examples README with step-by-step instructions for running inference with a Hugging Face checkpoint, running fine-tuning with a dataset and a Hugging Face checkpoint, and running data parallel training.
README files for /text_generation and /training that include Quick Run commands and discussions of differences between RDU and CPU example apps and workflows.
A README in the /models directory that discusses each file included for each model.
A model card README for our Llama, Gemma, and Mistral implementations.

In addition, this doc set includes the following documents:

Model Zoo architecture and workflows. Explores the Model Zoo architecture and workflows.
Get started with Model Zoo. Introduces the architecture and the steps for running a modified Llama model on RDU hardware.
Model Zoo best practices. Learn from the experts how to pass in arguments, make changes to the example apps, and examine information about a training run.
Model Zoo troubleshooting. Troubleshooting information for Model Zoo users.
The SambaFlow API Reference has details about the classes, methods, and operators used by Model Zoo. NOTE: In some cases, the code contains operators (e.g. gather and scatter) that map to a corresponding sn_* operator (e.g. sn_gather and sn_scatter).