Model Zoo best practices
How to pass in arguments
Model Zoo uses the Hydra framework to manage app-level parameters. You manage arguments through the YAML config file and on the command line. Our example YAML files have detailed comments. In most cases, you only specify input and output paths and use the defaults for all other parameters.
- The YAML file includes a set of default parameters for compiling and running a model. You can add and modify parameters in the YAML file.
- When you run the Python scripts for an example app to compile your model:
  - The app uses parameters and their values from the YAML file. See the text generation /config folder for some examples.
  - If a parameter doesn't have a value in the YAML file, you have to specify it on the command line.
  - You can override values that are specified in the YAML file on the command line.
  - If you want to specify an argument that is not in the YAML file at all, precede it with a + sign.
For more information on overriding arguments, refer to https://hydra.cc/docs/advanced/override_grammar/basic/.
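For reference, here is a minimal sketch of what a config file might contain. The keys and values are illustrative only, assembled from parameters used elsewhere in this document; see the /config folder in the repo for the real, fully commented files.

```yaml
# Illustrative fragment only; see the /config folder for real examples.
command: compile                                 # compile or run
checkpoint:
  model_name_or_path: /opt/ckpt_llama7b/fp32/    # example path used later in this doc
samba_compile:
  output_folder: /opt/out/
generation:
  batch_size: 1
```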
In the following example, the command, model checkpoint path, and batch size are all defined in the YAML file, but we explicitly override the command and checkpoint path. The target SambaFlow version is not specified in the YAML file, so it is added as an additional argument with a + prefix. Optionally, you can also supply a custom name for the PEF:
python rdu_generate_text.py \
command=compile \
checkpoint.model_name_or_path=PATH_TO_DOWNLOADED_MODEL \
+samba_compile.target_sambaflow_version=MAJOR.MINOR.PATCH \
+samba_compile.pef_name=mypef
Recommended checkpoints
This section lists all recommended checkpoints.
Model Zoo is not limited to these checkpoints; it is compatible with any bfloat16 or float32 precision Hugging Face checkpoint. Try other Model Zoo compatible checkpoints, or use the list below as a starting point.
For the best text generation quality, use the chat version of a checkpoint if one is available.
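For example, you could fetch a chat checkpoint with the Hugging Face CLI. The repository ID and local directory below are illustrative assumptions; Model Zoo does not prescribe a specific download tool.

```
huggingface-cli download meta-llama/Llama-2-7b-chat-hf --local-dir /opt/ckpt_llama7b_chat/
```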
Making changes to Model Zoo models
You can experiment with Model Zoo parameters and make some changes to Model Zoo source code. In summary:
| Change to | Comment | Recompile? | See |
|---|---|---|---|
| base_config.yaml parameters | Model-specific parameters. | Usually no, with some exceptions. | Making changes to base_config.yaml parameters |
| config.json parameters | Configuration associated with a checkpoint and downloaded when you download the checkpoint. | Yes | Making changes to config.json parameters |
| Source code | Source code that has been customized to work efficiently on RDU is included in the modelzoo repo. Experiment, for example, by using a different operator supported by SambaFlow. | Yes | Making changes to source code |
Making changes to base_config.yaml parameters
Model Zoo uses Hydra and Pydantic for argument management (see How to pass in arguments). That means you can either set SambaNova-specific parameter values in base_config.yaml or set them on the command line.
The following parameter value ranges are recommended. You can experiment with other parameter values.
For information about each of the parameters, see the commented base_config_*.yaml files in the public GitHub repo, for example, base_config_rdu.yaml.
```
model:
  use_segmented_softmax_attn: [false, true]
  max_seq_length: [4096, 8192]
samba_compile:
  tensor_parallel: [none]
  n_chips: [1]
  run_early_tp: [false]
generation:
  batch_size: [1, 2, 4, 8]  # changing batch size requires a recompile
```
All models were also tested with the default values in the base_config.yaml file.
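To try one of these values without editing the YAML file, you can override it on the command line using the Hydra syntax described in How to pass in arguments. The script name and checkpoint path below follow the earlier examples in this document, and the parameter paths follow the base_config.yaml structure shown above:

```
python rdu_generate_text.py \
  command=compile \
  checkpoint.model_name_or_path=/opt/ckpt_llama7b/fp32/ \
  model.max_seq_length=8192 \
  generation.batch_size=4
```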
For Llama 70B base configuration models, you must use Tensor Parallel mode to ensure the model fits on the RDU. Use these settings in the samba_compile section:

```
samba_compile:
  tensor_parallel: weight
  n_chips: 2
  num_tiles: 8
  early_tp: true
```
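Assuming these keys follow the same Hydra paths as the other samba_compile settings, you could equivalently pass them on the command line; prefix a key with + only if it is not already present in your YAML file:

```
python rdu_generate_text.py \
  command=compile \
  samba_compile.tensor_parallel=weight \
  samba_compile.n_chips=2 \
  +samba_compile.num_tiles=8 \
  +samba_compile.early_tp=true
```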
Making changes to config.json parameters
You download a config.json file for the model when you download the Hugging Face checkpoint. If changes to base_config.yaml aren't comprehensive enough for what you intend to do, you can experiment with changes to config.json parameters.
Any changes to the config.json require a recompile.
We support all the base JSON configurations associated with recommended checkpoints (see Recommended checkpoints). If you use other parameter values, our validator displays an error.
See the Model Zoo Troubleshooting document for details on how to experiment with parameters we have not yet tested. Let us know which changes result in improvements, and which result in a failure to compile.
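As a concrete illustration, here is a minimal sketch of editing one config.json field programmatically. The path and the chosen field (max_position_embeddings, a standard Hugging Face Llama config key) are assumptions for illustration; remember that any change requires a recompile, and untested values may be rejected by the validator.

```python
import json

# Minimal sketch, assuming the example checkpoint path used elsewhere in this doc.
config_path = "/opt/ckpt_llama7b/fp32/config.json"

with open(config_path) as f:
    config = json.load(f)

# max_position_embeddings is a standard Hugging Face Llama key; the value
# here is illustrative. Any config.json change requires recompiling the PEF.
config["max_position_embeddings"] = 8192

with open(config_path, "w") as f:
    json.dump(config, f, indent=2)
```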
Making changes to source code
Model Zoo includes source code that has been customized to work efficiently on RDU. You can experiment with changes to the source code. In particular, consider using a different operator in certain situations. Supported SambaFlow operators are listed in the SambaFlow API Reference.
A change to the source code always requires a recompile.
Let us know which changes result in improvements, and which result in a failure to compile.
Example: Changing the attention module
Here’s a simple example of making source code changes. The actual file paths depend in part on how you define certain directories in your environment.
1. Make the following changes to the source files.

   In $HOME/sambanova_modelzoo/modelzoo/models/llama/modeling_llama.py, add the o_proj_2 layer:

   ```
   self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=False)
   self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=False)
   self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=False)
   self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
   + self.o_proj_2 = nn.Linear(self.hidden_size, self.hidden_size, bias=False)
   ```

   In $HOME/sambanova_modelzoo/modelzoo/models/llama/patch_llama.py, apply o_proj_2 to the attention output:

   ```
   if self.pretraining_tp > 1:
       attn_output = attn_output.split(self.hidden_size // self.pretraining_tp, dim=2)
       o_proj_slices = self.o_proj.weight.split(self.hidden_size // self.pretraining_tp, dim=1)
       attn_output = sum([F.linear(attn_output[i], o_proj_slices[i]) for i in range(self.pretraining_tp)])
   else:
       attn_output = self.o_proj(attn_output)
   + attn_output = self.o_proj_2(attn_output)
   ```
2. Compile to generate a new PEF with these changes:

   ```
   cd /opt/modelzoo/example/nlp/text_generation/
   python rdu_generate_text.py \
     command=compile \
     checkpoint.model_name_or_path=/opt/ckpt_llama7b/fp32/ \
     samba_compile.output_folder=/opt/out/ \
     +samba_compile.pef_name=llama7b_Source_change \
     +samba_compile.target_sambaflow_version=1.19.1
   ```
3. Verify that the changes are reflected in the PEF:

   ```
   PEF_PATH=/opt/out/llama7b_Source_change/llama7b_Source_change.pef \
   /opt/sambanova/bin/python -c "import os; from pypefapi import PyPefApi; pef = PyPefApi(os.environ.get('PEF_PATH')); print(pef.pypef.metadata)"
   ```

   In the metadata that is sent to stdout, you should see the additional Linear operator:

   ```
   (o_proj_2): Linear(in_features=4096, out_features=4096, bias=False)
   ```
4. Run the model with a command like the following:

   ```
   python rdu_generate_text.py \
     command=run \
     checkpoint.model_name_or_path=/opt/ckpt_llama7b/fp32/ \
     samba_run.pef=/opt/out/llama7b_Source_change/llama7b_Source_change.pef
   ```
Information about model runs
After a model run, you can look at information about the run.
Training
When you run training, the app generates summary.txt and per_step_metrics.csv files.

- The summary.txt file includes information like the following about the model run:

  ```
  Number of epochs: 1
  Per worker batch size: 2
  Per worker number of batches (steps): 2
  Number of DP workers: 2
  Total tokens seen: 4914
  Tokens per second: 120.8163
  Average time per step: 20.3309s
  The following are the model params used to train this model using Model Zoo:
  {"fp32_ln":false,"fp32_logits":true,"fp32_skip_add":true,"mixedp_attn":true,"max_seq_length":4096,"use_plugin_heuristics":false,"use_segmented_softmax_attn":false}
  ```
- The per_step_metrics.csv file includes information like the following:

  ```
  Tokens in Step,Step Loss,Learning Rate,Time per Step
  tensor(2691),tensor(0.9211),1e-05,20.194304943084717
  tensor(2223),tensor(0.2960),1e-05,20.467589616775513
  ```
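Because the loss values are logged as tensor(...) strings, a small amount of parsing is needed before you can analyze them. Here is a minimal sketch; the file name matches the output described above, and the unwrapping regex is an assumption based on the sample rows shown:

```python
import csv
import re

def unwrap(value: str) -> float:
    """Convert a logged value like 'tensor(0.9211)' or a plain number string to float."""
    match = re.match(r"tensor\(([^)]+)\)", value)
    return float(match.group(1)) if match else float(value)

# Read the per-step metrics written by the training run.
with open("per_step_metrics.csv") as f:
    rows = list(csv.DictReader(f))

losses = [unwrap(row["Step Loss"]) for row in rows]
print(f"Average loss over {len(losses)} steps: {sum(losses) / len(losses):.4f}")
```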
Generation (inference)
At the end of a text generation run, the app saves a checkpoint and writes basic telemetry and performance metrics to a summary.txt file, with information like the following:

```
latencies
  time to first token                   1.2131s
  tokens, excluding first token         0.3460s
  tokens, overall                       0.3731s
  Total Latency                         1.5592s
throughputs
  tokens/second excluding first token   2.8899
  tokens/second overall                 2.6800
```
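Note that the throughput figures are the reciprocals of the per-token latencies: 1 / 0.3460 s ≈ 2.8899 tokens/second excluding the first token, and 1 / 0.3731 s ≈ 2.6800 tokens/second overall.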