Arguments for compile
Your SambaFlow model starts as a Python application (see Hello SambaFlow! for an example). When you compile your model, you can pass in compiler flags as arguments.
- You can compile for training, the default.
- You can compile for inference by specifying --inference. When you do, the compiler doesn’t perform certain optimizations. See How model compilation works.
This doc page is a reference to commonly used compiler arguments. You can experiment with most of these arguments yourself, but some are used only when you’re working with SambaNova Support.
The --help output includes commonly used arguments by default. If your model includes the dev_mode=TRUE argument or if you pass in --debug from the command line, the --help output includes experimental arguments that are currently supported but are subject to change without notice.
General compiler arguments
pef-name
--pef-name (string)
Description
Defines the name of the PEF file and the subdirectory for other compilation artifacts.
- By default, the compiler uses the compilation timestamp and the process ID to name this subdirectory.
- You can use this parameter to give your PEF files more meaningful names. For example, use today’s date in the name, or use an inf extension when you compile for inference. If you experiment with different sets of hyperparameters, consider using them in the name. For example, if you change batch sizes in different training runs, add -b32 or -b16 to the PEF name to note the batch size.
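For example, a compile command that encodes the batch size in the PEF name might look like this (the lenet.py script name, batch size, and PEF name are illustrative):

```shell
python lenet.py compile -b 32 --pef-name=lenet-b32
```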
inference
--inference (boolean)
Description
Specify --inference to compile and generate a PEF for inference, which doesn’t perform certain optimizations. See How model compilation works.
Just as in compilation for a training run, you can specify a PEF file and other compiler arguments.
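For example, assuming a model script named lenet.py as in the examples elsewhere on this page (the PEF name is illustrative):

```shell
python lenet.py compile --inference --pef-name=lenet-inf
```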
output-folder
--output-folder (string)
Description
Optional output folder of the compilation. The compiler places the PEF file and miscellaneous log files in the output directory. Defaults to ./out/<pef-name> (a subdirectory called out in the current directory with the value of pef-name attached to it).
The custom output folder you specify must exist before you run the command.
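For example, you might create a folder first and then point the compiler at it (the folder and PEF names are illustrative):

```shell
mkdir -p ./pefs/lenet
python lenet.py compile --pef-name=lenet-1123 --output-folder=./pefs/lenet
```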
batch-size
-b <BATCH_SIZE>
--batch-size <BATCH_SIZE>
Description
Informs the compiler which batch size will later be used during training. Set batch-size to 4, 8, 16, 32, or even higher to support more efficient training. The highest value you can use depends on the model and on available hardware resources. Depending on the batch size, training might take more or less time to reach the target accuracy. Because this is a hyperparameter, expect that it takes some experimentation to find the right batch size for a particular model.
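For example, to compile for a batch size of 16 (script and PEF names are illustrative):

```shell
python lenet.py compile -b 16 --pef-name=lenet-b16
```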
Log management arguments
verbose
--verbose (boolean)
-v (boolean)
Description
Shows verbose logging. The output is similar to what you get when you use the --debug argument.
When you run the compiler with --verbose, the information about the location of the generated PEF file is no longer at the end of the output, so it’s a good idea to specify an output directory.
You cannot currently set the log level when running the compile command. You can only switch verbose or debug logging on or off.
debug
--debug (boolean)
Description
When you work on a model with SambaNova Support, they might ask you to start the model in debug mode. In debug mode, the compiler sends more messages to stdout (and the logs). In addition, the --help output shows some additional arguments that are customarily used when you work with SambaNova Support.
You cannot currently set the log level when running the compile command. You can only switch verbose or debug logging on or off.
Compiler optimization arguments
o0, o1, o3
-o0 (boolean) -o1 (boolean) -o3 (boolean, default)
Specify the optimization level for the compiler. Defaults to o3 with release 1.16.
Description
Specify an optimization level for the compiler.
o0 compiler argument
When you specify the o0 argument, each PyTorch operator is compiled independently.
This is the safest option, but because the compiler doesn’t perform optimizations, training and inference take longer than with o3.
o0 examples
Here’s a very simple example. We’re working on additional examples.
python lenet.py compile --pef-name=lenet-1123 -o0
o1 compiler argument
When you specify the o1 argument, PyTorch operators are fused together into subgraphs. Each subgraph is compiled independently.
o1 examples
Here’s a very simple example. We’re working on additional examples that use o1 with --o1-rule-list.
python logreg.py compile --pef-name=logreg-1123 -o1
o3 compiler argument
o3 means that the compiler has a global view of the entire graph. With this release (1.16), o3 is the default.
This option usually has the longest compile time but fastest runtime performance. Because the compiler makes so many decisions itself and attempts optimization, compilation might fail in some cases.
Optionally, you can annotate subgraphs with --enable-hypersection. In that case, each annotated subgraph is compiled independently. If there are duplicate subgraphs, only one is compiled and reused.
o1-rule-list
--o1-rule-list (string)
Description
The compiler determines which operators belong to a subgraph based on a fusion rule list that SambaNova developed. The fusion rule list is set with --o1-rule-list. The flags to use in the rule list file depend entirely on the model.
For example, to use -o1 with a gpt2 rule list, add the following flags during compilation: -o1 --o1-rule-list=gpt2.
SambaNova supports the following fusion rule lists:
Rule list | Description
---|---
| Used if no rule list is specified. Contains fusion rules for MHA modules.
| Contains fusion rules for common modules used in NLP models, for example MHA modules, Embedding modules, and QKV modules.
gpt2 | Contains fusion rules developed specifically for GPT-2 models, including embedding, MHA, QKV, FFN0, FFN1, ProjGemm, AttentionMask, CrossEntropy, and Classification.
| Contains fusion rules developed specifically for BLOOM models. Similar to gpt2, but some nodes are adjusted based on modules in BLOOM.
| Contains fusion rules developed specifically for GPT-NeoX inference. Similar to gpt2, but some nodes are adjusted based on modules in GPT-NeoX inference.
| Contains fusion rules developed specifically for GPT-NeoX training. Similar to gpt2, but some nodes are adjusted based on modules in GPT-NeoX training.
enable-hypersection
--enable-hypersection (string)
Enable hypersection optimizations in o3 compiler mode.
Description
By default, hypersections are enabled in o0 and o1 mode. If a graph has duplicate subgraphs, the compiler compiles the subgraph only once. The result is improved performance.
If you’re running in o3 mode, the default, you can annotate your model’s Python code to tell the compiler about duplicate subgraphs and get the performance improvements.
resources-scaling-factors
--resources-scaling-factors (3 or 4 floats)
Description
Sometimes the compiler underestimates or overestimates the RDU resources that are needed for some decisions. Overestimation can result in compilation failures, and underestimation can result in bad performance. If compilation fails, you can use this flag to force the compiler to assume it has fewer resources available than it has.
Specify 3 or 4 floats. A value of 1.0 means that the compiler can see all available resources.
- Three floats: scaling factors for the forward, backward, and optimizer graphs
- Four floats: scaling factors for the forward, backward, gradient normalization, and optimizer graphs
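For example, to tell the compiler to assume only 80% of the available resources for the forward, backward, and optimizer graphs (the script name, PEF name, and factor values are illustrative):

```shell
python lenet.py compile --pef-name=lenet-scaled --resources-scaling-factors 0.8 0.8 0.8
```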
Hardware configuration arguments
arch
--arch (native|sn10|sn20|sn30)
Description
Allows you to compile with a different target architecture. For example, if you’re running on an SN30 system but expect to run the model on an SN20 system, you can use this flag.
Default is native, that is, the compiler targets the architecture of the hardware that you’re running on.
Specify the value in lowercase, for example sn20 or sn30. Uppercase values such as SN20 or SN30 are not accepted.
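For example, to compile on an SN30 system for a target SN20 system (the script and PEF names are illustrative):

```shell
python lenet.py compile --pef-name=lenet-sn20 --arch sn20
```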
Parallelism management
data-parallel
--data-parallel (boolean)
Description
Causes the compiler to add the gather and reduce sections and buffers to the dataflow graph to support data parallel operation. See Data parallel applications in SambaFlow for some prerequisites and best practices.
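For example, to compile a PEF that supports data parallel training (the script name, PEF name, and batch size are illustrative):

```shell
python lenet.py compile -b 32 --pef-name=lenet-dp --data-parallel
```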
For use with Customer Support
The following options are included in the compile --help output by default, but are reserved for use with SambaNova Support.
--compiler-configs-file COMPILER_CONFIGS_FILE
--mac-human-decision MAC_HUMAN_DECISION
--grad-accumulation-steps GRAD_ACCUMULATION_STEPS
--num-spatial-batches NUM_SPATIAL_BATCHES
--model-parallel (requires a human decision file)