Arguments for compile
Your SambaFlow model starts as a Python application (see Hello SambaFlow! for an example). When you compile your model, you can pass in compiler flags as arguments.
- You can compile for training, the default.
- You can compile for inference by specifying --inference. When you do, the compiler doesn’t perform certain optimizations. See How model compilation works.
This doc page is a reference to commonly used compiler arguments. You can experiment with most of these arguments yourself, but some are used only when you’re working with SambaNova Support.
The --help output includes commonly used arguments by default. If your model includes the dev_mode=TRUE argument or if you pass in --debug from the command line, the --help output includes experimental arguments that are currently supported but are subject to change without notice.
General compiler arguments
pef-name
--pef-name (string)
Description
Defines the name of the PEF file and the subdirectory for other compilation artifacts.
- By default, the compiler uses the compilation timestamp and the process ID to name this subdirectory.
- You can use this parameter to give your PEF files more meaningful names. For example, use today’s date in the name, or use an inf extension when you compile for inference. If you experiment with different sets of hyperparameters, consider using them in the name. For example, if you change batch sizes in different training runs, add -b32 or -b16 to the PEF name to note the batch size.
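For example, a compile command that encodes the batch size in the PEF name might look like this (the lenet.py script name, batch size, and PEF name are illustrative):

```shell
python lenet.py compile -b 32 --pef-name=lenet-b32
```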
inference
--inference (boolean)
Description
Specify --inference to compile and generate a PEF for inference, which doesn’t perform certain optimizations. See How model compilation works.
Just as in compilation for a training run, you can specify a PEF file and other compiler arguments.
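For example, assuming a model script named lenet.py as in the examples elsewhere on this page (the PEF name is illustrative):

```shell
python lenet.py compile --inference --pef-name=lenet-inf
```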
output-folder
--output-folder (string)
Description
Optional output folder of the compilation. The compiler places the PEF file and miscellaneous log files in the output directory. Defaults to ./out/<pef-name> (a subdirectory called out in the current directory with the value of pef-name attached to it).
The custom output folder you specify must exist before you run the command.
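For example, you might create a folder first and then point the compiler at it (the folder and PEF names are illustrative):

```shell
mkdir -p ./pefs/lenet
python lenet.py compile --pef-name=lenet-1123 --output-folder=./pefs/lenet
```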
batch-size
-b <BATCH_SIZE>
--batch-size <BATCH_SIZE>
Description
Informs the compiler which batch size will later be used during training. Set batch-size to 4, 8, 16, 32, or even higher to support more efficient training. The highest value you can use depends on the model and on available hardware resources. Depending on the batch size, training might take more or less time to reach the target accuracy. Because this is a hyperparameter, expect that it takes some experimentation to find the right batch size for a particular model.
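For example, to compile for a batch size of 16 (script and PEF names are illustrative):

```shell
python lenet.py compile -b 16 --pef-name=lenet-b16
```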
Log management arguments
verbose
--verbose (boolean)
-v (boolean)
Description
Shows verbose logging. The output is similar to what you get when you use the --debug argument.
When you run the compiler with --verbose, the information about the location of the generated PEF file is no longer at the end of the output, so it’s a good idea to specify an output directory.
You cannot currently set the log level when running the compile command. You can only switch verbose or debug logging on or off.
debug
--debug (boolean)
Description
When you work on a model with SambaNova Support, they might ask you to start the model in debug mode. In debug mode, the compiler sends more messages to stdout (and the logs). In addition, the --help output shows some additional arguments that are customarily used when you work with SambaNova Support.
You cannot currently set the log level when running the compile command. You can only switch verbose or debug logging on or off.
Compiler optimization arguments
o0, o1, o3
-o0 (boolean) -o1 (boolean) -o3 (boolean, default)
Specify the optimization level for the compiler. Defaults to o3 with release 1.16.
Description
Specify an optimization level for the compiler.
o0 compiler argument
When you specify the o0 argument, each PyTorch operator is compiled independently.
This is the safest option, but because the compiler doesn’t perform optimizations, training and inference take longer than with o3.
o0 examples
Here’s a very simple example. We’re working on additional examples.
python lenet.py compile --pef-name=lenet-1123 -o0
o1 compiler argument
When you specify the o1 argument, PyTorch operators are fused together into subgraphs. Each subgraph is compiled independently.
o1 examples
Here’s a very simple example. We’re working on additional examples that use o1 with --o1-rule-list.
python logreg.py compile --pef-name=logreg-1123 -o1
o3 compiler argument
o3 means that the compiler has a global view of the entire graph. With this release (1.16), o3 is the default.
This option usually has the longest compile time but fastest runtime performance. Because the compiler makes so many decisions itself and attempts optimization, compilation might fail in some cases.
Optionally, you can annotate subgraphs with --enable-hypersection. In that case, each annotated subgraph is compiled independently. If there are duplicate subgraphs, only one is compiled and reused.
o1-rule-list
--o1-rule-list (string)
Description
The compiler determines which operators belong to a subgraph based on a fusion rule list that SambaNova developed. The fusion rule list is set with --o1-rule-list. The flags to use in the rule list file depend entirely on the model.
For example, to use -o1 with a gpt2 rule list, add the following flags during compilation: -o1 --o1-rule-list=gpt2.
SambaNova supports the following fusion rule lists:
Rule list | Description
---|---
| Used if no rule list is specified. Contains fusion rules for MHA modules.
| Contains fusion rules for common modules used in NLP models, for example MHA modules, Embedding modules, and QKV modules.
gpt2 | Contains fusion rules developed specifically for GPT-2 models, including embedding, MHA, QKV, FFN0, FFN1, ProjGemm, AttentionMask, CrossEntropy, and Classification.
| Contains fusion rules developed specifically for BLOOM models. Similar to gpt2, but some nodes are adjusted based on modules in BLOOM.
| Contains fusion rules developed specifically for GPT-NeoX inference. Similar to gpt2, but some nodes are adjusted based on modules in GPT-NeoX inference.
| Contains fusion rules developed specifically for GPT-NeoX training. Similar to gpt2, but some nodes are adjusted based on modules in GPT-NeoX training.
enable-hypersection
--enable-hypersection (string)
Enable hypersection optimizations in o3 compiler mode.
Description
By default, hypersections are enabled in o0 and o1 mode. If a graph has duplicate subgraphs, the compiler compiles the subgraph only once. The result is improved performance.
If you’re running in o3 mode, the default, you can annotate your model’s Python code to tell the compiler about duplicate subgraphs and get the performance improvements.
resources-scaling-factors
--resources-scaling-factors (3 or 4 floats)
Description
Sometimes the compiler underestimates or overestimates the RDU resources that are needed for some decisions. Overestimation can result in compilation failures, and underestimation can result in bad performance. If compilation fails, you can use this flag to force the compiler to assume it has fewer resources available than it has.
Specify 3 or 4 floats. A value of 1.0 means that the compiler can see all available resources.
- Three floats: scaling factors for the forward, backward, and optimizer graphs
- Four floats: scaling factors for the forward, backward, gradient normalization, and optimizer graphs
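For example, to tell the compiler to assume only 80% of the available resources for the forward, backward, and optimizer graphs (the script name, PEF name, and factor values are illustrative):

```shell
python lenet.py compile --pef-name=lenet-scaled --resources-scaling-factors 0.8 0.8 0.8
```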
Hardware configuration arguments
arch
--arch (native|sn10|sn20|sn30)
Description
Allows you to compile with a different target architecture. For example, if you’re running on an SN30 system but expect to run the model on an SN20 system, you can use this flag.
Default is native, that is, the compiler targets the architecture of the hardware that you’re running on.
Specify the value in lowercase, for example sn20 or sn30. Uppercase values such as SN20 or SN30 are not accepted.
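For example, to compile on an SN30 system for a target SN20 system (the script and PEF names are illustrative):

```shell
python lenet.py compile --pef-name=lenet-sn20 --arch sn20
```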
Parallelism management
data-parallel
--data-parallel (boolean)
Description
Causes the compiler to add the gather and reduce sections and buffers to the dataflow graph to support data parallel operation. See Data parallel applications in SambaFlow for some prerequisites and best practices.
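For example, to compile a PEF that supports data parallel training (the script name, PEF name, and batch size are illustrative):

```shell
python lenet.py compile -b 32 --pef-name=lenet-dp --data-parallel
```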
For use with Customer Support
The following options are included in the compile --help output by default, but are reserved for use with SambaNova Support.
--compiler-configs-file COMPILER_CONFIGS_FILE
--mac-human-decision MAC_HUMAN_DECISION
--grad-accumulation-steps GRAD_ACCUMULATION_STEPS
--num-spatial-batches NUM_SPATIAL_BATCHES
--model-parallel (requires a human decision file)