Compiler argument reference
The SambaNova workflow includes a compilation step. Compilation generates a dataflow graph of the model, which is similar to a PyTorch computational graph, encapsulated as a PEF file. You submit the PEF file when you do training and inference runs. See Workflows.
This page is a reference for commonly used compiler arguments. You can experiment with most of these arguments yourself; some are used only when you’re working with SambaNova Support.
See How model compilation works and Compiler optimization modes for background.
How to specify compilation arguments
The environment you’re working in determines how you specify compilation arguments:
General compiler arguments
--pef-name <filename>
Name of the PEF file that the compiler generates and of the subdirectory for other compilation artifacts. By default the compiler uses the compilation timestamp and the process ID to name the subdirectory.
Use this argument to give your PEF file a meaningful name. For example, use today’s date in the name, or add an inf extension when compiling for inference. If you experiment with different sets of hyperparameters, consider including them in the name. For example, if you change batch sizes across training runs, add -b32 or -b16 to the PEF name to record the batch size:
python $HOME/sambaflow-apps/starters/logreg.py compile --pef-name logreg-0923
--inference
Specify --inference to compile and generate a PEF for inference. With this argument, the compiler performs only a forward pass and doesn’t perform certain optimizations. See How model compilation works.
Just as with compilation for a training run, you can specify a PEF file and other compiler arguments. For example:
$ python lenet.py compile --inference --pef-name=lenet-compile
--output-folder <folder-name>
Optional output folder. The compiler places the PEF file and log files in the output folder, which defaults to ./out/<pef-name>. To set the folder explicitly, run a command like this:
python lenet.py compile --pef-name=lenet --output-folder=out_test7
-b <size>, --batch-size <size>
Informs the compiler which batch size will later be used during training.
Set --batch-size to 4, 8, 16, 32, or higher to support more efficient training. The highest value you can use depends on the model and on available hardware resources.
Different batch sizes can make training reach the target accuracy faster or slower. It usually takes some experimentation to find the right batch size for a model.
python lenet.py compile --pef-name=lenet-1023 --batch-size 4
Log management arguments
You cannot currently set the log level when running the compile command. You can only switch verbose or debug logging on or off.
-v, --verbose
Shows verbose log output, similar to the --debug output.
When you run the compiler with --verbose, specify an output folder: the location of the generated PEF file no longer appears at the end of the output, so an explicit folder makes the PEF easier to find. For example:
python logreg.py compile --pef-name=logreg-1023 --output-folder=out-1023 -v
--debug
When you work on a model with SambaNova Support, they might ask you to start the model in debug mode. In debug mode, the compiler sends more messages to stdout (and the logs). In addition, the --help output shows some arguments that are customarily used only with Customer Support.
python logreg.py compile --pef-name=logreg-1023 --debug
Compiler optimization arguments
You can specify an optimization level for the compiler. SambaFlow compiler overview explains the effect of each level.
-o0
When you specify -o0, each PyTorch operator is compiled independently.
This is the safest option, but because the compiler doesn’t perform optimizations, training and inference take longer than with -o3. See Compiler optimization modes for details.
Here’s a very simple example. We’re working on additional examples.
python lenet.py compile --pef-name=lenet-1123 -o0
-o1
With -o1, PyTorch operators are fused into subgraphs, and each subgraph is compiled independently. See Compiler optimization modes for details.
If you don’t specify --optimization-rules, the compiler behavior is the same as with -o0.
python logreg.py compile --pef-name=lenet-1123 -o1 --optimization-rules /opt/sambaflow/apps/nlp/my-custom-rule.yaml
--optimization-rules <path-to-rules-file>.yaml
Optimization rules .yaml file to use with o1.
-
See Compiler optimization modes for details.
-
See Operator fusion rule yaml syntax for a reference to the Beta release of the rules file.
--o1-experimental-opts
Enables compiler optimizations with o1. This argument is currently required when you compile with -o1; we expect to remove it in a future release.
-o3
With -o3, the compiler has a global view of the entire graph. In this release (1.17), -o3 is the default.
This option usually has the longest compile time but fast runtime performance when used with model-specific HD files. Because the compiler attempts to optimize the whole graph, compilation might fail in some cases.
With -o3, you can optionally annotate subgraphs and compile them with --enable-hypersection. In that case, each annotated subgraph is compiled independently. If there are duplicate subgraphs, only one is compiled and then reused.
--compiler-mode <name>
Specifies the compiler mode. Using this flag with the right model type improves performance. Currently, nlp is the only supported option.
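For example, to compile an NLP model in nlp mode (the script name and PEF name here are illustrative placeholders, not files shipped with SambaFlow):
python transformer.py compile --compiler-mode nlp --pef-name=transformer-nlp-1023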
--enable-hypersection
If you’re running in o3 mode (the default), then you can annotate your model’s Python code to tell the compiler about duplicate subgraphs and get the performance improvements. Use this option to enable this optimization.
This option is used only in conjunction with o3 compiler mode, and usually when working with SambaNova Support. In the future, expect to use o1 mode and operator fusion rule yaml files.
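For example, assuming your model’s Python code already contains the hypersection annotations (the script and PEF names are illustrative):
python lenet.py compile --enable-hypersection --pef-name=lenet-hs-1123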
--resources-scaling-factors <factors>
Sometimes the compiler underestimates or overestimates the RDU resources that are needed for some decisions. Overestimation can result in compilation failures, and underestimation can result in bad performance. If compilation fails, you can use this flag to force the compiler to assume that fewer resources are available than there actually are.
Specify 3 or 4 floats. A value of 1.0 means that the compiler can see all available resources.
-
Three floats: scaling factors for the forward, backward, and optimizer graphs
-
Four floats: scaling factors for the forward, backward, gradient normalization, and optimizer graphs
For example:
python lenet.py compile --pef-name=lenet1223 --resources-scaling-factors 1 0.8 0.8
The compiler then assumes that it can use all available resources for the forward graph and 80% for the backward and optimizer graphs.
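A four-float variant, with illustrative values, scales the gradient normalization graph separately:
python lenet.py compile --pef-name=lenet1223 --resources-scaling-factors 1 0.8 0.9 0.8
Here the compiler assumes that it can use all available resources for the forward graph, 80% for the backward graph, 90% for gradient normalization, and 80% for the optimizer graph.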
Hardware configuration arguments
--arch [native|sn10|sn20|sn30]
Allows you to compile for a different target architecture. For example, if you’re compiling on an SN30 system but expect to run the model on an SN20 system, you can use this flag.
Default is native, that is, the compiler targets the architecture of the hardware that you’re running on.
The values are lowercase: sn20, sn30, and so on. Uppercase values such as SN20 or SN30 are not accepted.
python logreg.py compile --pef-name=logreg-0923 --arch=sn20
Performs compilation so the PEF runs on an SN20 system even if you’re compiling on an SN30 system or on a CPU-only node.
Tensor parallel arguments
The following argument lets you control tensor parallel behavior. See How to use tensor parallel mode (Beta) for details, including an example of an operator fusion yaml file to use together with tensor parallel.
--tensor-parallel batch|weight
Instructs the compiler to run in tensor parallel mode.
-
Batch mode splits tensors on the batch dimension. If the data tensors are larger than the weight tensors, batch mode has better performance.
-
Weight mode splits tensors on the weight tensor dimension. If the weight tensors are larger than the data tensors, weight mode has better performance.
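For example, to compile in weight mode (the PEF name is illustrative):
python logreg.py compile --tensor-parallel weight --pef-name=logreg-tp-1223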
Data parallel arguments
--data-parallel
Causes the compiler to add the gather and reduce sections and buffers to the dataflow graph to support data parallel operation. See How to use data parallel mode for some prerequisites and best practices.
python logreg.py compile --data-parallel -ws 2 --pef-name=logreg-1223
--world-size <integer>, -ws <integer>
Defines the minimum number of application replicas to be launched when the model is trained in data parallel mode. For compilation, set the value to 2. The actual number of replicas to be launched is defined at runtime.
python logreg.py compile --data-parallel -ws 2 --pef-name=logreg-1223
For use with customer support
The following options are included in the compile --help
output by default, but are reserved for use with SambaNova Support.
--compiler-configs-file COMPILER_CONFIGS_FILE
--mac-human-decision MAC_HUMAN_DECISION
--grad-accumulation-steps GRAD_ACCUMULATION_STEPS
--num-spatial-batches NUM_SPATIAL_BATCHES
--model-parallel
(requires a human decision file)
--n-chips <integer>
(use only with --model-parallel, which is reserved for use with Customer Support)
Deprecated argument
The following argument is deprecated and will not be supported in future releases.
-
--o1-rule-list <yaml-file>. Starting with 1.17, this argument and related options are deprecated. Use the new --optimization-rules argument, discussed in Compiler optimization modes, instead.