Compiler argument reference
The SambaNova workflow includes a compilation step. Compilation generates a dataflow graph of the model, which is similar to a PyTorch computational graph, encapsulated as a PEF file. You submit the PEF file when you do training and inference runs. See Workflows.
This doc page is a reference for commonly used compiler arguments. You can experiment with most of these arguments yourself, but some are used only when you’re working with SambaNova Support.
Name of the PEF file that the compiler generates and of the subdirectory for other compilation artifacts. By default, the compiler uses the compilation timestamp and the process ID to name the subdirectory.
Use this argument to give your PEF file a meaningful name. For example, use today’s date in the name, or use an inf extension when you compile for inference. If you experiment with different sets of hyperparameters, consider including them in the name; if you change the batch size across training runs, add the batch size to the PEF name. For example:
python $HOME/sambaflow-apps/starters/logreg.py compile --pef-name logreg-0923
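The naming convention suggested above can be sketched as a small helper. This is only an illustration; the `pef_name` function and the MMYY date stamp are assumptions for this sketch, not part of the SambaFlow API.

```python
from datetime import date

# Hypothetical helper: build a meaningful PEF name from the model name,
# the batch size, and an MMYY date stamp (naming convention assumed here).
def pef_name(model: str, batch_size: int, when: date) -> str:
    return f"{model}-b{batch_size}-{when.strftime('%m%y')}"

print(pef_name("logreg", 16, date(2023, 9, 1)))  # logreg-b16-0923
```

You would then pass the result to the compiler as `--pef-name`.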
Specify --inference to compile and generate a PEF for inference. If you specify this argument, the compiler performs only a forward pass and doesn’t perform certain optimizations. See How model compilation works.
Just as with compilation for a training run, you can specify a PEF file and other compiler arguments. For example:
$ python lenet.py compile --inference --pef-name=lenet-compile
Optional output folder. The compiler places the PEF file and log files in the output folder, which defaults to
./out/<pef-name>. To set the folder explicitly, run a command like this:
python lenet.py compile --pef-name=lenet --output-folder=out_test7
Informs the compiler which batch size will later be used during training. Set --batch-size to 4, 8, 16, 32, or even higher to support more efficient training.
The highest value you can use depends on the model and on available hardware resources.
With different batch sizes, training might reach the target accuracy faster or slower. It usually takes some experimentation to find the right batch size for a model.
python lenet.py compile --pef-name=lenet-1023 --batch-size 4
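One way to organize that experimentation is to generate a compile command per candidate batch size, embedding the batch size in the PEF name. The helper below is a sketch (the `batch_size_sweep` function is hypothetical); it uses only the flags shown on this page.

```python
# Hypothetical helper: build one compile command per candidate batch size,
# embedding the batch size in the PEF name so the runs are easy to tell apart.
def batch_size_sweep(model_script, base_name, batch_sizes):
    return [
        f"python {model_script} compile "
        f"--pef-name={base_name}-b{bs} --batch-size {bs}"
        for bs in batch_sizes
    ]

for cmd in batch_size_sweep("lenet.py", "lenet-1023", [4, 8, 16, 32]):
    print(cmd)
```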
|You cannot currently set the log level when running the compile command. You can only switch verbose or debug logging on or off.|
Shows verbose log output.
When you run the compiler with --verbose, specify an output folder, because the information about the location of the generated PEF file no longer appears at the end of the output. For example:
python logreg.py compile --pef-name=logreg-1023 --output-folder=out-1023 -v
When you work on a model with SambaNova Support, they might ask you to start the model in debug mode. In debug mode, the compiler sends more messages to stdout (and the logs). In addition, the
--help output shows some arguments that are customarily used only with customer support.
python logreg.py compile --pef-name=logreg-1023 --debug
You can specify an optimization level for the compiler. SambaFlow compiler overview explains the effect of each level.
When you specify -o0, each PyTorch operator is compiled independently.
This is the safest option, but because the compiler doesn’t perform optimizations, training and inference take longer than with -o3. See Compiler optimization modes for details.
Here’s a very simple example. We’re working on additional examples.
python lenet.py compile --pef-name=lenet-1123 -o0
When you specify -o1, PyTorch operators are fused together into subgraphs. Each subgraph is compiled independently.
See Compiler optimization modes for details.
If you don’t specify --optimization-rules, the compiler uses its default rules. For example:
python logreg.py compile --pef-name=lenet-1123 -o1 --optimization-rules /opt/sambaflow/apps/nlp/my-custom-rule.yaml
Optimization rules .yaml file to use with -o1.
Enable compiler optimizations with o1. This argument is currently required when you compile with o1. We expect to remove this argument soon.
When you specify -o3, the compiler has a global view of the entire graph. With this release (1.17),
-o3 is the default.
This option usually has the longest compile time but fast runtime performance when used with model-specific HD files. Because the compiler attempts to optimize the whole graph, compilation might fail in some cases.
When you specify -o3, you can optionally annotate subgraphs with --enable-hypersection. In that case, each annotated subgraph is compiled independently. If there are duplicate subgraphs, only one is compiled and reused.
Specifies the compiler mode. Using this flag with the right model type improves performance.
If you’re running in o3 mode (the default), then you can annotate your model’s Python code to tell the compiler about duplicate subgraphs and get the performance improvements. Use this option to enable this optimization.
|This option is used only in conjunction with the o3 compiler mode, and usually when working with SambaNova Support. In the future, expect to use o1 mode and operator fusion rule .yaml files.|
Sometimes the compiler underestimates or overestimates the RDU resources that are needed for some decisions. Overestimation can result in compilation failures, and underestimation can result in poor performance. If compilation fails, you can use this flag to force the compiler to assume it has fewer resources available than it actually has.
Specify 3 or 4 floats. A value of 1.0 means that the compiler can see all available resources.
Three floats: scaling factor for forward, backward and optimizer graphs
Four floats: scaling factor for forward, backward, gradient normalization and optimizer graphs
python lenet.py compile --pef-name=lenet1223 --resources-scaling-factors 1 0.8 0.8
The compiler assumes that it can use all available resources for forward graphs and 80% for backward and optimizer graphs.
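The 3-float and 4-float forms described above can be sketched as a small validator; the `scaling_factors_arg` function is hypothetical and only illustrates the argument’s shape, it is not part of SambaFlow.

```python
# Hypothetical validator for --resources-scaling-factors values:
# 3 floats cover the forward, backward, and optimizer graphs;
# 4 floats add gradient normalization between backward and optimizer.
def scaling_factors_arg(factors):
    if len(factors) not in (3, 4):
        raise ValueError("specify 3 or 4 floats")
    if any(not 0.0 < f <= 1.0 for f in factors):
        raise ValueError("each factor must be greater than 0 and at most 1.0")
    return "--resources-scaling-factors " + " ".join(str(f) for f in factors)

print(scaling_factors_arg([1.0, 0.8, 0.8]))
# --resources-scaling-factors 1.0 0.8 0.8
```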
Allows you to compile for a different target architecture. For example, if you’re compiling on an SN30 system but expect to run the model on an SN20 system, you can use this flag.
Default is native, that is, the compiler targets the architecture of the hardware that you’re running on.
To target a specific architecture, specify its name, for example sn20:
python logreg.py compile --pef-name=logreg-0923 --arch=sn20
Performs compilation so the PEF runs on an SN20 system even if you’re compiling on an SN30 system or on a CPU-only node.
Causes the compiler to add the gather and reduce sections and buffers to the dataflow graph to support data parallel operation. See Data parallel applications in SambaFlow for some prerequisites and best practices.
python logreg.py compile --data-parallel -ws 2 --pef-name=logreg-1223
Defines the minimum number of application replicas to be launched when the model is trained in data parallel mode. For compilation, set the value to 2. The actual number of replicas to be launched is defined at runtime.
python logreg.py compile --data-parallel -ws 2 --pef-name=logreg-1223
The following options are included in the
compile --help output by default, but are reserved for use with SambaNova Support.
--model-parallel (requires a human decision file)
--n-chips <integer> (use only with
--model-parallel, which is for use with SambaNova Support only)
The following argument is deprecated and will not be supported in future releases.
--o1-rule-list <yaml-file>. Starting with release 1.17, this argument and related options are deprecated. Use the new
--optimization-rules argument, discussed in Compiler optimization modes, instead.