Hello SambaFlow! Compile and run a model

Welcome! In this tutorial, you learn how to compile and run the logreg.py example model, which solves a classic machine learning problem: recognizing hand-written digits.

In this tutorial you:

  1. Ensure that your environment is ready to compile and run models.

  2. If your system does not have access to the internet, ensure that the data the model uses is available.

  3. Compile the model to run on the RDU architecture.

  4. Do a training run.

We discuss the code for this model in Learn about model creation with SambaFlow.

Prepare your environment

To prepare your environment, ensure that the SambaFlow package is installed and that the SambaNova Daemon is running.

Check your SambaFlow installation

You must have the sambaflow package installed to run this example and any of the tutorial examples.

  1. To check if the package is installed, run this command:

    • For Ubuntu Linux

      $ dpkg -s sambaflow
    • For Red Hat Enterprise Linux

      $ rpm -qi sambaflow
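  2. Examine the output. On Ubuntu Linux, look for Status: install ok installed (below the Package line). On Red Hat Enterprise Linux, rpm prints the package details if the package is installed.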

  3. Ensure that the SambaFlow version that you are running matches the documentation you are using.

  4. If you see a message that sambaflow is not installed, contact your system administrator.
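
If you want to script the check in step 2, the following is a minimal sketch for Ubuntu Linux. It assumes the dpkg output contains the status line quoted above. The function names are illustrative and are not part of SambaFlow.

```python
import subprocess


def dpkg_reports_installed(dpkg_output: str) -> bool:
    """Return True if `dpkg -s` output contains the installed status line."""
    for line in dpkg_output.splitlines():
        if line.strip() == "Status: install ok installed":
            return True
    return False


def check_sambaflow_installed() -> bool:
    """Run `dpkg -s sambaflow` and inspect its output (Ubuntu only).

    Raises FileNotFoundError on systems that do not have dpkg.
    """
    result = subprocess.run(
        ["dpkg", "-s", "sambaflow"],
        capture_output=True,
        text=True,
        check=False,  # a missing package is reported through the return code
    )
    return result.returncode == 0 and dpkg_reports_installed(result.stdout)
```

Keeping the parsing separate from the subprocess call makes it possible to test the parsing on any machine, including one without dpkg.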

Check the SambaNova Daemon status

Before running the example, make sure the SND (SambaNova Daemon) service is running. You cannot run the examples unless SND is running.

  1. To check if SND is running, run this command:

    $ systemctl status snd
  2. Look for Active: active (running) in the output.

  3. If you don’t see this line, the service is not running or is degraded. Ask your system administrator for help.
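
The same check can be automated by reading the Active: line from the systemctl output. This is a sketch only: the helper names are hypothetical, and you have to capture the output of systemctl status snd yourself.

```python
import re
from typing import Optional

# Matches lines such as "   Active: active (running) since ..."
ACTIVE_LINE = re.compile(r"^\s*Active:\s+(\S+)\s+\(([^)]*)\)", re.MULTILINE)


def parse_active_state(status_output: str) -> Optional[str]:
    """Return a string such as 'active (running)', or None if no Active: line exists."""
    match = ACTIVE_LINE.search(status_output)
    if match is None:
        return None
    return f"{match.group(1)} ({match.group(2)})"


def snd_is_running(status_output: str) -> bool:
    """Return True only for the state that this tutorial asks you to look for."""
    return parse_active_state(status_output) == "active (running)"
```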

Create directories for sample copy and sample output

SambaNova recommends that you create your own copy of all sample applications and make changes to the copy, and that you create an app-test directory.

  1. Copy the content of /opt/sambaflow/apps to a directory inside your home directory. For example:

    $ mkdir $HOME/sambaflow-apps
    $ cp -r /opt/sambaflow/apps/* $HOME/sambaflow-apps
  2. Create a directory from which you run the commands and where you send the output.

    $ mkdir $HOME/app-test
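
If you prefer to set up both directories from Python, for example in a setup script, the two steps above can be sketched as follows. The function name prepare_workspace is an assumption made for this example.

```python
import shutil
from pathlib import Path


def prepare_workspace(apps_src: Path, apps_copy: Path, work_dir: Path) -> None:
    """Copy the sample applications and create a working directory."""
    # dirs_exist_ok (Python 3.8+) lets you re-run the copy without an error.
    shutil.copytree(apps_src, apps_copy, dirs_exist_ok=True)
    # Create the directory from which you run commands and collect output.
    work_dir.mkdir(parents=True, exist_ok=True)
```

To mirror the shell commands in this section, you would call prepare_workspace(Path("/opt/sambaflow/apps"), Path.home() / "sambaflow-apps", Path.home() / "app-test").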

Download model data (Optional)

If you run the example on a system that is not connected to the internet, you have to download the model data from a connected system and copy the data to the system where you want to run the model.

  1. On a connected system run:

    $ mkdir -p /tmp/data/MNIST/raw
    $ cd /tmp/data/MNIST/raw
    $ wget http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
    $ wget http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
    $ wget http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
    $ wget http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
  2. Copy the four .gz files to the DataScale system and place them in the directory /tmp/data/MNIST/raw.

  3. Add --data-folder=/tmp/data to the compile and run commands.
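
Before you compile or run on the disconnected system, you might want to confirm that all four files arrived. The following sketch checks the directory layout described in step 2. The helper name missing_mnist_files is hypothetical.

```python
from pathlib import Path
from typing import List

# The four archives that the wget commands above download.
MNIST_FILES = [
    "train-images-idx3-ubyte.gz",
    "train-labels-idx1-ubyte.gz",
    "t10k-images-idx3-ubyte.gz",
    "t10k-labels-idx1-ubyte.gz",
]


def missing_mnist_files(data_folder: Path) -> List[str]:
    """Return the archive names that are absent from <data_folder>/MNIST/raw."""
    raw_dir = Path(data_folder) / "MNIST" / "raw"
    return [name for name in MNIST_FILES if not (raw_dir / name).is_file()]
```

An empty list from missing_mnist_files(Path("/tmp/data")) means that the folder you pass with --data-folder contains every expected file.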

Compile and run your first model

This Hello World! example uses the classic machine learning problem of recognizing the hand-written digits in the MNIST dataset. The source code for this example is located in your SambaNova environment at /opt/sambaflow/apps/starters/logreg/logreg.py.

Look at supported options

Each example and each model has its own set of supported options.

To see all arguments for the logreg model, run the following command:

$ python /opt/sambaflow/apps/starters/logreg/logreg.py --help

The output looks similar to the following:

usage: logreg.py [-h] {compile,run,test,measure-performance} ...

positional arguments:
  {compile,run,test,measure-performance}
                        different modes of operation

optional arguments:
  -h, --help            show this help message and exit

You can drill down and run each command with --help to see options at that level. For example, run the following command to see options for compile:

$ python /opt/sambaflow/apps/starters/logreg/logreg.py compile --help

In most cases, using the defaults for the optional arguments is best. In Useful arguments for logreg.py we list a few commonly used arguments.
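
The help output has the shape that Python's argparse module produces for a command with subcommands. The following sketch is not the logreg.py source. It is a hypothetical, much smaller parser that shows how one script can expose modes such as compile and run, each with its own options. The program name example.py and the default values are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a toy parser with two subcommands, each with its own options."""
    parser = argparse.ArgumentParser(prog="example.py")
    subparsers = parser.add_subparsers(
        dest="command", required=True, help="different modes of operation"
    )

    # Options that only make sense when compiling.
    compile_parser = subparsers.add_parser("compile")
    compile_parser.add_argument("--pef-name", default="example")

    # Options that only make sense when running.
    run_parser = subparsers.add_parser("run")
    run_parser.add_argument("-p", "--pef", required=True)
    run_parser.add_argument("-e", "--num-epochs", type=int, default=1)
    return parser
```

With this structure, --help at the top level lists the subcommands, and --help after a subcommand lists only the options of that subcommand.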

Prepare data

This tutorial downloads train and test datasets from the internet, so there’s no separate step for preparing data.

If your system does not have access to the internet, you have to download the data on a system that has access and make the files available. See Download model data (Optional).

Compile logreg

When you compile the model, the compiler generates a PEF file that is suitable for running on the RDU architecture. You later pass in that file when you do a training run.

To compile the model, follow these steps:

  1. Start in the app-test directory that you created in Create directories for sample copy and sample output.

    $ cd $HOME/app-test
  2. Run the compilation step, using --pef-name to name the PEF file that the compiler generates.

    $ python $HOME/sambaflow-apps/starters/logreg/logreg.py compile --pef-name="logreg"
  3. When the command returns to the prompt, look for this output, shown toward the end:

    • Compilation succeeded for partition_X_X shows you that compilation succeeded.

    • Logs are generated in /nvmedata/var/lib/snuser1/out/logreg/… shows where the log files are located.

  4. Check if the PEF file was generated:

    $ ls -lh ./out/logreg/logreg.pef

    This file contains all information that the system needs to do a training run of the model.
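
If you script the compile step, you can check for the PEF file before you start a run. The sketch below assumes that the output layout is out/<pef-name>/<pef-name>.pef relative to the current directory. That layout is inferred from the single path shown in step 4 and might differ if you change the output options. Both function names are hypothetical.

```python
from pathlib import Path


def expected_pef_path(pef_name: str, output_root: Path = Path("out")) -> Path:
    """Return the assumed location of the PEF file for a given --pef-name."""
    return output_root / pef_name / f"{pef_name}.pef"


def pef_exists(pef_name: str, output_root: Path = Path("out")) -> bool:
    """Return True if the PEF file is present at the assumed location."""
    return expected_pef_path(pef_name, output_root).is_file()
```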

Start a logreg training run

When you do a training run, the application uploads the PEF file onto the chip. This example uses the MNIST dataset, which the example code downloads automatically.

If your system is disconnected from the internet, you have to manually download the dataset, which is available in Torchvision, on a system with internet access and copy it to the system where you run the model. See Download model data (Optional).

  1. Start a training run of the model with the PEF file that you generated. Use -e or --num-epochs to specify the number of epochs (default is 1).

    $ python $HOME/sambaflow-apps/starters/logreg/logreg.py run --num-epochs 2 --pef=out/logreg/logreg.pef

    Even one epoch would be enough to train this simple model, but we use --num-epochs 2 to see whether the loss decreases in the second epoch. After a short while, the command returns output like the following:

    2023-01-25T15:14:06 : [INFO][LIB][1421606]: sn_create_session: PEF File: out/logreg/logreg.pef
    Log ID initialized to: [snuser1][python][1421606] at /var/log/sambaflow/runtime/sn.log
    Epoch [1/2], Step [10000/60000], Loss: 0.4635
    Epoch [1/2], Step [20000/60000], Loss: 0.4087
    Epoch [1/2], Step [30000/60000], Loss: 0.3860
    Epoch [1/2], Step [40000/60000], Loss: 0.3700
    Epoch [1/2], Step [50000/60000], Loss: 0.3631
    Epoch [1/2], Step [60000/60000], Loss: 0.3552
    Test Accuracy: 91.50  Loss: 0.3005
    Epoch [2/2], Step [10000/60000], Loss: 0.2866
    Epoch [2/2], Step [20000/60000], Loss: 0.3063
    Epoch [2/2], Step [30000/60000], Loss: 0.3080
    Epoch [2/2], Step [40000/60000], Loss: 0.3084
    Epoch [2/2], Step [50000/60000], Loss: 0.3074
    Epoch [2/2], Step [60000/60000], Loss: 0.3060
    Test Accuracy: 91.32  Loss: 0.3005

Congratulations! You have run your first model on the SambaNova system! The output shows that the training run succeeded: the reported loss drops during the first epoch and stays near 0.3 in the second, and the test accuracy is above 91%.
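
To compare the loss across epochs without reading the log by eye, you can parse the progress lines. The following sketch assumes the line format shown in the sample output above. The function names are illustrative.

```python
import re
from typing import Dict, List, Tuple

# Matches lines such as "Epoch [1/2], Step [10000/60000], Loss: 0.4635"
STEP_LINE = re.compile(
    r"Epoch \[(\d+)/(\d+)\], Step \[(\d+)/(\d+)\], Loss: ([0-9.]+)"
)


def parse_losses(log_text: str) -> List[Tuple[int, int, float]]:
    """Return (epoch, step, loss) for every training progress line."""
    return [
        (int(m.group(1)), int(m.group(3)), float(m.group(5)))
        for m in STEP_LINE.finditer(log_text)
    ]


def epoch_end_losses(log_text: str) -> Dict[int, float]:
    """Map each epoch to the loss reported at its last logged step."""
    last: Dict[int, float] = {}
    for epoch, _step, loss in parse_losses(log_text):
        last[epoch] = loss  # later lines of the same epoch overwrite earlier ones
    return last
```

Lines that do not match the pattern, such as the Test Accuracy lines, are ignored.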

Useful arguments for logreg.py

Each of the example model commands has several arguments. In most cases, the defaults give good results.

Arguments for compile

For a list of compile arguments for use with logreg.py, run this command:

$ python $HOME/sambaflow-apps/starters/logreg/logreg.py compile --help

The command returns a full list of arguments. Here are some useful arguments:

  • --pef-name — Name of the output file, which has the information for running the model on RDU.

  • --n-chips, --num-tiles — Number of chips you want to use (from 1 to 8) and the number of tiles on the chip (1, 2, or 4). Default is 1 chip (4 tiles).

  • --num-features — Number of input features (for this model the default is 784)

  • --num-classes — Number of output labels (for this model the default is 10)
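
The two defaults follow from the MNIST data: each image is 28 x 28 pixels, which gives 784 input features when flattened, and there are 10 digit classes. The sketch below works through that arithmetic and the resulting parameter count. It assumes a single linear layer with a bias term, which is the textbook form of logistic regression. The tutorial does not state whether logreg.py includes a bias.

```python
IMAGE_HEIGHT = 28  # MNIST image height in pixels
IMAGE_WIDTH = 28   # MNIST image width in pixels
NUM_CLASSES = 10   # digits 0 through 9

# One input feature per pixel of the flattened image.
NUM_FEATURES = IMAGE_HEIGHT * IMAGE_WIDTH


def logreg_parameter_count(num_features: int, num_classes: int, bias: bool = True) -> int:
    """Count the weights (and optional biases) of one linear layer."""
    weights = num_features * num_classes
    biases = num_classes if bias else 0  # assumption: one bias per class
    return weights + biases
```

With the defaults, the weight matrix has 784 x 10 = 7840 entries, or 7850 parameters if a bias per class is included.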

Arguments for run

For a list of run arguments for use with logreg.py, run this command:

$ python $HOME/sambaflow-apps/starters/logreg/logreg.py run --help

The command returns a full list of arguments. Here are some important arguments:

  • -p PEF, --pef PEF — The only required argument. The PEF file that a compile step generated.

  • -b BATCH_SIZE, --batch-size BATCH_SIZE — How many samples to put in one batch.

  • -e, --num-epochs — How many epochs to run with the model.

  • --num-features, --num-classes — Input features and output classes for the model.

  • --lr — Learning rate parameter. Decimal fraction between 0 and 1.
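
If you launch runs from a script, you can assemble the command line from these arguments. The following sketch uses only flags that appear in this tutorial. The function name build_run_command is hypothetical, and the strict 0 < lr < 1 check is one reading of "between 0 and 1".

```python
from typing import List, Optional


def build_run_command(
    script: str,
    pef: str,
    num_epochs: Optional[int] = None,
    batch_size: Optional[int] = None,
    lr: Optional[float] = None,
) -> List[str]:
    """Return an argument list for subprocess.run; omitted options keep their defaults."""
    if lr is not None and not 0 < lr < 1:
        # Assumption: the bounds are exclusive.
        raise ValueError("lr must be a decimal fraction between 0 and 1")
    command = ["python", script, "run", f"--pef={pef}"]
    if num_epochs is not None:
        command += ["--num-epochs", str(num_epochs)]
    if batch_size is not None:
        command += ["--batch-size", str(batch_size)]
    if lr is not None:
        command += ["--lr", str(lr)]
    return command
```

Passing the result to subprocess.run as a list avoids shell quoting problems with paths that contain spaces.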