Convert existing models to SambaFlow

Many SambaNova customers have converted an existing model that they built in PyTorch to work with SambaFlow. This doc page uses a simple example to illustrate what is essential for the conversion and discusses some best practices. You’ll see that much of your code remains unchanged and that SambaFlow doesn’t require you to reformat your data.

This doc page is about model conversion. We have a public GitHub repository with two scripts for pretraining data creation, pipeline.py and data_prep.py.

Get started with model conversion

In this document, we walk through the example model, explain the required SambaFlow code changes, and show how to compile and run the converted model.

The example model

Convolutional Neural Networks (CNNs) are a popular model type in the Visual AI space. Our example model is a CNN that performs image classification on the MNIST dataset. It consists of four layers:

  • 2 Convolutional layers, each containing:

    • Conv2D

    • ReLU

    • MaxPool2D

  • 2 Fully-connected linear layers

Included or external loss function

This conversion example presents two example solutions:

  • A solution that includes the model’s loss function as part of the model definition.

    • This approach delivers a significant performance improvement because loss computation happens on the RDU.

    • In the example, the loss function is included in the forward() function. See Model functions and changes for a discussion of the code.

  • A solution where the loss function is external to the model.

    • With this solution, we’ll use a host CPU to compute the loss and gradients for backpropagation.

    • Use this approach if your model’s loss function isn’t currently supported by SambaFlow or if you are using a custom loss function. See Model with an external loss function for a discussion of the code.

Original and converted model code download

This tutorial explains code modifications using a simple example: the 4-layer CNN described above, with two convolutional layers followed by two fully connected layers.

  • You can download the original code from this repo: https://github.com/adventuresinML/adventures-in-ml-code/blob/master/conv_net_py_torch.py.

  • The revised code is available for download. There are two examples with different loss functions.

    Included loss function
    import sambaflow
    import sambaflow.samba as samba
    import sambaflow.samba.optim as optim
    import sambaflow.samba.utils as utils
    from sambaflow.samba.utils.common import common_app_driver
    from sambaflow.samba.utils.argparser import parse_app_args
    from sambaflow.samba.sambaloader import SambaLoader
    
    import sys
    import argparse
    from typing import Tuple
    
    import numpy as np
    import torch
    import torch.nn as nn
    from torch.utils.data import DataLoader
    from torchvision import datasets, transforms
    
    class ConvNet(nn.Module):
        """
        Instantiate a 4-layer CNN for MNIST Image Classification.
    
        In SambaFlow, it is possible to include a loss function as part of a model's definition and put it in
        the forward method to be computed.
    
        Typical SambaFlow usage example:
    
        model = ConvNet()
        samba.from_torch_model_(model)
        optimizer = ...
        inputs = ...
        if args.command == "run":
            utils.trace_graph(model, inputs, optimizer, pef=args.pef, mapping=args.mapping)
            train(args, model)
        """
    
        def __init__(self):
    
            super(ConvNet, self).__init__()
            self.layer1 = nn.Sequential(
                nn.Conv2d(1, 32, kernel_size=5, stride=1, padding=2),
                nn.ReLU(),
                nn.MaxPool2d(kernel_size=2, stride=2),
            )
            self.layer2 = nn.Sequential(
                nn.Conv2d(32, 64, kernel_size=5, stride=1, padding=2),
                nn.ReLU(),
                nn.MaxPool2d(kernel_size=2, stride=2),
            )
            self.drop_out = nn.Dropout()
            self.fc1 = nn.Linear(7 * 7 * 64, 1000)
            self.fc2 = nn.Linear(1000, 10)
            self.criterion = nn.CrossEntropyLoss() # Add loss function to model
    
        def forward(self, x: torch.Tensor, labels: torch.Tensor):
            out = self.layer1(x)
            out = self.layer2(out)
            out = out.reshape(out.size(0), -1)
            out = self.drop_out(out)
            out = self.fc1(out)
            out = self.fc2(out)
            loss = self.criterion(out, labels)     # Compute loss
            return loss, out
    
    def add_user_args(parser: argparse.ArgumentParser) -> None:
        """
        Add user-defined arguments.
    
        Args:
            parser (argparse.ArgumentParser): SambaFlow argument parser
        """
    
        parser.add_argument(
            "-bs",
            type=int,
            default=100,
            metavar="N",
            help="input batch size for training (default: 100)",
        )
        parser.add_argument(
            "--num-epochs",
            type=int,
            default=6,
            metavar="N",
            help="number of epochs to train (default: 6)",
        )
        parser.add_argument(
            "--num-classes",
            type=int,
            default=10,
            metavar="N",
            help="number of classes in dataset (default: 10)",
        )
        parser.add_argument(
            "--learning-rate",
            type=float,
            default=0.001,
            metavar="LR",
            help="learning rate (default: 0.001)",
        )
        parser.add_argument(
            "--data-path",
            type=str,
            default="data",
            help="Download location for MNIST data",
        )
        parser.add_argument(
            "--model-path", type=str, default="model", help="Save location for model"
        )
    
    def get_inputs(args: argparse.Namespace) -> Tuple[samba.SambaTensor]:
        """
        Generates random SambaTensors in the same shape as MNIST image  and label tensors.
    
        In order to properly compile a PEF and trace the model graph, SambaFlow requires a SambaTensor that
        is the same shape as the input Torch Tensors, allowing the graph to be optimally mapped onto an RDU.
    
        Args:
            args (argparse.Namespace): User- and system-defined command line arguments
    
        Returns:
            A tuple of SambaTensors with random values in the same shape as MNIST image and label tensors.
        """
    
        dummy_image = (
            samba.randn(args.bs, 1, 28, 28, name="image", batch_dim=0),
            samba.randint(args.num_classes, (args.bs,), name="label", batch_dim=0),
        )
    
        return dummy_image
    
    def prepare_dataloader(args: argparse.Namespace) -> Tuple[sambaflow.samba.sambaloader.SambaLoader, sambaflow.samba.sambaloader.SambaLoader]:
        """
        Transforms MNIST input to tensors and creates training/test dataloaders.
    
        Downloads the MNIST dataset (if necessary); splits the data into training and test sets; transforms the
        data to tensors; then creates Torch DataLoaders over those sets.  Torch DataLoaders are wrapped in
        SambaLoaders.
    
        Args:
            args (argparse.Namespace): User- and system-defined command line arguments
    
        Returns:
            A tuple of SambaLoaders over the training and test sets.
        """
    
        # Transform the raw MNIST data into PyTorch Tensors, which will be converted to SambaTensors
        transform = transforms.Compose(
            [
                transforms.ToTensor(),
                transforms.Normalize((0.1307,), (0.3081,)),
            ]
        )
    
        # Get the train & test data (images and labels) from the MNIST dataset
        train_dataset = datasets.MNIST(
            root=args.data_path,
            train=True,
            transform=transform,
            download=True,
        )
        test_dataset = datasets.MNIST(root=args.data_path, train=False, transform=transform)
    
        # Set up the train & test data loaders (input pipeline)
        train_loader = DataLoader(
            dataset=train_dataset, batch_size=args.bs, shuffle=True
        )
        test_loader = DataLoader(
            dataset=test_dataset, batch_size=args.bs, shuffle=False
        )
    
        # Create SambaLoaders
        sn_train_loader = SambaLoader(train_loader, ["image", "label"])
        sn_test_loader = SambaLoader(test_loader, ["image", "label"])
    
        return sn_train_loader, sn_test_loader
    
    def train(args: argparse.Namespace, model: nn.Module) -> None:
        """
        Trains the model.
    
        Prepares and loads the data, then runs the training loop with the hyperparameters specified
        by the input arguments.  Calculates loss and accuracy over the course of training.
    
        Args:
            args (argparse.Namespace): User- and system-defined command line arguments
            model (nn.Module): ConvNet model
        """
    
        sn_train_loader, _ = prepare_dataloader(args)
        hyperparam_dict = {"lr": args.learning_rate}
    
        total_step = len(sn_train_loader)
        loss_list = []
        acc_list = []
    
        for epoch in range(args.num_epochs):
            for i, (images, labels) in enumerate(sn_train_loader):
    
                # Run the model on RDU: forward -> loss/gradients -> backward/optimizer
                loss, outputs = samba.session.run(
                    input_tensors=(images, labels),
                    output_tensors=model.output_tensors,
                    hyperparam_dict=hyperparam_dict
                )
    
                # Convert SambaTensors back to Torch Tensors to calculate accuracy
                loss, outputs = samba.to_torch(loss), samba.to_torch(outputs)
                loss_list.append(loss.tolist())
    
                # Track the accuracy
                total = labels.size(0)
                _, predicted = torch.max(outputs.data, 1)
                correct = (predicted == labels).sum().item()
                acc_list.append(correct / total)
    
                if (i + 1) % 100 == 0:
                    print(
                        "Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}, Accuracy: {:.2f}%".format(
                            epoch + 1,
                            args.num_epochs,
                            i + 1,
                            total_step,
                            torch.mean(loss),
                            (correct / total) * 100,
                        )
                    )
    
    def main(argv):
    
        args = parse_app_args(argv=argv, common_parser_fn=add_user_args)
    
        # Create the CNN model
        model = ConvNet()
    
        # Convert model to SambaFlow (SambaTensors)
        samba.from_torch_model_(model)
    
        # Create optimizer
        # Note that SambaFlow currently supports AdamW, not Adam, as an optimizer
        optimizer = samba.optim.AdamW(model.parameters(), lr=args.learning_rate)
    
        # Normally, we'd define a loss function here, but with SambaFlow, it can be defined
        # as part of the model, which we have done in this case
    
        # Create dummy SambaTensor for graph tracing
        inputs = get_inputs(args)
    
        # The common_app_driver() handles model compilation and various other tasks, e.g.,
        # measure-performance.  Running, or training, a model must be explicitly carried out
        if args.command == "run":
            utils.trace_graph(model, inputs, optimizer, pef=args.pef, mapping=args.mapping)
            train(args, model)
        else:
            common_app_driver(args=args,
                            model=model,
                            inputs=inputs,
                            optim=optimizer,
                            name=model.__class__.__name__,
                            init_output_grads=not args.inference,
                            app_dir=utils.get_file_dir(__file__))
    
    if __name__ == '__main__':
        main(sys.argv[1:])
    Custom loss function
    import sambaflow
    import sambaflow.samba as samba
    import sambaflow.samba.optim as optim
    import sambaflow.samba.utils as utils
    from sambaflow.samba.utils.common import common_app_driver
    from sambaflow.samba.utils.argparser import parse_app_args
    from sambaflow.samba.sambaloader import SambaLoader
    
    import sys
    import argparse
    from typing import (Tuple, Callable)
    
    import numpy as np
    import torch
    import torch.nn as nn
    from torch.utils.data import DataLoader
    from torchvision import datasets, transforms
    
    class ConvNetCustomLoss(nn.Module):
        """
        Instantiate a 4-layer CNN for MNIST Image Classification.
    
        In SambaFlow, while it is possible to include a loss function in the model definition, it
        is not done here as an example of how to compute loss on the host.
    
        Typical SambaFlow usage example:
    
    model = ConvNetCustomLoss()
    samba.from_torch_model_(model)
        optimizer = ...
        inputs = ...
        if args.command == "run":
            utils.trace_graph(model, inputs, optimizer, pef=args.pef, mapping=args.mapping)
            train(args, model)
        """
    
        def __init__(self):
    
            super(ConvNetCustomLoss, self).__init__()
            self.layer1 = nn.Sequential(
                nn.Conv2d(1, 32, kernel_size=5, stride=1, padding=2),
                nn.ReLU(),
                nn.MaxPool2d(kernel_size=2, stride=2),
            )
            self.layer2 = nn.Sequential(
                nn.Conv2d(32, 64, kernel_size=5, stride=1, padding=2),
                nn.ReLU(),
                nn.MaxPool2d(kernel_size=2, stride=2),
            )
            self.drop_out = nn.Dropout()
            self.fc1 = nn.Linear(7 * 7 * 64, 1000)
            self.fc2 = nn.Linear(1000, 10)
    
        def forward(self, x: torch.Tensor):
            # Since loss isn't part of the model, we don't pass a label to forward()
            out = self.layer1(x)
            out = self.layer2(out)
            out = out.reshape(out.size(0), -1)
            out = self.drop_out(out)
            out = self.fc1(out)
            out = self.fc2(out)
            return out
    
    def add_user_args(parser: argparse.ArgumentParser) -> None:
        """
        Add user-defined arguments.
    
        Args:
            parser (argparse.ArgumentParser): SambaFlow argument parser
        """
    
        parser.add_argument(
            "-bs",
            type=int,
            default=100,
            metavar="N",
            help="input batch size for training (default: 100)",
        )
        parser.add_argument(
            "--num-epochs",
            type=int,
            default=6,
            metavar="N",
            help="number of epochs to train (default: 6)",
        )
        parser.add_argument(
            "--num-classes",
            type=int,
            default=10,
            metavar="N",
            help="number of classes in dataset (default: 10)",
        )
        parser.add_argument(
            "--learning-rate",
            type=float,
            default=0.001,
            metavar="LR",
            help="learning rate (default: 0.001)",
        )
        parser.add_argument(
            "--data-path",
            type=str,
            default="data",
            help="Download location for MNIST data",
        )
        parser.add_argument(
            "--model-path", type=str, default="model", help="Save location for model"
        )
    
    def get_inputs(args: argparse.Namespace) -> Tuple[samba.SambaTensor]:
        """
        Generates random SambaTensors in the same shape as MNIST image tensors.
    
        In order to properly compile a PEF and trace the model graph, SambaFlow requires a SambaTensor that
        is the same shape as the input Torch Tensors, allowing the graph to be optimally mapped onto an RDU.
    
        Args:
            args (argparse.Namespace): User- and system-defined command line arguments
    
        Returns:
            A SambaTensor with random values in the same shape as MNIST image tensors.
        """
    
        # Loss is computed on the host, so a dummy SambaTensor is only needed for the MNIST images
        return samba.randn(args.bs, 1, 28, 28, name="image", batch_dim=0),
    
    def prepare_dataloader(args: argparse.Namespace) -> Tuple[sambaflow.samba.sambaloader.SambaLoader, ...]:
        """
        Transforms MNIST input to tensors and creates training/test dataloaders.
    
        Downloads the MNIST dataset (if necessary); splits the data into training and test sets; transforms the
        data to tensors; then creates Torch DataLoaders over those sets.  Torch DataLoaders are wrapped in
        SambaLoaders.
    
        Input:
            args: User- and system-defined command line arguments
    
        Returns:
            A tuple of SambaLoaders over the training and test sets.
        """
    
        # Transform the raw MNIST data into PyTorch Tensors, which will be converted to SambaTensors
        transform = transforms.Compose(
            [
                transforms.ToTensor(),
                transforms.Normalize((0.1307,), (0.3081,)),
            ]
        )
    
        # Get the train & test data (images and labels) from the MNIST dataset
        train_dataset = datasets.MNIST(
            root=args.data_path,
            train=True,
            transform=transform,
            download=True,
        )
        test_dataset = datasets.MNIST(root=args.data_path, train=False, transform=transform)
    
        # Set up the train & test data loaders (input pipeline)
        train_loader = DataLoader(
            dataset=train_dataset, batch_size=args.bs, shuffle=True
        )
        test_loader = DataLoader(
            dataset=test_dataset, batch_size=args.bs, shuffle=False
        )
    
        # Create SambaLoaders
        # function_hook allows us to specify which tensor(s) should be passed along to the model
        #  -> The hook must return a list containing the same number of tensors as specified in the list of names
        #  -> Any other tensors will be filtered out, so if you need those, then...
        # return_original_batch allows us to retain the original input tensors for later processing, e.g., computing loss
        #  -> It causes the SambaLoader to also return a list of the original input tensors
        sn_train_loader = SambaLoader(dataloader=train_loader, names=["image"], function_hook=lambda t: [t[0]], return_original_batch=True)
        sn_test_loader = SambaLoader(dataloader=test_loader, names=["image"], function_hook=lambda t: [t[0]], return_original_batch=True)
    
        return sn_train_loader, sn_test_loader
    
    def train(args: argparse.Namespace, model: nn.Module, criterion: Callable) -> None:
        """
        Trains the model.
    
        Prepares and loads the data, then runs the training loop with the hyperparameters specified
        by the input arguments with a given loss function.  Calculates loss and accuracy over the course of training.
    
        Args:
            args (argparse.Namespace): User- and system-defined command line arguments
            model (nn.Module): ConvNet model
            criterion (Callable): Loss function
        """
    
        sn_train_loader, sn_test_loader = prepare_dataloader(args)
        hyperparam_dict = {"lr": args.learning_rate}
    
        total_step = len(sn_train_loader)
        loss_list = []
        acc_list = []
    
        for epoch in range(args.num_epochs):
            for i, (images, original_batch) in enumerate(sn_train_loader):
    
                # The label tensor is the second element of the original batch
                labels = original_batch[1]
    
                # Run only the forward pass on RDU and note the section_types argument
                # The first element of the returned tuple contains the raw outputs of forward()
                outputs = samba.session.run(
                    input_tensors=(images,),
                    output_tensors=model.output_tensors,
                    hyperparam_dict=hyperparam_dict,
                    section_types=["FWD"]
                )[0]
    
                # Convert SambaTensors back to Torch Tensors to carry out loss calculation
                # on the host CPU.  Be sure to set the requires_grad attribute for PyTorch.
                outputs = samba.to_torch(outputs)
                outputs.requires_grad = True
    
                # Compute loss on host CPU and store it for later tracking
                loss = criterion(outputs, labels)
    
                # Compute gradients on CPU
                loss.backward()
                loss_list.append(loss.tolist())
    
                # Run the backward pass and optimizer step on RDU and note the grad_of_outputs
                # and section_types arguments
                samba.session.run(
                    input_tensors=(images,),
                    output_tensors=model.output_tensors,
                hyperparam_dict=hyperparam_dict,
                grad_of_outputs=[samba.from_torch_tensor(outputs.grad)], # Bring the grads back from CPU to RDU
                    section_types=["BCKWD", "OPT"])
    
                # Compute and track the accuracy
                total = labels.size(0)
                _, predicted = torch.max(outputs.data, 1)
                correct = (predicted == labels).sum().item()
                acc_list.append(correct / total)
    
                if (i + 1) % 100 == 0:
                    print(
                        "Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}, Accuracy: {:.2f}%".format(
                            epoch + 1,
                            args.num_epochs,
                            i + 1,
                            total_step,
                            torch.mean(loss),
                            (correct / total) * 100,
                        )
                    )
    
    def main(argv):
    
        args = parse_app_args(argv=argv, common_parser_fn=add_user_args)
    
        # Create the CNN model
        model = ConvNetCustomLoss()
    
        # Convert model to SambaFlow (SambaTensors)
        samba.from_torch_model_(model)
    
        # Create optimizer
        # Note that SambaFlow currently supports AdamW, not Adam, as an optimizer
        optimizer = samba.optim.AdamW(model.parameters(), lr=args.learning_rate)
    
        ###################################################################
        # Define loss function here to be used in the forward pass on CPU #
        ###################################################################
        criterion = nn.CrossEntropyLoss()
    
        # Create dummy SambaTensor for graph tracing
        inputs = get_inputs(args)
    
        # The common_app_driver() handles model compilation and various other tasks, e.g.,
        # measure-performance.  Running, or training, a model must be explicitly carried out
        if args.command == "run":
            utils.trace_graph(model, inputs, optimizer, init_output_grads=not args.inference, pef=args.pef, mapping=args.mapping)
            train(args, model, criterion)
        else:
            common_app_driver(args=args,
                            model=model,
                            inputs=inputs,
                            optim=optimizer,
                            name=model.__class__.__name__,
                            init_output_grads=not args.inference,
                            app_dir=utils.get_file_dir(__file__))
    
    if __name__ == '__main__':
        main(sys.argv[1:])

SambaNova workflow


When you want to run your model on SambaNova hardware, the typical workflow is the following:

  1. Start with the Planning questions to get the most out of your model. You might find some of our background materials interesting, for example the white paper Accelerated computing with a Reconfigurable Dataflow Architecture.

  2. Modify your code following the guidance in this tutorial.

  3. Compile your model. The output of the compilation is a PEF file, a binary file containing the full details of the model that can be deployed onto an RDU. See Compile and run your first model.

  4. Prepare the data you want to feed to your compiled model. See the data preparation scripts in our public GitHub repository.

  5. Run the model, passing in the PEF file. For some background discussion, see Compile and run your first model.

Planning questions

You can ask yourself some questions to make the conversion process more straightforward. These questions will help you identify where you need to add methods from SambaFlow to your code.

  1. Where are my dataloaders?

    All models need data and one of the easiest ways to feed in that data is with a PyTorch DataLoader. The output tensors that come from the DataLoader need to be converted into SambaTensors. See Load data with prepare_dataloader().

  2. What shape are my input tensors?

    When you compile a SambaFlow model, the compute graph of your model is physically mapped onto an RDU. To perform this mapping, SambaFlow needs to know the shape of the input tensors. See Generate SambaTensors with get_inputs().

  3. Where is my model defined?

    A useful feature of SambaFlow is that a loss function can be included in the definition and forward section of a model. A loss function can be mapped directly onto an RDU, greatly enhancing performance. See Define the model.

  4. Where is my model instantiated?

    The model must be explicitly converted to SambaFlow. Fortunately, only a single SambaFlow method needs to be used to do that. See Tie the pieces together with main().

  5. Where is my loss function defined and what is it?

    A loss function can be part of a model’s definition. If your model uses a PyTorch loss function that SambaFlow supports, you can move the function into the model, as in Define the model. If your model uses an unsupported or custom loss function, you can keep the loss function external. See Model with an external loss function.

  6. Where is my optimizer defined and what is it?

    Unlike loss functions, optimizers can’t be added directly to a model’s definition in SambaFlow. Instead, the optimizer is passed into SambaFlow during compilation and training. See Tie the pieces together with main().

Model functions and changes

Throughout this document, you will see that only minor modifications to your PyTorch code are necessary. In this conversion example, we include the required changes and also some changes that improve the robustness of the code, for example, the addition of a main() function.

Overview of required changes

When you review the SambaFlow versions of the code for this model, you will notice that they look different from the original. This difference is, however, purely aesthetic - it’s meant to more clearly point out the SambaFlow additions.

The required changes are:

  • The SambaFlow Python imports.

  • The shape of the input tensors.

  • The SambaFlow tensor conversion methods.

If you’re converting your own model, there’s no need to refactor and reorganize your code. Make the changes in the right spots and your model works in SambaFlow.

At the code level, additions include:

  • common_app_driver()

  • utils.trace_graph()

  • from_torch_model_()

  • from_torch_tensor() and samba.to_torch()

  • samba.optim.AdamW()

  • utils.argparser.parse_app_args()

Most of the differences between the original code and the SambaFlow code deal with code readability and modularity.

Imports

As a first step, we import SambaFlow libraries so the code can run on a SambaNova system. See the API Reference for background.

Imports required by SambaFlow

import sambaflow
import sambaflow.samba as samba
import sambaflow.samba.optim as optim
import sambaflow.samba.utils as utils

from sambaflow.samba.utils.argparser import parse_app_args
from sambaflow.samba.utils.common import common_app_driver
from sambaflow.samba.sambaloader import SambaLoader
  • sambaflow is the base package.

  • samba corresponds to PyTorch torch.

  • samba.optim is similar to PyTorch optim and contains the optimizers available in SambaFlow.

  • samba.utils contains various SambaFlow-specific utilities for graph tracing, compiling and measuring performance, and so on.

  • samba.utils.argparser is similar to Python’s argparse library, but intended for requirements particular to SambaFlow.

    parse_app_args enables argument parsing supporting the SambaFlow execution modes (compile, run, test, measure performance). Users can define their own arguments and pass those into SambaFlow.

  • sambaflow.samba.utils.common

    common_app_driver is a tool to make using a compiled model easier. It provides a single interface for compiling a model, and several means of measuring a model’s performance, such as measure-cpu, measure-gpu, measure-performance, and measure-sections.

  • sambaflow.samba.sambaloader

    SambaLoader is a wrapper around the Pytorch DataLoader and is built to take advantage of the SambaNova architecture to more efficiently parallelize load operations with graph/compute operations. It also automatically converts Torch tensors into SambaTensors.

Additional imports

In this conversion example, there are several Python and PyTorch imports that are typically used for building a CNN. These imports can be left as-is. In fact, you’ll likely want the same imports for any model you bring in: SambaFlow is additive.

The following Python standard library imports are used in this example (the PyTorch imports carry over unchanged from the original code):

import sys
import argparse
from typing import Tuple

SambaFlow can transparently handle many native PyTorch methods. During compilation, those methods are optimized for the SambaNova RDU to take advantage of the SambaNova Reconfigurable Dataflow Architecture.

Define the model

Here’s the modified ConvNet class, with comments that explain changes.

ConvNet class
class ConvNet(nn.Module):
    """
    Instantiate a 4-layer CNN for MNIST Image Classification.

    In SambaNova, we can define the loss function as a part of the model
    and include it in the forward method to be computed.

    Typical SambaFlow usage example:

    model = ConvNet()
    samba.from_torch_model_(model)
    optimizer = ...
    inputs = ...
    if args.command == "run":
        utils.trace_graph(model, inputs, optimizer, pef=args.pef, mapping=args.mapping)
        train(args, model)
    """
    def __init__(self):
        super(ConvNet, self).__init__()
        self.layer1 = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=5, stride=1, padding=2),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
        self.layer2 = nn.Sequential(
            nn.Conv2d(32, 64, kernel_size=5, stride=1, padding=2),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
        self.drop_out = nn.Dropout()
        self.fc1 = nn.Linear(7 * 7 * 64, 1000)
        self.fc2 = nn.Linear(1000, 10)
        self.criterion = nn.CrossEntropyLoss() # Add loss function to model

    def forward(self, x: torch.Tensor, labels: torch.Tensor):
        out = self.layer1(x)
        out = self.layer2(out)
        out = out.reshape(out.size(0), -1)
        out = self.drop_out(out)
        out = self.fc1(out)
        out = self.fc2(out)
        loss = self.criterion(out, labels)     # Compute loss
        return loss, out                       # Return loss
The model definition differs only slightly from the original version: we've added a loss function. Everything else remains pure PyTorch.
  • Layers. The model is defined layer by layer. Our code example doesn’t change the original PyTorch nn methods when defining the layers (Sequential(), Conv2d(), ReLU(), MaxPool2d(), Dropout(), and Linear()).

  • Loss function. The only difference between this code and the original code is the loss function:

    • We define the loss function, nn.CrossEntropyLoss(), directly as part of the model.

    • We include the loss function in the __init__() and forward() methods.

      When our example model is trained, the forward() method computes and returns both the output tensors and the loss.

      With this change, loss is computed directly on the RDU. That gives us a performance boost. If you don’t include a loss function, the output tensors of the forward function can be passed out and the loss must be computed externally on the host CPU. This results in lower performance, but it does allow a user to leverage custom loss functions.

  • forward() method. The forward() method is custom. In addition to computing the output tensors, the method also computes and returns the loss.

If you use a Pytorch function that the SambaFlow API does not yet support, the function is automatically computed on CPU instead of RDU, resulting in slower performance.

Capture user arguments with add_user_args

The add_user_args function captures and encapsulates user-defined command-line arguments so that they can be more easily passed to SambaFlow via the samba.utils.argparser.parse_app_args() method.

The arguments to add_user_args() are the model’s hyperparameters and the two path variables for storing the data and the model.

The SambaFlow compiler has defaults for most arguments. If you don’t provide a value for an argument, the SambaFlow default is used.

add_user_args() function
def add_user_args(parser: argparse.ArgumentParser) -> None:
   parser.add_argument(
       "--batch-size",
       type=int,
       default=100,
       metavar="N",
       help="input batch size for training (default: 100)",
   )
   parser.add_argument(
       "--num-epochs",
       type=int,
       default=6,
       metavar="N",
       help="number of epochs to train (default: 6)",
   )
   parser.add_argument(
       "--num-classes",
       type=int,
       default=10,
       metavar="N",
       help="number of classes in dataset (default: 10)",
   )
   parser.add_argument(
       "--learning-rate",
       type=float,
       default=0.001,
       metavar="LR",
       help="learning rate (default: 0.001)",
   )
   parser.add_argument(
       "--data-path",
       type=str,
       default="data",
       help="Download location for MNIST data",
   )  # From DATA_PATH
   parser.add_argument(
       "--model-path", type=str, default="model", help="Save location for model"
   )  # From MODEL_STORE_PATH

Generate SambaTensors with get_inputs()

To properly trace the PyTorch model graph to map it onto an RDU, the compiler requires tensors of the same shape as those that are passed to the forward() method during training. The data in these tensors isn’t important to the mapping. However, the compiler must be able to determine how the tensors change shape as they flow from the input to the output of the graph. This helps the compiler to generate a PEF file that optimally lays out your model on the RDU.

In the case of the MNIST data input, which we use in this example, two tensors are needed:

  • One tensor that matches the shape of an MNIST image. We use samba.randn below to represent that.

  • One tensor that matches the shape of an MNIST label. We use samba.randint below because the label is an integer between 0 and 9.

The get_inputs() function returns a tuple of the tensors.

get_inputs() function
def get_inputs(args: argparse.Namespace) -> Tuple[samba.SambaTensor]:
    """
    Generates random SambaTensors in the same shape as MNIST image tensors and labels.

    In order to properly compile a PEF and trace the model graph, SambaFlow requires a SambaTensor that is the same shape as the input Torch Tensors, allowing the graph to be optimally mapped onto an RDU.

    Input:
        args: User- and system-defined command line arguments

    Returns:
        A tuple of SambaTensors with random values in the same shape as MNIST image tensors.
    """
    dummy_image = (
        samba.randn(args.bs, 1, 28, 28, name="image", batch_dim=0),
        samba.randint(args.num_classes, (args.bs,), name="label", batch_dim=0),
    )
    return dummy_image

The function includes the samba.randn() and samba.randint() methods.

These methods are functionally identical to their PyTorch counterparts, but return SambaTensors rather than Torch tensors. SambaTensors are wrappers for Torch tensors and have additional data members and methods to support the SambaNova RDU architecture.

In order to place the model graph onto RDU, we have to know how many PCUs (compute units) and PMUs (memory units) are needed to optimally run the model. For that, we have to tell the compiler the shape of input tensors - the actual values in the tensors aren’t important. This is why we create dummy_image. A user can determine the shape of the input tensor(s) from analysis of the dataset or from the model’s hyperparameters (e.g. the height and width of an input image).

  • The name and batch_dim data members don’t exist in the original PyTorch implementation.

    • name is the name of the image.

    • batch_dim is the batch dimension.

Methods for getting and setting RDU device memory and for syncing data between an RDU and host CPU are not used in this example.

See the API Reference for details.
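
If you prefer to derive the dummy shape from the data instead of hard-coding it, you can read it off a single transformed sample. The following sketch is illustrative only: it assumes a dataset built with the same transform as in prepare_dataloader(), and it assumes samba.randn() accepts sizes positionally the same way torch.randn() does, as noted above.

# Illustrative sketch: derive the dummy-tensor shape from one transformed MNIST sample
sample_image, _ = train_dataset[0]  # a (1, 28, 28) tensor after ToTensor()
dummy_image = (
    samba.randn(args.bs, *sample_image.shape, name="image", batch_dim=0),
    samba.randint(args.num_classes, (args.bs,), name="label", batch_dim=0),
)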

Load data with prepare_dataloader()

The prepare_dataloader() function returns SambaLoaders that wrap your original PyTorch DataLoaders. It is used to load and transform input data. SambaLoader enables the loading process to better leverage the SambaNova parallel architecture and converts the Torch tensors it receives into SambaTensors. The function is almost purely PyTorch, but it requires minor changes because the RDU works with SambaTensors.

At minimum, a SambaLoader needs:

  • A DataLoader passed in via the dataloader parameter.

  • A list of names to give to the tensors via the names parameter (all SambaTensors are named, either by the user or automatically by SambaFlow).

A less efficient way of achieving this conversion is to explicitly call samba.from_torch_tensor() on the Torch tensors returned from the Pytorch DataLoaders. Using SambaLoader instead is recommended.
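
For illustration, a minimal sketch of that manual approach is shown below. It assumes samba.from_torch_tensor() accepts the same name and batch_dim arguments used for samba.randn() above; the loop body is abbreviated.

# Manual (less efficient) alternative to SambaLoader: convert each Torch batch by hand
for images, labels in train_loader:
    sn_images = samba.from_torch_tensor(images, name="image", batch_dim=0)
    sn_labels = samba.from_torch_tensor(labels, name="label", batch_dim=0)
    # ... pass (sn_images, sn_labels) to samba.session.run() as in train() ...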

prepare_dataloader() function
def prepare_dataloader(args: argparse.Namespace) -> Tuple[sambaflow.samba.sambaloader.SambaLoader, sambaflow.samba.sambaloader.SambaLoader]:
    """
    Transforms MNIST input to tensors and creates training/test dataloaders.

    Downloads the MNIST dataset (if necessary); splits the data into training and test sets; transforms the
    data to tensors; then creates Torch DataLoaders over those sets.  Torch DataLoaders are wrapped in
    SambaLoaders.

    Args:
        args (argparse.Namespace): User- and system-defined command line arguments

    Returns:
        A tuple of SambaLoaders over the training and test sets.
    """

    # Transform the raw MNIST data into PyTorch Tensors, which will be converted to SambaTensors
    transform = transforms.Compose(
        [
            transforms.ToTensor(),
            transforms.Normalize((0.1307,), (0.3081,)), # normalize the MNIST data
        ]
    )

    # Get the train & test data (images and labels) from the MNIST dataset
    train_dataset = datasets.MNIST(
        root=args.data_path,
        train=True,
        transform=transform,
        download=True,
    )
    test_dataset = datasets.MNIST(root=args.data_path, train=False, transform=transform)

    # Set up the train & test data loaders (input pipeline)
    train_loader = DataLoader(
        dataset=train_dataset, batch_size=args.bs, shuffle=True
    )
    test_loader = DataLoader(
        dataset=test_dataset, batch_size=args.bs, shuffle=False
    )

    # Create SambaLoaders
    sn_train_loader = SambaLoader(train_loader, ["image", "label"])
    sn_test_loader = SambaLoader(test_loader, ["image", "label"])

    return sn_train_loader, sn_test_loader

Train the model with train()

The train() method contains the training loop for the model. The code is similar to PyTorch.

train() method
def train(args: argparse.Namespace, model: nn.Module) -> None:
    """
    Trains the model.

    Prepares and loads the data, then runs the training loop with the hyperparameters specified
    by the input arguments.  Calculates loss and accuracy over the course of training.

    Args:
        args (argparse.Namespace): User- and system-defined command line arguments
        model (nn.Module): ConvNet model
    """

    sn_train_loader, _ = prepare_dataloader(args)
    hyperparam_dict = {"lr": args.learning_rate}

    total_step = len(sn_train_loader)
    loss_list = []
    acc_list = []

    for epoch in range(args.num_epochs):
        for i, (images, labels) in enumerate(sn_train_loader):

            # Run the model on RDU: forward -> loss/gradients -> backward/optimizer
            # The SambaLoader has already converted the Torch batches into SambaTensors
            loss, outputs = samba.session.run(
                input_tensors=(images, labels),
                output_tensors=model.output_tensors,
                hyperparam_dict=hyperparam_dict
            )

            # Convert SambaTensors back to Torch Tensors to calculate accuracy
            loss, outputs = samba.to_torch(loss), samba.to_torch(outputs)
            loss_list.append(loss.tolist())

            # Track the accuracy
            total = labels.size(0)
            _, predicted = torch.max(outputs.data, 1)
            correct = (predicted == labels).sum().item()
            acc_list.append(correct / total)

            if (i + 1) % 100 == 0:
                print(
                    "Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}, Accuracy: {:.2f}%".format(
                        epoch + 1,
                        args.num_epochs,
                        i + 1,
                        total_step,
                        torch.mean(loss),
                        (correct / total) * 100,
                    )
                )

Here’s how the function works:

  1. The inner training loop runs over the enumerated samples that are generated by the SambaLoader that is created by the prepare_dataloader() function. In this example, we are only using the SambaLoader that generates training samples.

  2. The SambaTensors are then passed as input to samba.session.run(). This method performs the entire training pass, from the forward pass all the way to the backward pass and optimization.

    A Session is a SambaFlow object that contains the variables and methods that are needed to compile and run a model on an RDU. In SambaFlow parlance, to run a model means to train it. The model object is created during compilation.

    samba.session.run() takes in two key arguments: input_tensors and output_tensors.

    • input_tensors are the data on which the model is to be trained (what we get from prepare_dataloader()).

    • output_tensors capture the output shape that is generated by model compilation. Here’s how it works:

      When SambaFlow compiles a model, it generates a dataflow graph, which is similar to a PyTorch computational graph. To run the model, that graph must be traced before it can be placed onto the RDU. SambaFlow must know about the output shape that is generated by model compilation so that it can terminate the trace and map the graph onto the RDU. This output shape is captured in the output_tensors argument.

      What is actually contained in model.output_tensors is the output of the forward() method. Thus, the output of samba.session.run() is also the output of forward().

  3. To track the progress of model training, we output the loss and accuracy per epoch. This is standard practice and SambaFlow doesn’t change that. However, progress tracking should be run on a CPU, not an RDU, so we use the method samba.to_torch() to convert loss and outputs to Torch tensors. The standard Torch functions can then be applied to loss and outputs on the CPU.

Tie the pieces together with main()

In contrast to the original code, our code includes a main() function for more flexibility. The main() function is called to initialize the model, the data, optimizers, arguments, etc. and then kick off compilation and training.

main() function
def main(argv):

    args = parse_app_args(argv=argv, common_parser_fn=add_user_args)

    # Create the CNN model
    model = ConvNet()

    # Convert model to SambaFlow (SambaTensors)
    samba.from_torch_model_(model)

    # Create optimizer
    # Note that SambaFlow currently supports AdamW, not Adam, as an optimizer
    optimizer = optim.AdamW(model.parameters(), lr=args.learning_rate)

    # Normally, we'd define a loss function here, but with SambaFlow, it can be defined
    # as part of the model, which we have done in this case

    # Dummy SambaTensor
    inputs = get_inputs(args)

    # The common_app_driver() handles model compilation and various other tasks, e.g.,
    # measure-performance.  Running, or training, a model must be explicitly carried out
    if args.command == "run":
        utils.trace_graph(model, inputs, optimizer, pef=args.pef, mapping=args.mapping)
        train(args, model)
    else:
        common_app_driver(args=args,
                        model=model,
                        inputs=inputs,
                        optim=optimizer,
                        name=model.__class__.__name__,
                        init_output_grads=not args.inference,
                        app_dir=utils.get_file_dir(__file__))

Components of model training

The main() function goes through these steps to get the model trained with SambaFlow:

  1. The process begins with argument parsing: we use samba.utils.argparser.parse_app_args() to capture the necessary arguments from the command line.

    The common_parser_fn argument is used to pass user-defined arguments to the SambaFlow backend. We created the add_user_args() function to make that possible.

  2. Next, we create the PyTorch model in the typical way.

    We call the samba.from_torch_model_() method, which is part of the Session library to convert our PyTorch model into a SambaFlow model. The method recursively, and in-place, goes through a computational graph and converts all the Torch tensors into SambaTensor instances.

    While from_torch_model_() and samba.from_torch_tensor() look similar, they are very different and are not interchangeable.
    • from_torch_model_() is a Session method and converts models.

    • samba.from_torch_tensor() is a SambaTensor method and converts only a single Torch Tensor into a SambaTensor.

  3. Next, we define the optimizer. Currently, SambaFlow supports the AdamW and SGD optimizers. We use AdamW here. The optimizer must be defined externally as in this example. You cannot add an optimizer directly to the model definition in SambaFlow.

  4. We then create the “dummy” inputs to allow the SambaFlow compiler to trace the computational graph and map the resulting Dataflow graph to the RDU. For compilation, it is a best practice to use the common_app_driver(). See Use common_app_driver in main() for compilation.

    The output of compilation is a PEF file, a binary file that contains the full details of the model. The PEF file can be deployed onto an RDU.

  5. To run, i.e., train, the model we have to use two methods:

    • The utils.trace_graph() method traces over the graph in a PEF file, initializing the weights and input/output tensors on the RDU. It takes as input the model, inputs, optimizer, the PEF file and a mapping.

      • The PEF is passed in as args.pef. This argument is part of the SambaFlow ArgParser, so you need not define it. You specify the PEF name at compile time with the --pef-name argument and pass the resulting file to a training run with the --pef argument. See Compile and run the model for an example.

      • The mapping argument tells SambaFlow how to place the model onto the RDU. There are two options: spatial and section. A spatial mapping places the entire model onto the RDU at once (or up to a defined batch size). A section mapping breaks the model into several sections to be deployed on-chip one at a time. The default is section mapping.

    • The train() method indicates that we want to do a training run.

Use common_app_driver in main() for compilation

Compilation can be initiated in one of two ways:

  • With the common_app_driver() (this is a best practice).

  • With the samba.session.compile() method, an earlier approach to compilation.

Using common_app_driver() enables several capabilities (compile, dump, measure-cpu, measure-gpu, measure-performance, and measure-sections) with just one argument on the command line. SambaNova might add capabilities to this method over time. To use common_app_driver() you import it from sambaflow.samba.utils.common (see Imports).

Compile and run the model

To compile:

$ python <model.py> compile --pef-name <pef_name>

To run:

$ python <model.py> run --pef </path/to/pef_name>
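
As a concrete, hypothetical example, if the included-loss script were saved as convnet.py, a compile-and-run session might look like the following. The PEF path is illustrative and depends on where the compiler writes its output; the user-defined arguments are the ones added by add_user_args().

$ python convnet.py compile --pef-name convnet
$ python convnet.py run --pef </path/to/convnet.pef> --num-epochs 6 -bs 100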

Model with an external loss function

Model functions and changes discusses how to convert a PyTorch model that contains a loss function in its definition (as part of its forward() method). It is also possible to use a loss function outside the model definition. You might want to do this if your loss function isn’t currently supported by SambaFlow or if you are using a custom loss function.

With an external loss function we use a host CPU to compute the loss and gradient for backpropagation. We’ll make changes to transport tensors between RDU and CPU.

The following sections show the CNN model with an external loss function. The functions below are the updated functions. Everything else remains unchanged.

forward() used with external loss function

The first change is made to the model’s forward() method. Because loss is no longer computed on the RDU, you don’t need to pass the label tensors to this function and can remove that method parameter and the loss function itself:

forward()
   def forward(self, x: torch.Tensor):
       # Since loss isn't part of the model, we don't pass a label to forward()
       out = self.layer1(x)
       out = self.layer2(out)
       out = out.reshape(out.size(0), -1)
       out = self.drop_out(out)
       out = self.fc1(out)
       out = self.fc2(out)
       return out

Compare this to the original forward() method in Define the model.

prepare_dataloader() used with external loss function

The changes in prepare_dataloader() involve the SambaLoaders. By default, a SambaLoader converts each tensor given to it by a DataLoader and passes all of them along. In the case of MNIST, this means both the image and label tensors.

However, our model no longer takes in label tensors, so we filter those out. Labels are still needed during training: the model uses them to compute the loss, which in turn drives the gradients that adjust the model’s weights and biases during the backward pass.

  • When we include the loss function in the model, we have to pass in the labels to the initializer and forward methods of the model so that the gradients and backward pass can be computed on the RDU.

  • When we use an external loss function, we no longer pass the labels to the forward method because the loss isn’t computed on RDU.

We provide an anonymous function to the SambaLoader via the function_hook parameter. The function acts as a filter, removing the tensors that you don’t want. The function must return a list and it must return the same number of tensors as named in the names parameter.

It is possible to retain the original tensors from the DataLoader. If you set the return_original_batch parameter to True, the SambaLoader returns a list that contains the tensors you filtered for and the original tensors, in that order. This allows us to preserve the MNIST labels for use in the loss calculation.

Compare this to the original prepare_dataloader() function in Load data with prepare_dataloader().

prepare_dataloader()
def prepare_dataloader(args: argparse.Namespace) -> Tuple[sambaflow.samba.sambaloader.SambaLoader, ...]:
   """
   Transforms MNIST input to tensors and creates training/test dataloaders.

   Downloads the MNIST dataset (if necessary); splits the data into training and test sets; transforms the
   data to tensors; then creates Torch DataLoaders over those sets.  Torch DataLoaders are wrapped in
   SambaLoaders.

   Input:
       args: User- and system-defined command line arguments

   Returns:
       A tuple of SambaLoaders over the training and test sets.
   """

   # Transform the raw MNIST data into PyTorch Tensors, which will be converted to SambaTensors
   transform = transforms.Compose(
       [
           transforms.ToTensor(),
           transforms.Normalize((0.1307,), (0.3081,)),
       ]
   )

   # Get the train & test data (images and labels) from the MNIST dataset
   train_dataset = datasets.MNIST(
       root=args.data_path,
       train=True,
       transform=transform,
       download=True,
   )
   test_dataset = datasets.MNIST(root=args.data_path, train=False, transform=transform)

   # Set up the train & test data loaders (input pipeline)
   train_loader = DataLoader(
       dataset=train_dataset, batch_size=args.bs, shuffle=True
   )
   test_loader = DataLoader(
       dataset=test_dataset, batch_size=args.bs, shuffle=False
   )

   # Create SambaLoaders
   sn_train_loader = SambaLoader(dataloader=train_loader, names=["image"], function_hook=lambda t: [t[0]], return_original_batch=True)
   sn_test_loader = SambaLoader(dataloader=test_loader, names=["image"], function_hook=lambda t: [t[0]], return_original_batch=True)

   return sn_train_loader, sn_test_loader

train() used with external loss function

We change the train() method to accommodate an external loss function.

  1. Add a new parameter to the function, allowing a loss function to be passed into it (we will do this in the main() function).

  2. Change the inner training loop: we still loop over the enumerated output from a SambaLoader, but we take an extra step to extract the labels from the original batch.

  3. Change the computation of the model’s forward and backward sections.

    • Modify the samba.session.run() method to only work with image tensors (via the input_tensors parameter) and to only compute the forward section (via setting the section_types parameter to "FWD"). The raw output of the model’s forward() method is captured in the first element of the tuple returned by samba.session.run().

    • We use this output to compute the loss and gradients on the CPU. We pass the output to the CPU via samba.to_torch().

  4. The next few operations are pure PyTorch: set requires_grad to True, call the loss function on the output and labels, and then compute the backward pass.

  5. To finish the computation, we pass the output back from the CPU to the RDU via another call to samba.session.run(). We use the grad_of_outputs parameter, which takes in a list of gradients to be applied in the model’s backward pass on RDU. We set this parameter by calling samba.from_torch_tensor() to convert the output gradients to SambaTensors.

  6. We set the section_types parameter to a list containing “BCKWD” and “OPT” to run only those model sections on the RDU, thus completing one iteration of the training loop.

Compare this to the original train() function in Train the model with train().

train()
def train(args: argparse.Namespace, model: nn.Module, criterion: Callable) -> None:
   """
   Trains the model.

   Prepares and loads the data, then runs the training loop with the hyperparameters specified
   by the input arguments with a given loss function.  Calculates loss and accuracy over the course of training.

   Inputs:
       args: User- and system-defined command line arguments
       model: ConvNet model
       criterion: Loss function

   Returns:
       None
   """

   sn_train_loader, sn_test_loader = prepare_dataloader(args)
   hyperparam_dict = {"lr": args.learning_rate}

   total_step = len(sn_train_loader)
   loss_list = []
   acc_list = []

   for epoch in range(args.num_epochs):
       for i, (images, original_batch) in enumerate(sn_train_loader):

           # The label tensor is the second element of the original batch
           labels = original_batch[1]

           # Run only the forward pass on RDU and note the section_types argument
           # The first element of the returned tuple contains the raw outputs of forward()
           outputs = samba.session.run(
               input_tensors=(images,),
               output_tensors=model.output_tensors,
               hyperparam_dict=hyperparam_dict,
               section_types=["FWD"]
           )[0]

           # Convert SambaTensors back to Torch Tensors to carry out loss calculation
           # on the host CPU.  Be sure to set the requires_grad attribute for PyTorch.
           outputs = samba.to_torch(outputs)
           outputs.requires_grad = True

           # Compute loss on host CPU and store it for later tracking
           loss = criterion(outputs, labels)

           # Compute gradients on CPU
           loss.backward()
           loss_list.append(loss.tolist())

           # Run the backward pass and optimizer step on RDU and note the grad_of_outputs
           # and section_types arguments
           samba.session.run(
               input_tensors=(images,),
               output_tensors=model.output_tensors,
               hyperparam_dict=hyperparam_dict,
               grad_of_outputs=[samba.from_torch_tensor(outputs.grad)], # Bring the grads back from CPU to RDU
               section_types=["BCKWD", "OPT"])

           # Compute and track the accuracy
           total = labels.size(0)
           _, predicted = torch.max(outputs.data, 1)
           correct = (predicted == labels).sum().item()
           acc_list.append(correct / total)

           if (i + 1) % 100 == 0:
               print(
                   "Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}, Accuracy: {:.2f}%".format(
                       epoch + 1,
                       args.num_epochs,
                       i + 1,
                       total_step,
                       torch.mean(loss),
                       (correct / total) * 100,
                   )
               )

main() used with external loss function

We only have to make small changes to the main() function.

  • Define the loss function. This could be a built-in PyTorch loss function or a user-defined function. In this example, we call it criterion.

  • Pass this loss function to the training function.

Compare this to the original main() function in Tie the pieces together with main().

main()
def main(argv):

   args = parse_app_args(argv=argv, common_parser_fn=add_user_args)

   # Create the CNN model
   model = ConvNet()

   # Convert model to SambaFlow (SambaTensors)
   samba.from_torch_model_(model)

   # Create optimizer
   # Note that SambaFlow currently supports AdamW, not Adam, as an optimizer
   optimizer = samba.optim.AdamW(model.parameters(), lr=args.learning_rate)

   ###################################################################
   # Define loss function here to be used in the forward pass on CPU #
   ###################################################################
   criterion = nn.CrossEntropyLoss()

   # Create dummy SambaTensor for graph tracing
   inputs = get_inputs(args)

   # The common_app_driver() handles model compilation and various other tasks, e.g.,
   # measure-performance.  Running, or training, a model must be explicitly carried out
   if args.command == "run":
       utils.trace_graph(model, inputs, optimizer, init_output_grads=not args.inference, pef=args.pef, mapping=args.mapping)
       train(args, model, criterion)
   else:
       common_app_driver(args=args,
                       model=model,
                       inputs=inputs,
                       optim=optimizer,
                       name=model.__class__.__name__,
                       init_output_grads=not args.inference,
                       app_dir=utils.get_file_dir(__file__))

How to compile and run the model with the loss function

The commands for compiling and running the model are the same whether the loss function is external to the model or included in it. The two versions are functionally equivalent, so the commands don’t change. See Compile and run the model.

For some background on the SambaNova compile-run cycle, see Hello SambaFlow! Compile and run a model.
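
The commands typically look like the following sketch. It assumes the external-loss script is saved as cnn_external_loss.py (a hypothetical file name) and reuses the flag names from the other SambaFlow tutorials (compile, run, --pef-name, --output-folder, --pef); check the linked sections for the exact options in your release.

# Compile the model and generate a PEF file
python cnn_external_loss.py compile -bs 100 --pef-name="cnn_external_loss" --output-folder="pef"

# Train the model on RDU, pointing at the generated PEF
python cnn_external_loss.py run -bs 100 --pef="pef/cnn_external_loss/cnn_external_loss.pef"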

Model conversion tips and tricks

This section, which we expect to expand over time, offers some tips and tricks for model conversion.

  • Torch DataLoaders. If your dataset size is not exactly divisible by the batch size, the last batch is smaller than the others, for example, 28 samples when the PEF was compiled for a batch size of 32, and the shape mismatch produces a PEF mismatch error. Set the DataLoader parameter drop_last=True to discard that partial batch and avoid the problem, as shown in the sketch after this list.

  • Data Visualization. SambaNova recommends that you don’t do data visualization directly on a SambaNova system.
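
A minimal sketch of the drop_last fix, using the same train_dataset and args.bs as elsewhere in this tutorial:

from torch.utils.data import DataLoader

# drop_last=True discards the final partial batch so that every batch matches
# the batch size the PEF was compiled for (args.bs in this tutorial)
train_loader = DataLoader(
    dataset=train_dataset,
    batch_size=args.bs,
    shuffle=True,
    drop_last=True,
)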


Learn more!