Convert a simple model to SambaFlow

Many SambaNova customers convert an existing model that they built in PyTorch to SambaFlow. This doc page uses a simple example to illustrate what is essential for the conversion and discusses some best practices. You’ll see that much of your code remains unchanged and that SambaFlow doesn’t usually require you to reformat your data.

This tutorial is about model conversion. For background on data preparation, see our public GitHub repository.

In this tutorial, you:

  • Review the example model and learn the difference between an included and an external loss function.

  • Compare the original PyTorch code with the converted SambaFlow code.

  • Work through planning questions that guide the conversion.

  • Compile and run the converted model.

The example model

Convolutional Neural Networks (CNNs) are a popular model type in the Visual AI space. Our example model is a CNN that performs image classification on the MNIST dataset. It consists of four layers:

  • Two convolutional layers, each consisting of a:

    • Conv2D

    • ReLU

    • MaxPool2D

  • Two fully connected linear layers
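
For reference, the tensor shapes evolve as follows for a single 1 x 28 x 28 MNIST image; the numbers follow directly from the Conv2d and MaxPool2d parameters in the code below and explain why the first linear layer takes 7 * 7 * 64 input features.

    # Shape flow for one MNIST image (batch dimension omitted):
    # input                                       ->  1 x 28 x 28
    # layer1: Conv2d(1, 32, kernel 5, padding 2)  -> 32 x 28 x 28
    #         MaxPool2d(kernel 2, stride 2)       -> 32 x 14 x 14
    # layer2: Conv2d(32, 64, kernel 5, padding 2) -> 64 x 14 x 14
    #         MaxPool2d(kernel 2, stride 2)       -> 64 x  7 x  7
    # flatten                                     -> 7 * 7 * 64 = 3136 features
    # fc1: Linear(3136, 1000) -> fc2: Linear(1000, 10) -> 10 class scores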

Included or external loss function

This conversion example presents two example solutions (a simplified sketch that contrasts them follows this list):

  • The solution in Model functions and changes includes the model’s loss function as part of the model definition.

    • This approach improves performance because the loss computation happens on the RDU.

    • In the example, the loss function is included in the forward() function.

  • The solution in Model with an external loss function includes code for a loss function that is external to the model.

    • This solution uses a host CPU to compute the loss and gradients for backpropagation.

    • Use this approach if your model’s loss function isn’t currently supported by SambaFlow or if you are using a custom loss function.
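
The following sketch contrasts the two model interfaces. It is a simplified illustration only: it uses a single stand-in Linear layer in place of the CNN layers, and the full listings appear in the next section.

    import torch
    import torch.nn as nn

    # Included loss: the criterion is part of the model, so the loss computation
    # is traced and mapped onto the RDU together with the rest of the graph.
    class WithIncludedLoss(nn.Module):
        def __init__(self):
            super().__init__()
            self.body = nn.Linear(784, 10)          # stand-in for the CNN layers
            self.criterion = nn.CrossEntropyLoss()

        def forward(self, x: torch.Tensor, labels: torch.Tensor):
            out = self.body(x)
            loss = self.criterion(out, labels)
            return loss, out

    # External loss: forward() takes only the input and returns the raw outputs.
    # The training loop computes the loss and calls loss.backward() on the host CPU.
    class WithExternalLoss(nn.Module):
        def __init__(self):
            super().__init__()
            self.body = nn.Linear(784, 10)          # stand-in for the CNN layers

        def forward(self, x: torch.Tensor):
            return self.body(x)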

Original and converted model code

This tutorial explains the code modifications using the simple CNN described above. We picked this example because it’s small and compiles quickly.

  • You can download the original code from this repo: https://github.com/adventuresinML/adventures-in-ml-code/blob/master/conv_net_py_torch.py.

  • The revised code is available below.

    Included loss function
    import sambaflow
    import sambaflow.samba as samba
    import sambaflow.samba.optim as optim
    import sambaflow.samba.utils as utils
    from sambaflow.samba.utils.common import common_app_driver
    from sambaflow.samba.utils.argparser import parse_app_args
    from sambaflow.samba.sambaloader import SambaLoader
    
    import sys
    import argparse
    from typing import Tuple
    
    import numpy as np
    import torch
    import torch.nn as nn
    from torch.utils.data import DataLoader
    from torchvision import datasets, transforms
    
    class ConvNet(nn.Module):
        """
        Instantiate a 4-layer CNN for MNIST Image Classification.
    
        In SambaFlow, it is possible to include a loss function as part of a model's definition and put it in
        the forward method to be computed.
    
        Typical SambaFlow usage example:
    
        model = ConvNet()
        samba.from_torch_model_(model)
        optimizer = ...
        inputs = ...
        if args.command == "run":
            utils.trace_graph(model, inputs, optimizer, pef=args.pef, mapping=args.mapping)
            train(args, model)
        """
    
        def __init__(self):
    
            super(ConvNet, self).__init__()
            self.layer1 = nn.Sequential(
                nn.Conv2d(1, 32, kernel_size=5, stride=1, padding=2),
                nn.ReLU(),
                nn.MaxPool2d(kernel_size=2, stride=2),
            )
            self.layer2 = nn.Sequential(
                nn.Conv2d(32, 64, kernel_size=5, stride=1, padding=2),
                nn.ReLU(),
                nn.MaxPool2d(kernel_size=2, stride=2),
            )
            self.drop_out = nn.Dropout()
            self.fc1 = nn.Linear(7 * 7 * 64, 1000)
            self.fc2 = nn.Linear(1000, 10)
            self.criterion = nn.CrossEntropyLoss() # Add loss function to model
    
        def forward(self, x: torch.Tensor, labels: torch.Tensor):
            out = self.layer1(x)
            out = self.layer2(out)
            out = out.reshape(out.size(0), -1)
            out = self.drop_out(out)
            out = self.fc1(out)
            out = self.fc2(out)
            loss = self.criterion(out, labels)     # Compute loss
            return loss, out
    
    def add_user_args(parser: argparse.ArgumentParser) -> None:
        """
        Add user-defined arguments.
    
        Args:
            parser (argparse.ArgumentParser): SambaFlow argument parser
        """
    
        parser.add_argument(
            "-bs",
            type=int,
            default=100,
            metavar="N",
            help="input batch size for training (default: 100)",
        )
        parser.add_argument(
            "--num-epochs",
            type=int,
            default=6,
            metavar="N",
            help="number of epochs to train (default: 6)",
        )
        parser.add_argument(
            "--num-classes",
            type=int,
            default=10,
            metavar="N",
            help="number of classes in dataset (default: 10)",
        )
        parser.add_argument(
            "--learning-rate",
            type=float,
            default=0.001,
            metavar="LR",
            help="learning rate (default: 0.001)",
        )
        parser.add_argument(
            "--data-path",
            type=str,
            default="data",
            help="Download location for MNIST data",
        )
        parser.add_argument(
            "--model-path", type=str, default="model", help="Save location for model"
        )
    
    def get_inputs(args: argparse.Namespace) -> Tuple[samba.SambaTensor]:
        """
        Generates random SambaTensors in the same shape as MNIST image and label tensors.
    
        In order to properly compile a PEF and trace the model graph, SambaFlow requires a SambaTensor that
        is the same shape as the input Torch Tensors, allowing the graph to be optimally mapped onto an RDU.
    
        Args:
            args (argparse.Namespace): User- and system-defined command line arguments
    
        Returns:
            A tuple of SambaTensors with random values in the same shape as MNIST image and label tensors.
        """
    
        dummy_image = (
            samba.randn(args.bs, 1, 28, 28, name="image", batch_dim=0),
            samba.randint(args.num_classes, (args.bs,), name="label", batch_dim=0),
        )
    
        return dummy_image
    
    def prepare_dataloader(args: argparse.Namespace) -> Tuple[sambaflow.samba.sambaloader.SambaLoader, sambaflow.samba.sambaloader.SambaLoader]:
        """
        Transforms MNIST input to tensors and creates training/test dataloaders.
    
        Downloads the MNIST dataset (if necessary); splits the data into training and test sets; transforms the
        data to tensors; then creates Torch DataLoaders over those sets.  Torch DataLoaders are wrapped in
        SambaLoaders.
    
        Args:
            args (argparse.Namespace): User- and system-defined command line arguments
    
        Returns:
            A tuple of SambaLoaders over the training and test sets.
        """
    
        # Transform the raw MNIST data into PyTorch Tensors, which will be converted to SambaTensors
        transform = transforms.Compose(
            [
                transforms.ToTensor(),
                transforms.Normalize((0.1307,), (0.3081,)),
            ]
        )
    
        # Get the train & test data (images and labels) from the MNIST dataset
        train_dataset = datasets.MNIST(
            root=args.data_path,
            train=True,
            transform=transform,
            download=True,
        )
        test_dataset = datasets.MNIST(root=args.data_path, train=False, transform=transform)
    
        # Set up the train & test data loaders (input pipeline)
        train_loader = DataLoader(
            dataset=train_dataset, batch_size=args.bs, shuffle=True
        )
        test_loader = DataLoader(
            dataset=test_dataset, batch_size=args.bs, shuffle=False
        )
    
        # Create SambaLoaders
        sn_train_loader = SambaLoader(train_loader, ["image", "label"])
        sn_test_loader = SambaLoader(test_loader, ["image", "label"])
    
        return sn_train_loader, sn_test_loader
    
    def train(args: argparse.Namespace, model: nn.Module) -> None:
        """
        Trains the model.
    
        Prepares and loads the data, then runs the training loop with the hyperparameters specified
        by the input arguments.  Calculates loss and accuracy over the course of training.
    
        Args:
            args (argparse.Namespace): User- and system-defined command line arguments
            model (nn.Module): ConvNet model
        """
    
        sn_train_loader, _ = prepare_dataloader(args)
        hyperparam_dict = {"lr": args.learning_rate}
    
        total_step = len(sn_train_loader)
        loss_list = []
        acc_list = []
    
        for epoch in range(args.num_epochs):
            for i, (images, labels) in enumerate(sn_train_loader):
    
                # Run the model on RDU: forward -> loss/gradients -> backward/optimizer
                loss, outputs = samba.session.run(
                    input_tensors=(images, labels),
                    output_tensors=model.output_tensors,
                    hyperparam_dict=hyperparam_dict
                )
    
                # Convert SambaTensors back to Torch Tensors to calculate accuracy
                loss, outputs = samba.to_torch(loss), samba.to_torch(outputs)
                loss_list.append(loss.tolist())
    
                # Track the accuracy
                total = labels.size(0)
                _, predicted = torch.max(outputs.data, 1)
                correct = (predicted == labels).sum().item()
                acc_list.append(correct / total)
    
                if (i + 1) % 100 == 0:
                    print(
                        "Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}, Accuracy: {:.2f}%".format(
                            epoch + 1,
                            args.num_epochs,
                            i + 1,
                            total_step,
                            torch.mean(loss),
                            (correct / total) * 100,
                        )
                    )
    
    def main(argv):
    
        args = parse_app_args(argv=argv, common_parser_fn=add_user_args)
    
        # Create the CNN model
        model = ConvNet()
    
        # Convert model to SambaFlow (SambaTensors)
        samba.from_torch_model_(model)
    
        # Create optimizer
        # Note that SambaFlow currently supports AdamW, not Adam, as an optimizer
        optimizer = samba.optim.AdamW(model.parameters(), lr=args.learning_rate)
    
        # Normally, we'd define a loss function here, but with SambaFlow, it can be defined
        # as part of the model, which we have done in this case
    
        # Create dummy SambaTensor for graph tracing
        inputs = get_inputs(args)
    
        # The common_app_driver() handles model compilation and various other tasks, e.g.,
        # measure-performance.  Running, or training, a model must be explicitly carried out
        if args.command == "run":
            utils.trace_graph(model, inputs, optimizer, pef=args.pef, mapping=args.mapping)
            train(args, model)
        else:
            common_app_driver(args=args,
                            model=model,
                            inputs=inputs,
                            optim=optimizer,
                            name=model.__class__.__name__,
                            init_output_grads=not args.inference,
                            app_dir=utils.get_file_dir(__file__))
    
    if __name__ == '__main__':
        main(sys.argv[1:])
    Custom loss function
    import sambaflow
    import sambaflow.samba as samba
    import sambaflow.samba.optim as optim
    import sambaflow.samba.utils as utils
    from sambaflow.samba.utils.common import common_app_driver
    from sambaflow.samba.utils.argparser import parse_app_args
    from sambaflow.samba.sambaloader import SambaLoader
    
    import sys
    import argparse
    from typing import (Tuple, Callable)
    
    import numpy as np
    import torch
    import torch.nn as nn
    from torch.utils.data import DataLoader
    from torchvision import datasets, transforms
    
    class ConvNetCustomLoss(nn.Module):
        """
        Instantiate a 4-layer CNN for MNIST Image Classification.
    
        In SambaFlow, while it is possible to include a loss function in the model definition, it
        is not done here as an example of how to compute loss on the host.
    
        Typical SambaFlow usage example:
    
        model = ConvNetCustomLoss()
        samba.from_torch_model_(model)
        optimizer = ...
        inputs = ...
        if args.command == "run":
            utils.trace_graph(model, inputs, optimizer, pef=args.pef, mapping=args.mapping)
            train(args, model)
        """
    
        def __init__(self):
    
            super(ConvNetCustomLoss, self).__init__()
            self.layer1 = nn.Sequential(
                nn.Conv2d(1, 32, kernel_size=5, stride=1, padding=2),
                nn.ReLU(),
                nn.MaxPool2d(kernel_size=2, stride=2),
            )
            self.layer2 = nn.Sequential(
                nn.Conv2d(32, 64, kernel_size=5, stride=1, padding=2),
                nn.ReLU(),
                nn.MaxPool2d(kernel_size=2, stride=2),
            )
            self.drop_out = nn.Dropout()
            self.fc1 = nn.Linear(7 * 7 * 64, 1000)
            self.fc2 = nn.Linear(1000, 10)
    
        def forward(self, x: torch.Tensor):
            # Since loss isn't part of the model, we don't pass a label to forward()
            out = self.layer1(x)
            out = self.layer2(out)
            out = out.reshape(out.size(0), -1)
            out = self.drop_out(out)
            out = self.fc1(out)
            out = self.fc2(out)
            return out
    
    def add_user_args(parser: argparse.ArgumentParser) -> None:
        """
        Add user-defined arguments.
    
        Args:
            parser (argparse.ArgumentParser): SambaFlow argument parser
        """
    
        parser.add_argument(
            "-bs",
            type=int,
            default=100,
            metavar="N",
            help="input batch size for training (default: 100)",
        )
        parser.add_argument(
            "--num-epochs",
            type=int,
            default=6,
            metavar="N",
            help="number of epochs to train (default: 6)",
        )
        parser.add_argument(
            "--num-classes",
            type=int,
            default=10,
            metavar="N",
            help="number of classes in dataset (default: 10)",
        )
        parser.add_argument(
            "--learning-rate",
            type=float,
            default=0.001,
            metavar="LR",
            help="learning rate (default: 0.001)",
        )
        parser.add_argument(
            "--data-path",
            type=str,
            default="data",
            help="Download location for MNIST data",
        )
        parser.add_argument(
            "--model-path", type=str, default="model", help="Save location for model"
        )
    
    def get_inputs(args: argparse.Namespace) -> Tuple[samba.SambaTensor]:
        """
        Generates random SambaTensors in the same shape as MNIST image tensors.
    
        In order to properly compile a PEF and trace the model graph, SambaFlow requires a SambaTensor that
        is the same shape as the input Torch Tensors, allowing the graph to be optimally mapped onto an RDU.
    
        Args:
            args (argparse.Namespace): User- and system-defined command line arguments
    
        Returns:
            A SambaTensor with random values in the same shape as MNIST image tensors.
        """
    
        # Loss is computed on the host, so a dummy SambaTensor is only needed for the MNIST images
        return samba.randn(args.bs, 1, 28, 28, name="image", batch_dim=0),
    
    def prepare_dataloader(args: argparse.Namespace) -> Tuple[sambaflow.samba.sambaloader.SambaLoader, ...]:
        """
        Transforms MNIST input to tensors and creates training/test dataloaders.
    
        Downloads the MNIST dataset (if necessary); splits the data into training and test sets; transforms the
        data to tensors; then creates Torch DataLoaders over those sets.  Torch DataLoaders are wrapped in
        SambaLoaders.
    
        Input:
            args: User- and system-defined command line arguments
    
        Returns:
            A tuple of SambaLoaders over the training and test sets.
        """
    
        # Transform the raw MNIST data into PyTorch Tensors, which will be converted to SambaTensors
        transform = transforms.Compose(
            [
                transforms.ToTensor(),
                transforms.Normalize((0.1307,), (0.3081,)),
            ]
        )
    
        # Get the train & test data (images and labels) from the MNIST dataset
        train_dataset = datasets.MNIST(
            root=args.data_path,
            train=True,
            transform=transform,
            download=True,
        )
        test_dataset = datasets.MNIST(root=args.data_path, train=False, transform=transform)
    
        # Set up the train & test data loaders (input pipeline)
        train_loader = DataLoader(
            dataset=train_dataset, batch_size=args.bs, shuffle=True
        )
        test_loader = DataLoader(
            dataset=test_dataset, batch_size=args.bs, shuffle=False
        )
    
        # Create SambaLoaders
        # function_hook allows us to specify which tensor(s) should be passed along to the model
        #  -> The hook must return a list containing the same number of tensors as specified in the list of names
        #  -> Any other tensors will be filtered out, so if you need those, then...
        # return_original_batch allows us to retain the original input tensors for later processing, e.g., computing loss
        #  -> It causes the SambaLoader to also return a list of the original input tensors
        sn_train_loader = SambaLoader(dataloader=train_loader, names=["image"], function_hook=lambda t: [t[0]], return_original_batch=True)
        sn_test_loader = SambaLoader(dataloader=test_loader, names=["image"], function_hook=lambda t: [t[0]], return_original_batch=True)
    
        return sn_train_loader, sn_test_loader
    
    def train(args: argparse.Namespace, model: nn.Module, criterion: Callable) -> None:
        """
        Trains the model.
    
        Prepares and loads the data, then runs the training loop with the hyperparameters specified
        by the input arguments with a given loss function.  Calculates loss and accuracy over the course of training.
    
        Args:
            args (argparse.Namespace): User- and system-defined command line arguments
            model (nn.Module): ConvNet model
            criterion (Callable): Loss function
        """
    
        sn_train_loader, sn_test_loader = prepare_dataloader(args)
        hyperparam_dict = {"lr": args.learning_rate}
    
        total_step = len(sn_train_loader)
        loss_list = []
        acc_list = []
    
        for epoch in range(args.num_epochs):
            for i, (images, original_batch) in enumerate(sn_train_loader):
    
                # The label tensor is the second element of the original batch
                labels = original_batch[1]
    
                # Run only the forward pass on RDU and note the section_types argument
                # The first element of the returned tuple contains the raw outputs of forward()
                outputs = samba.session.run(
                    input_tensors=(images,),
                    output_tensors=model.output_tensors,
                    hyperparam_dict=hyperparam_dict,
                    section_types=["FWD"]
                )[0]
    
                # Convert SambaTensors back to Torch Tensors to carry out loss calculation
                # on the host CPU.  Be sure to set the requires_grad attribute for PyTorch.
                outputs = samba.to_torch(outputs)
                outputs.requires_grad = True
    
                # Compute loss on host CPU and store it for later tracking
                loss = criterion(outputs, labels)
    
                # Compute gradients on CPU
                loss.backward()
                loss_list.append(loss.tolist())
    
                # Run the backward pass and optimizer step on RDU and note the grad_of_outputs
                # and section_types arguments
                samba.session.run(
                    input_tensors=(images,),
                    output_tensors=model.output_tensors,
                hyperparam_dict=hyperparam_dict,
                grad_of_outputs=[samba.from_torch_tensor(outputs.grad)], # Bring the grads back from CPU to RDU
                    section_types=["BCKWD", "OPT"])
    
                # Compute and track the accuracy
                total = labels.size(0)
                _, predicted = torch.max(outputs.data, 1)
                correct = (predicted == labels).sum().item()
                acc_list.append(correct / total)
    
                if (i + 1) % 100 == 0:
                    print(
                        "Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}, Accuracy: {:.2f}%".format(
                            epoch + 1,
                            args.num_epochs,
                            i + 1,
                            total_step,
                            torch.mean(loss),
                            (correct / total) * 100,
                        )
                    )
    
    def main(argv):
    
        args = parse_app_args(argv=argv, common_parser_fn=add_user_args)
    
        # Create the CNN model
        model = ConvNetCustomLoss()
    
        # Convert model to SambaFlow (SambaTensors)
        samba.from_torch_model_(model)
    
        # Create optimizer
        # Note that SambaFlow currently supports AdamW, not Adam, as an optimizer
        optimizer = samba.optim.AdamW(model.parameters(), lr=args.learning_rate)
    
        ###################################################################
        # Define loss function here to be used in the forward pass on CPU #
        ###################################################################
        criterion = nn.CrossEntropyLoss()
    
        # Create dummy SambaTensor for graph tracing
        inputs = get_inputs(args)
    
        # The common_app_driver() handles model compilation and various other tasks, e.g.,
        # measure-performance.  Running, or training, a model must be explicitly carried out
        if args.command == "run":
            utils.trace_graph(model, inputs, optimizer, init_output_grads=not args.inference, pef=args.pef, mapping=args.mapping)
            train(args, model, criterion)
        else:
            common_app_driver(args=args,
                            model=model,
                            inputs=inputs,
                            optim=optimizer,
                            name=model.__class__.__name__,
                            init_output_grads=not args.inference,
                            app_dir=utils.get_file_dir(__file__))
    
    if __name__ == '__main__':
        main(sys.argv[1:])

Planning questions

To make the conversion process more straightforward, consider these planning questions. A condensed, annotated sketch that maps the questions to the corresponding SambaFlow calls follows the list.

  1. Where are my dataloaders?

    All models need data, and one of the easiest ways to feed in that data is with a PyTorch DataLoader. The output tensors that come from the DataLoader must be converted into SambaTensors. See Prepare dataloader.

  2. What shape are my input tensors?

    When you compile a SambaFlow model, the compute graph of your model is physically mapped onto an RDU. To perform this mapping, SambaFlow needs to know the shape of the input tensors. See Generate tensors.

  3. Where is my model defined?

    A useful feature of SambaFlow is that a loss function can be included in a model’s definition and computed in its forward() method. The loss function can then be mapped directly onto an RDU, greatly enhancing performance. See Define the model.

  4. Where is my model instantiated?

    The model must be explicitly converted to SambaFlow. Fortunately, a single SambaFlow call, samba.from_torch_model_(), does the conversion. See Tie it all together with main().

  5. Where is my loss function defined and what is it?

    A loss function can be part of a model’s definition. If your model uses a PyTorch loss function that SambaFlow supports, the function can be moved into the model, as in Define the model. If your model doesn’t use a supported loss function, you can compute the loss externally on the host CPU. See Model with an external loss function.

  6. Where is my optimizer defined and what is it?

    Unlike loss functions, optimizers can’t be added directly to a model’s definition in SambaFlow. Instead, the optimizer is passed into SambaFlow during compilation and training. See Tie it all together with main().
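
The following sketch is a condensed, annotated version of the main() function from the included-loss listing above. It maps each planning question to the SambaFlow call that addresses it; it relies on the ConvNet, add_user_args, get_inputs, and train definitions from that listing and is meant as an orientation aid, not a complete program.

    import sys

    import sambaflow.samba as samba
    import sambaflow.samba.utils as utils
    from sambaflow.samba.utils.common import common_app_driver
    from sambaflow.samba.utils.argparser import parse_app_args

    def main(argv):
        args = parse_app_args(argv=argv, common_parser_fn=add_user_args)

        # Q3/Q5: the model definition (ConvNet above) includes its loss function.
        model = ConvNet()

        # Q4: a single call converts the instantiated PyTorch model in place.
        samba.from_torch_model_(model)

        # Q6: the optimizer is defined outside the model and passed to compile and run.
        optimizer = samba.optim.AdamW(model.parameters(), lr=args.learning_rate)

        # Q2: dummy SambaTensors with the expected input shapes let SambaFlow trace
        # and map the compute graph onto an RDU.
        inputs = get_inputs(args)

        if args.command == "run":
            # Q1: prepare_dataloader() (called from train()) wraps the Torch DataLoaders
            # in SambaLoaders so that their output tensors become SambaTensors.
            utils.trace_graph(model, inputs, optimizer, pef=args.pef, mapping=args.mapping)
            train(args, model)
        else:
            common_app_driver(args=args, model=model, inputs=inputs, optim=optimizer,
                              name=model.__class__.__name__,
                              init_output_grads=not args.inference,
                              app_dir=utils.get_file_dir(__file__))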

Compile and run the model

To compile a model, you always use the following syntax:

$ python <model.py> compile --pef-name <pef_name>

Assuming you’ve saved the example code as cnn_conversion.py, run the following command.

$ python cnn_conversion.py compile --pef-name cnn_conversion.pef

To run the model, you pass in the PEF file that was generated during compilation. The syntax is:

$ python <model.py> run --pef </path/to/pef_name>

For this example, run the following command:

$ python cnn_conversion.py run --pef cnn_conversion.pef
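
The user-defined arguments from add_user_args() can also be passed on the run command line. For example, the following invocation simply makes the defaults explicit (batch size 100, 6 epochs, learning rate 0.001):

$ python cnn_conversion.py run --pef cnn_conversion.pef -bs 100 --num-epochs 6 --learning-rate 0.001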

Model conversion tips and tricks

This section offers some tips and tricks for model conversion.

  • Torch Dataloaders. If the size of your dataset is not exactly divisible by your batch size, the last batch is smaller than the batch size the PEF was compiled with (for example, a last batch of 28 when the PEF batch size is 32), and you get a PEF mismatch error. Set drop_last=True on the DataLoader to avoid that problem, as shown in the snippet after this list.

  • Data Visualization. SambaNova recommends that you don’t do data visualization directly on a SambaNova system.
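
For example, assuming the DataLoader setup from prepare_dataloader() in the listings above, dropping the incomplete final batch looks like this:

    from torch.utils.data import DataLoader

    # Drop the final, incomplete batch so that every batch matches the batch
    # size the PEF was compiled with (args.bs in the examples above).
    train_loader = DataLoader(
        dataset=train_dataset,
        batch_size=args.bs,
        shuffle=True,
        drop_last=True,  # discard the last batch if it has fewer than args.bs samples
    )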

Learn more!