Convert a simple model to SambaFlow
Many SambaNova customers convert an existing model that they built in PyTorch to SambaFlow. This doc page uses a simple example to illustrate what is essential for the conversion and discusses some best practices. You’ll see that much of your code remains unchanged and that SambaFlow doesn’t usually require you to reformat your data.
This tutorial is about model conversion. For background on data preparation, see our public GitHub repository.
In this tutorial, you:
- Learn about The example model.
- Look at some Planning questions that will help you be successful.
- Learn about required and optional changes to the model’s PyTorch code.
- See Examine functions and changes for the code changes if the loss function is included in the model.
- See Examine model code with external loss function for the additional code changes if the model uses an external loss function.
The example model
Convolutional Neural Networks (CNNs) are a popular model type in the Visual AI space. Our example model is a CNN that performs image classification on the MNIST dataset. It consists of four layers:
- 2 convolutional layers, each containing a:
  - Conv2d
  - ReLU
  - MaxPool2d
- 2 fully connected linear layers
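If it helps to see the architecture as code, here is a minimal plain-PyTorch sketch of that layer stack. It mirrors the converted listings later on this page; the downloadable original may differ in minor details.

import torch.nn as nn

class ConvNet(nn.Module):
    """4-layer MNIST classifier: two convolutional blocks followed by two linear layers."""
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=5, stride=1, padding=2),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
        self.layer2 = nn.Sequential(
            nn.Conv2d(32, 64, kernel_size=5, stride=1, padding=2),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
        self.drop_out = nn.Dropout()
        self.fc1 = nn.Linear(7 * 7 * 64, 1000)   # 28x28 input becomes 7x7x64 after two 2x2 max pools
        self.fc2 = nn.Linear(1000, 10)

    def forward(self, x):
        out = self.layer2(self.layer1(x))
        out = self.drop_out(out.reshape(out.size(0), -1))
        return self.fc2(self.fc1(out))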
Included or external loss function
This conversion example presents two solutions:
- The solution in Examine functions and changes includes the model’s loss function as part of the model definition.
  - This approach improves performance because loss computation happens on the RDU.
  - In the example, the loss function is included in the forward() function.
- The solution in Examine model code with external loss function uses a loss function that is external to the model.
  - This solution uses the host CPU to compute the loss and the gradients for backpropagation.
  - Use this approach if SambaFlow doesn’t currently support your model’s loss function or if you are using a custom loss function.
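The practical difference between the two solutions shows up in the model’s forward() method. The condensed fragment below contrasts the two signatures; the single nn.Linear layer stands in for the full convolutional stack and is only an illustration, not the tutorial’s actual model.

import torch.nn as nn

class IncludedLoss(nn.Module):
    """Loss lives inside the model, so forward() takes the labels and returns (loss, out)."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(28 * 28, 10)          # stand-in for the conv/fc stack
        self.criterion = nn.CrossEntropyLoss()    # loss function attached to the model

    def forward(self, x, labels):
        out = self.fc(x.reshape(x.size(0), -1))
        return self.criterion(out, labels), out   # loss is computed on the RDU

class ExternalLoss(nn.Module):
    """Loss stays outside the model, so forward() takes only the images and returns the logits."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(28 * 28, 10)

    def forward(self, x):
        return self.fc(x.reshape(x.size(0), -1))  # the host CPU computes loss and gradients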
Original and converted model code
This tutorial explains code modifications using the simple CNN example described above, with two convolutional layers followed by two fully connected layers. We picked this example because it’s simple and compiles quickly.
- You can download the original code from this repo: https://github.com/adventuresinML/adventures-in-ml-code/blob/master/conv_net_py_torch.py
- The revised code is available below.
Included loss function
import sambaflow
import sambaflow.samba as samba
import sambaflow.samba.optim as optim
import sambaflow.samba.utils as utils
from sambaflow.samba.utils.common import common_app_driver
from sambaflow.samba.utils.argparser import parse_app_args
from sambaflow.samba.sambaloader import SambaLoader

import sys
import argparse
from typing import Tuple

import numpy as np
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms


class ConvNet(nn.Module):
    """
    Instantiate a 4-layer CNN for MNIST Image Classification.

    In SambaFlow, it is possible to include a loss function as part of a model's
    definition and put it in the forward method to be computed.

    Typical SambaFlow usage example:

        model = ConvNet()
        samba.from_torch_model_(model)
        optimizer = ...
        inputs = ...
        if args.command == "run":
            utils.trace_graph(model, inputs, optimizer, pef=args.pef, mapping=args.mapping)
            train(args, model)
    """

    def __init__(self):
        super(ConvNet, self).__init__()
        self.layer1 = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=5, stride=1, padding=2),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
        self.layer2 = nn.Sequential(
            nn.Conv2d(32, 64, kernel_size=5, stride=1, padding=2),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
        self.drop_out = nn.Dropout()
        self.fc1 = nn.Linear(7 * 7 * 64, 1000)
        self.fc2 = nn.Linear(1000, 10)
        self.criterion = nn.CrossEntropyLoss()  # Add loss function to model

    def forward(self, x: torch.Tensor, labels: torch.Tensor):
        out = self.layer1(x)
        out = self.layer2(out)
        out = out.reshape(out.size(0), -1)
        out = self.drop_out(out)
        out = self.fc1(out)
        out = self.fc2(out)
        loss = self.criterion(out, labels)  # Compute loss
        return loss, out


def add_user_args(parser: argparse.ArgumentParser) -> None:
    """
    Add user-defined arguments.

    Args:
        parser (argparse.ArgumentParser): SambaFlow argument parser
    """
    parser.add_argument(
        "-bs",
        type=int,
        default=100,
        metavar="N",
        help="input batch size for training (default: 100)",
    )
    parser.add_argument(
        "--num-epochs",
        type=int,
        default=6,
        metavar="N",
        help="number of epochs to train (default: 6)",
    )
    parser.add_argument(
        "--num-classes",
        type=int,
        default=10,
        metavar="N",
        help="number of classes in dataset (default: 10)",
    )
    parser.add_argument(
        "--learning-rate",
        type=float,
        default=0.001,
        metavar="LR",
        help="learning rate (default: 0.001)",
    )
    parser.add_argument(
        "--data-path",
        type=str,
        default="data",
        help="Download location for MNIST data",
    )
    parser.add_argument(
        "--model-path", type=str, default="model", help="Save location for model"
    )


def get_inputs(args: argparse.Namespace) -> Tuple[samba.SambaTensor]:
    """
    Generates random SambaTensors in the same shape as MNIST image and label tensors.

    In order to properly compile a PEF and trace the model graph, SambaFlow requires
    a SambaTensor that is the same shape as the input Torch Tensors, allowing the
    graph to be optimally mapped onto an RDU.

    Args:
        args (argparse.Namespace): User- and system-defined command line arguments

    Returns:
        A tuple of SambaTensors with random values in the same shape as MNIST image
        and label tensors.
    """
    dummy_image = (
        samba.randn(args.bs, 1, 28, 28, name="image", batch_dim=0),
        samba.randint(args.num_classes, (args.bs,), name="label", batch_dim=0),
    )
    return dummy_image


def prepare_dataloader(args: argparse.Namespace) -> Tuple[sambaflow.samba.sambaloader.SambaLoader, sambaflow.samba.sambaloader.SambaLoader]:
    """
    Transforms MNIST input to tensors and creates training/test dataloaders.

    Downloads the MNIST dataset (if necessary); splits the data into training and
    test sets; transforms the data to tensors; then creates Torch DataLoaders over
    those sets. Torch DataLoaders are wrapped in SambaLoaders.

    Args:
        args (argparse.Namespace): User- and system-defined command line arguments

    Returns:
        A tuple of SambaLoaders over the training and test sets.
    """
    # Transform the raw MNIST data into PyTorch Tensors, which will be converted to SambaTensors
    transform = transforms.Compose(
        [
            transforms.ToTensor(),
            transforms.Normalize((0.1307,), (0.3081,)),
        ]
    )

    # Get the train & test data (images and labels) from the MNIST dataset
    train_dataset = datasets.MNIST(
        root=args.data_path,
        train=True,
        transform=transform,
        download=True,
    )
    test_dataset = datasets.MNIST(root=args.data_path, train=False, transform=transform)

    # Set up the train & test data loaders (input pipeline)
    train_loader = DataLoader(
        dataset=train_dataset, batch_size=args.bs, shuffle=True
    )
    test_loader = DataLoader(
        dataset=test_dataset, batch_size=args.bs, shuffle=False
    )

    # Create SambaLoaders
    sn_train_loader = SambaLoader(train_loader, ["image", "label"])
    sn_test_loader = SambaLoader(test_loader, ["image", "label"])

    return sn_train_loader, sn_test_loader


def train(args: argparse.Namespace, model: nn.Module) -> None:
    """
    Trains the model.

    Prepares and loads the data, then runs the training loop with the hyperparameters
    specified by the input arguments. Calculates loss and accuracy over the course
    of training.

    Args:
        args (argparse.Namespace): User- and system-defined command line arguments
        model (nn.Module): ConvNet model
    """
    sn_train_loader, _ = prepare_dataloader(args)
    hyperparam_dict = {"lr": args.learning_rate}

    total_step = len(sn_train_loader)
    loss_list = []
    acc_list = []

    for epoch in range(args.num_epochs):
        for i, (images, labels) in enumerate(sn_train_loader):

            # Run the model on RDU: forward -> loss/gradients -> backward/optimizer
            loss, outputs = samba.session.run(
                input_tensors=(images, labels),
                output_tensors=model.output_tensors,
                hyperparam_dict=hyperparam_dict
            )

            # Convert SambaTensors back to Torch Tensors to calculate accuracy
            loss, outputs = samba.to_torch(loss), samba.to_torch(outputs)
            loss_list.append(loss.tolist())

            # Track the accuracy
            total = labels.size(0)
            _, predicted = torch.max(outputs.data, 1)
            correct = (predicted == labels).sum().item()
            acc_list.append(correct / total)

            if (i + 1) % 100 == 0:
                print(
                    "Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}, Accuracy: {:.2f}%".format(
                        epoch + 1,
                        args.num_epochs,
                        i + 1,
                        total_step,
                        torch.mean(loss),
                        (correct / total) * 100,
                    )
                )


def main(argv):
    args = parse_app_args(argv=argv, common_parser_fn=add_user_args)

    # Create the CNN model
    model = ConvNet()

    # Convert model to SambaFlow (SambaTensors)
    samba.from_torch_model_(model)

    # Create optimizer
    # Note that SambaFlow currently supports AdamW, not Adam, as an optimizer
    optimizer = samba.optim.AdamW(model.parameters(), lr=args.learning_rate)

    # Normally, we'd define a loss function here, but with SambaFlow, it can be defined
    # as part of the model, which we have done in this case

    # Create dummy SambaTensor for graph tracing
    inputs = get_inputs(args)

    # The common_app_driver() handles model compilation and various other tasks, e.g.,
    # measure-performance. Running, or training, a model must be explicitly carried out
    if args.command == "run":
        utils.trace_graph(model, inputs, optimizer, pef=args.pef, mapping=args.mapping)
        train(args, model)
    else:
        common_app_driver(args=args,
                          model=model,
                          inputs=inputs,
                          optim=optimizer,
                          name=model.__class__.__name__,
                          init_output_grads=not args.inference,
                          app_dir=utils.get_file_dir(__file__))


if __name__ == '__main__':
    main(sys.argv[1:])
Custom loss function
import sambaflow
import sambaflow.samba as samba
import sambaflow.samba.optim as optim
import sambaflow.samba.utils as utils
from sambaflow.samba.utils.common import common_app_driver
from sambaflow.samba.utils.argparser import parse_app_args
from sambaflow.samba.sambaloader import SambaLoader

import sys
import argparse
from typing import (Tuple, Callable)

import numpy as np
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms


class ConvNetCustomLoss(nn.Module):
    """
    Instantiate a 4-layer CNN for MNIST Image Classification.

    In SambaFlow, while it is possible to include a loss function in the model
    definition, it is not done here as an example of how to compute loss on the host.

    Typical SambaFlow usage example:

        model = ConvNet()
        samba.from_torch_(model)
        optimizer = ...
        inputs = ...
        if args.command == "run":
            utils.trace_graph(model, inputs, optimizer, pef=args.pef, mapping=args.mapping)
            train(args, model)
    """

    def __init__(self):
        super(ConvNetCustomLoss, self).__init__()
        self.layer1 = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=5, stride=1, padding=2),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
        self.layer2 = nn.Sequential(
            nn.Conv2d(32, 64, kernel_size=5, stride=1, padding=2),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
        self.drop_out = nn.Dropout()
        self.fc1 = nn.Linear(7 * 7 * 64, 1000)
        self.fc2 = nn.Linear(1000, 10)

    def forward(self, x: torch.Tensor):
        # Since loss isn't part of the model, we don't pass a label to forward()
        out = self.layer1(x)
        out = self.layer2(out)
        out = out.reshape(out.size(0), -1)
        out = self.drop_out(out)
        out = self.fc1(out)
        out = self.fc2(out)
        return out


def add_user_args(parser: argparse.ArgumentParser) -> None:
    """
    Add user-defined arguments.

    Args:
        parser (argparse.ArgumentParser): SambaFlow argument parser
    """
    parser.add_argument(
        "-bs",
        type=int,
        default=100,
        metavar="N",
        help="input batch size for training (default: 100)",
    )
    parser.add_argument(
        "--num-epochs",
        type=int,
        default=6,
        metavar="N",
        help="number of epochs to train (default: 6)",
    )
    parser.add_argument(
        "--num-classes",
        type=int,
        default=10,
        metavar="N",
        help="number of classes in dataset (default: 10)",
    )
    parser.add_argument(
        "--learning-rate",
        type=float,
        default=0.001,
        metavar="LR",
        help="learning rate (default: 0.001)",
    )
    parser.add_argument(
        "--data-path",
        type=str,
        default="data",
        help="Download location for MNIST data",
    )
    parser.add_argument(
        "--model-path", type=str, default="model", help="Save location for model"
    )


def get_inputs(args: argparse.Namespace) -> Tuple[samba.SambaTensor]:
    """
    Generates random SambaTensors in the same shape as MNIST image tensors.

    In order to properly compile a PEF and trace the model graph, SambaFlow requires
    a SambaTensor that is the same shape as the input Torch Tensors, allowing the
    graph to be optimally mapped onto an RDU.

    Args:
        args (argparse.Namespace): User- and system-defined command line arguments

    Returns:
        A SambaTensor with random values in the same shape as MNIST image tensors.
    """
    # Loss is computed on the host, so a dummy SambaTensor is only needed for the MNIST images
    return samba.randn(args.bs, 1, 28, 28, name="image", batch_dim=0),


def prepare_dataloader(args: argparse.Namespace) -> Tuple[sambaflow.samba.sambaloader.SambaLoader, ...]:
    """
    Transforms MNIST input to tensors and creates training/test dataloaders.

    Downloads the MNIST dataset (if necessary); splits the data into training and
    test sets; transforms the data to tensors; then creates Torch DataLoaders over
    those sets. Torch DataLoaders are wrapped in SambaLoaders.

    Input:
        args: User- and system-defined command line arguments

    Returns:
        A tuple of SambaLoaders over the training and test sets.
    """
    # Transform the raw MNIST data into PyTorch Tensors, which will be converted to SambaTensors
    transform = transforms.Compose(
        [
            transforms.ToTensor(),
            transforms.Normalize((0.1307,), (0.3081,)),
        ]
    )

    # Get the train & test data (images and labels) from the MNIST dataset
    train_dataset = datasets.MNIST(
        root=args.data_path,
        train=True,
        transform=transform,
        download=True,
    )
    test_dataset = datasets.MNIST(root=args.data_path, train=False, transform=transform)

    # Set up the train & test data loaders (input pipeline)
    train_loader = DataLoader(
        dataset=train_dataset, batch_size=args.bs, shuffle=True
    )
    test_loader = DataLoader(
        dataset=test_dataset, batch_size=args.bs, shuffle=False
    )

    # Create SambaLoaders
    # function_hook allows us to specify which tensor(s) should be passed along to the model
    #   -> The hook must return a list containing the same number of tensors as specified in the list of names
    #   -> Any other tensors will be filtered out, so if you need those, then...
    # return_original_batch allows us to retain the original input tensors for later processing,
    # e.g., computing loss
    #   -> It causes the SambaLoader to also return a list of the original input tensors
    sn_train_loader = SambaLoader(dataloader=train_loader, names=["image"],
                                  function_hook=lambda t: [t[0]],
                                  return_original_batch=True)
    sn_test_loader = SambaLoader(dataloader=test_loader, names=["image"],
                                 function_hook=lambda t: [t[0]],
                                 return_original_batch=True)

    return sn_train_loader, sn_test_loader


def train(args: argparse.Namespace, model: nn.Module, criterion: Callable) -> None:
    """
    Trains the model.

    Prepares and loads the data, then runs the training loop with the hyperparameters
    specified by the input arguments with a given loss function. Calculates loss and
    accuracy over the course of training.

    Args:
        args (argparse.Namespace): User- and system-defined command line arguments
        model (nn.Module): ConvNet model
        criterion (Callable): Loss function
    """
    sn_train_loader, sn_test_loader = prepare_dataloader(args)
    hyperparam_dict = {"lr": args.learning_rate}

    total_step = len(sn_train_loader)
    loss_list = []
    acc_list = []

    for epoch in range(args.num_epochs):
        for i, (images, original_batch) in enumerate(sn_train_loader):

            # The label tensor is the second element of the original batch
            labels = original_batch[1]

            # Run only the forward pass on RDU and note the section_types argument
            # The first element of the returned tuple contains the raw outputs of forward()
            outputs = samba.session.run(
                input_tensors=(images,),
                output_tensors=model.output_tensors,
                hyperparam_dict=hyperparam_dict,
                section_types=["FWD"]
            )[0]

            # Convert SambaTensors back to Torch Tensors to carry out loss calculation
            # on the host CPU. Be sure to set the requires_grad attribute for PyTorch.
            outputs = samba.to_torch(outputs)
            outputs.requires_grad = True

            # Compute loss on host CPU and store it for later tracking
            loss = criterion(outputs, labels)

            # Compute gradients on CPU
            loss.backward()
            loss_list.append(loss.tolist())

            # Run the backward pass and optimizer step on RDU and note the grad_of_outputs
            # and section_types arguments
            samba.session.run(
                input_tensors=(images,),
                output_tensors=model.output_tensors,
                hyperparam_dict=hyperparam_dict,
                grad_of_outputs=[samba.from_torch_tensor(outputs.grad)],  # Bring the grads back from CPU to RDU
                section_types=["BCKWD", "OPT"])

            # Compute and track the accuracy
            total = labels.size(0)
            _, predicted = torch.max(outputs.data, 1)
            correct = (predicted == labels).sum().item()
            acc_list.append(correct / total)

            if (i + 1) % 100 == 0:
                print(
                    "Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}, Accuracy: {:.2f}%".format(
                        epoch + 1,
                        args.num_epochs,
                        i + 1,
                        total_step,
                        torch.mean(loss),
                        (correct / total) * 100,
                    )
                )


def main(argv):
    args = parse_app_args(argv=argv, common_parser_fn=add_user_args)

    # Create the CNN model
    model = ConvNetCustomLoss()

    # Convert model to SambaFlow (SambaTensors)
    samba.from_torch_model_(model)

    # Create optimizer
    # Note that SambaFlow currently supports AdamW, not Adam, as an optimizer
    optimizer = samba.optim.AdamW(model.parameters(), lr=args.learning_rate)

    ###################################################################
    # Define loss function here to be used in the forward pass on CPU #
    ###################################################################
    criterion = nn.CrossEntropyLoss()

    # Create dummy SambaTensor for graph tracing
    inputs = get_inputs(args)

    # The common_app_driver() handles model compilation and various other tasks, e.g.,
    # measure-performance. Running, or training, a model must be explicitly carried out
    if args.command == "run":
        utils.trace_graph(model, inputs, optimizer, init_output_grads=not args.inference,
                          pef=args.pef, mapping=args.mapping)
        train(args, model, criterion)
    else:
        common_app_driver(args=args,
                          model=model,
                          inputs=inputs,
                          optim=optimizer,
                          name=model.__class__.__name__,
                          init_output_grads=not args.inference,
                          app_dir=utils.get_file_dir(__file__))


if __name__ == '__main__':
    main(sys.argv[1:])
Planning questions
To make the conversion process more straightforward, consider these planning questions.
- Where are my data loaders?
  All models need data, and one of the easiest ways to feed in that data is with a PyTorch DataLoader. The output tensors that come from the DataLoader need to be converted into SambaTensors. See Prepare data loader.
- What shape are my input tensors?
  When you compile a SambaFlow model, the compute graph of your model is physically mapped onto an RDU. To perform this mapping, SambaFlow needs to know the shape of the input tensors. See Generate tensors.
- Where is my model defined?
  A useful feature of SambaFlow is that a loss function can be included in the definition and forward() section of a model. A loss function can be mapped directly onto an RDU, greatly enhancing performance. See Define the model.
- Where is my model instantiated?
  The model must be explicitly converted to SambaFlow. Fortunately, only a single SambaFlow method is needed to do that. See Tie it all together with main().
- Where is my loss function defined and what is it?
  A loss function can be part of a model’s definition. If your model uses a PyTorch loss function that SambaFlow supports, you can move the function into the model, as in Define the model. If your model doesn’t use a supported loss function, you can keep the loss function external to the model and compute it on the host. See Examine model code with external loss function.
- Where is my optimizer defined and what is it?
  Unlike loss functions, optimizers can’t be added directly to a model’s definition in SambaFlow. Instead, the optimizer is passed to SambaFlow during compilation and training. See Tie it all together with main().

The code sketch after this list condenses these answers into the corresponding SambaFlow calls.
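The fragment below is a quick-reference sketch, not a complete program: ConvNet, train_loader, and args are assumed to come from the listings above.

import sambaflow.samba as samba
import sambaflow.samba.utils as utils
from sambaflow.samba.sambaloader import SambaLoader

# Data loaders: wrap the existing Torch DataLoader so batches arrive as SambaTensors
sn_train_loader = SambaLoader(train_loader, ["image", "label"])

# Input shapes: dummy SambaTensors with the real input shapes drive tracing and compilation
inputs = (
    samba.randn(args.bs, 1, 28, 28, name="image", batch_dim=0),
    samba.randint(args.num_classes, (args.bs,), name="label", batch_dim=0),
)

# Model instantiation: a single in-place call converts the torch.nn.Module to SambaFlow
model = ConvNet()
samba.from_torch_model_(model)

# Optimizer: defined outside the model and handed to SambaFlow at compile/trace time
optimizer = samba.optim.AdamW(model.parameters(), lr=args.learning_rate)

# Training: trace the compiled graph (PEF), then drive it with samba.session.run() in the training loop
utils.trace_graph(model, inputs, optimizer, pef=args.pef, mapping=args.mapping)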
Compile and run the model
To compile a model, you always use the following syntax:
$ python <model>.py compile --pef-name <pef_name>
Assuming you’ve saved the example code as cnn_conversion.py, run the following command:
$ python cnn_conversion.py compile --pef-name cnn_conversion.pef
To run the model, you pass in the PEF file that was generated during compilation. The syntax is:
$ python <model>.py run --pef <pef_name>
For this example, run the following command:
$ python cnn_conversion.py run --pef cnn_conversion.pef
See Compile and run your first model for details.
Model conversion tips and tricks
This section offers some tips and tricks for model conversion.
- Torch DataLoaders. If the length of the last batch doesn’t exactly match your PEF batch size, for example, if the last batch contains 28 samples but the PEF batch size is 32, you get a PEF mismatch error. Set the DataLoader parameter drop_last=True to avoid that problem, as shown in the sketch after this list.
- Data visualization. SambaNova recommends that you don’t do data visualization directly on a SambaNova system.
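For example, when you construct the Torch DataLoader (a sketch that assumes the train_dataset and args names used in the listings above):

from torch.utils.data import DataLoader

# Dropping the incomplete final batch keeps every batch at exactly args.bs samples,
# which matches the batch size the PEF was compiled for.
train_loader = DataLoader(
    dataset=train_dataset,
    batch_size=args.bs,
    shuffle=True,
    drop_last=True,
)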
Learn more!
- To understand what the messages to stdout mean, see SambaNova messages and logs.
- To learn how to run models inside a Python virtual environment, see Use Python virtual environments.
- For information about supported PyTorch operators, see the SambaFlow API Reference.