Convert a simple model to SambaFlow

Many SambaNova customers convert an existing model that they built in PyTorch to SambaFlow. This doc page uses a simple example to illustrate what is essential for the conversion and discusses some best practices. You’ll see that much of your code remains unchanged and that SambaFlow doesn’t usually require you to reformat your data.

This tutorial is about model conversion. For background on data preparation, see our public GitHub repository.

In this tutorial, you:

  • Review the example model and learn the difference between an included and an external loss function.

  • Compare the original PyTorch code with the converted SambaFlow code.

  • Work through planning questions that guide the conversion.

  • Compile and run the converted model.

The example model

Convolutional Neural Networks (CNNs) are a popular model type in the Visual AI space. Our example model is a CNN that performs image classification on the MNIST dataset. It consists of four layers:

  • Two convolutional layers, each consisting of a:

    • Conv2D

    • ReLU

    • MaxPool2D

  • Two fully connected linear layers
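
For reference, the tensor shapes evolve as follows for a single 1 x 28 x 28 MNIST image; the numbers follow directly from the Conv2d and MaxPool2d parameters in the code below and explain why the first linear layer takes 7 * 7 * 64 input features.

    # Shape flow for one MNIST image (batch dimension omitted):
    # input                                       ->  1 x 28 x 28
    # layer1: Conv2d(1, 32, kernel 5, padding 2)  -> 32 x 28 x 28
    #         MaxPool2d(kernel 2, stride 2)       -> 32 x 14 x 14
    # layer2: Conv2d(32, 64, kernel 5, padding 2) -> 64 x 14 x 14
    #         MaxPool2d(kernel 2, stride 2)       -> 64 x  7 x  7
    # flatten                                     -> 7 * 7 * 64 = 3136 features
    # fc1: Linear(3136, 1000) -> fc2: Linear(1000, 10) -> 10 class scores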

Included or external loss function

This conversion example presents two example solutions (a simplified sketch that contrasts them follows this list):

  • The solution in Model functions and changes includes the model’s loss function as part of the model definition.

    • This approach improves performance because the loss computation happens on the RDU.

    • In the example, the loss function is included in the forward() function.

  • The solution in Model with an external loss function includes code for a loss function that is external to the model.

    • This solution uses a host CPU to compute the loss and gradients for backpropagation.

    • Use this approach if your model’s loss function isn’t currently supported by SambaFlow or if you are using a custom loss function.
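
The following sketch contrasts the two model interfaces. It is a simplified illustration only: it uses a single stand-in Linear layer in place of the CNN layers, and the full listings appear in the next section.

    import torch
    import torch.nn as nn

    # Included loss: the criterion is part of the model, so the loss computation
    # is traced and mapped onto the RDU together with the rest of the graph.
    class WithIncludedLoss(nn.Module):
        def __init__(self):
            super().__init__()
            self.body = nn.Linear(784, 10)          # stand-in for the CNN layers
            self.criterion = nn.CrossEntropyLoss()

        def forward(self, x: torch.Tensor, labels: torch.Tensor):
            out = self.body(x)
            loss = self.criterion(out, labels)
            return loss, out

    # External loss: forward() takes only the input and returns the raw outputs.
    # The training loop computes the loss and calls loss.backward() on the host CPU.
    class WithExternalLoss(nn.Module):
        def __init__(self):
            super().__init__()
            self.body = nn.Linear(784, 10)          # stand-in for the CNN layers

        def forward(self, x: torch.Tensor):
            return self.body(x)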

Original and converted model code

This tutorial explains the code modifications using the simple CNN described above. We picked this example because it’s small and compiles quickly.

  • You can download the original code from this repo: https://github.com/adventuresinML/adventures-in-ml-code/blob/master/conv_net_py_torch.py.

  • The revised code is available below.

    Included loss function
    import sambaflow
    import sambaflow.samba as samba
    import sambaflow.samba.optim as optim
    import sambaflow.samba.utils as utils
    from sambaflow.samba.utils.common import common_app_driver
    from sambaflow.samba.utils.argparser import parse_app_args
    from sambaflow.samba.sambaloader import SambaLoader
    
    import sys
    import argparse
    from typing import Tuple
    
    import numpy as np
    import torch
    import torch.nn as nn
    from torch.utils.data import DataLoader
    from torchvision import datasets, transforms
    
    class ConvNet(nn.Module):
        """
        Instantiate a 4-layer CNN for MNIST Image Classification.
    
        In SambaFlow, it is possible to include a loss function as part of a model's definition and put it in
        the forward method to be computed.
    
        Typical SambaFlow usage example:
    
        model = ConvNet()
        samba.from_torch_model_(model)
        optimizer = ...
        inputs = ...
        if args.command == "run":
            utils.trace_graph(model, inputs, optimizer, pef=args.pef, mapping=args.mapping)
            train(args, model)
        """
    
        def __init__(self):
    
            super(ConvNet, self).__init__()
            self.layer1 = nn.Sequential(
                nn.Conv2d(1, 32, kernel_size=5, stride=1, padding=2),
                nn.ReLU(),
                nn.MaxPool2d(kernel_size=2, stride=2),
            )
            self.layer2 = nn.Sequential(
                nn.Conv2d(32, 64, kernel_size=5, stride=1, padding=2),
                nn.ReLU(),
                nn.MaxPool2d(kernel_size=2, stride=2),
            )
            self.drop_out = nn.Dropout()
            self.fc1 = nn.Linear(7 * 7 * 64, 1000)
            self.fc2 = nn.Linear(1000, 10)
            self.criterion = nn.CrossEntropyLoss() # Add loss function to model
    
        def forward(self, x: torch.Tensor, labels: torch.Tensor):
            out = self.layer1(x)
            out = self.layer2(out)
            out = out.reshape(out.size(0), -1)
            out = self.drop_out(out)
            out = self.fc1(out)
            out = self.fc2(out)
            loss = self.criterion(out, labels)     # Compute loss
            return loss, out
    
    def add_user_args(parser: argparse.ArgumentParser) -> None:
        """
        Add user-defined arguments.
    
        Args:
            parser (argparse.ArgumentParser): SambaFlow argument parser
        """
    
        parser.add_argument(
            "-bs",
            type=int,
            default=100,
            metavar="N",
            help="input batch size for training (default: 100)",
        )
        parser.add_argument(
            "--num-epochs",
            type=int,
            default=6,
            metavar="N",
            help="number of epochs to train (default: 6)",
        )
        parser.add_argument(
            "--num-classes",
            type=int,
            default=10,
            metavar="N",
            help="number of classes in dataset (default: 10)",
        )
        parser.add_argument(
            "--learning-rate",
            type=float,
            default=0.001,
            metavar="LR",
            help="learning rate (default: 0.001)",
        )
        parser.add_argument(
            "--data-path",
            type=str,
            default="data",
            help="Download location for MNIST data",
        )
        parser.add_argument(
            "--model-path", type=str, default="model", help="Save location for model"
        )
    
    def get_inputs(args: argparse.Namespace) -> Tuple[samba.SambaTensor]:
        """
        Generates random SambaTensors in the same shape as MNIST image and label tensors.
    
        In order to properly compile a PEF and trace the model graph, SambaFlow requires a SambaTensor that
        is the same shape as the input Torch Tensors, allowing the graph to be optimally mapped onto an RDU.
    
        Args:
            args (argparse.Namespace): User- and system-defined command line arguments
    
        Returns:
            A tuple of SambaTensors with random values in the same shape as MNIST image and label tensors.
        """
    
        dummy_image = (
            samba.randn(args.bs, 1, 28, 28, name="image", batch_dim=0),
            samba.randint(args.num_classes, (args.bs,), name="label", batch_dim=0),
        )
    
        return dummy_image
    
    def prepare_dataloader(args: argparse.Namespace) -> Tuple[sambaflow.samba.sambaloader.SambaLoader, sambaflow.samba.sambaloader.SambaLoader]:
        """
        Transforms MNIST input to tensors and creates training/test dataloaders.
    
        Downloads the MNIST dataset (if necessary); splits the data into training and test sets; transforms the
        data to tensors; then creates Torch DataLoaders over those sets.  Torch DataLoaders are wrapped in
        SambaLoaders.
    
        Args:
            args (argparse.Namespace): User- and system-defined command line arguments
    
        Returns:
            A tuple of SambaLoaders over the training and test sets.
        """
    
        # Transform the raw MNIST data into PyTorch Tensors, which will be converted to SambaTensors
        transform = transforms.Compose(
            [
                transforms.ToTensor(),
                transforms.Normalize((0.1307,), (0.3081,)),
            ]
        )
    
        # Get the train & test data (images and labels) from the MNIST dataset
        train_dataset = datasets.MNIST(
            root=args.data_path,
            train=True,
            transform=transform,
            download=True,
        )
        test_dataset = datasets.MNIST(root=args.data_path, train=False, transform=transform)
    
        # Set up the train & test data loaders (input pipeline)
        train_loader = DataLoader(
            dataset=train_dataset, batch_size=args.bs, shuffle=True
        )
        test_loader = DataLoader(
            dataset=test_dataset, batch_size=args.bs, shuffle=False
        )
    
        # Create SambaLoaders
        sn_train_loader = SambaLoader(train_loader, ["image", "label"])
        sn_test_loader = SambaLoader(test_loader, ["image", "label"])
    
        return sn_train_loader, sn_test_loader
    
    def train(args: argparse.Namespace, model: nn.Module) -> None:
        """
        Trains the model.
    
        Prepares and loads the data, then runs the training loop with the hyperparameters specified
        by the input arguments.  Calculates loss and accuracy over the course of training.
    
        Args:
            args (argparse.Namespace): User- and system-defined command line arguments
            model (nn.Module): ConvNet model
        """
    
        sn_train_loader, _ = prepare_dataloader(args)
        hyperparam_dict = {"lr": args.learning_rate}
    
        total_step = len(sn_train_loader)
        loss_list = []
        acc_list = []
    
        for epoch in range(args.num_epochs):
            for i, (images, labels) in enumerate(sn_train_loader):
    
                # Run the model on RDU: forward -> loss/gradients -> backward/optimizer
                loss, outputs = samba.session.run(
                    input_tensors=(images, labels),
                    output_tensors=model.output_tensors,
                    hyperparam_dict=hyperparam_dict
                )
    
                # Convert SambaTensors back to Torch Tensors to calculate accuracy
                loss, outputs = samba.to_torch(loss), samba.to_torch(outputs)
                loss_list.append(loss.tolist())
    
                # Track the accuracy
                total = labels.size(0)
                _, predicted = torch.max(outputs.data, 1)
                correct = (predicted == labels).sum().item()
                acc_list.append(correct / total)
    
                if (i + 1) % 100 == 0:
                    print(
                        "Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}, Accuracy: {:.2f}%".format(
                            epoch + 1,
                            args.num_epochs,
                            i + 1,
                            total_step,
                            torch.mean(loss),
                            (correct / total) * 100,
                        )
                    )
    
    def main(argv):
    
        args = parse_app_args(argv=argv, common_parser_fn=add_user_args)
    
        # Create the CNN model
        model = ConvNet()
    
        # Convert model to SambaFlow (SambaTensors)
        samba.from_torch_model_(model)
    
        # Create optimizer
        # Note that SambaFlow currently supports AdamW, not Adam, as an optimizer
        optimizer = samba.optim.AdamW(model.parameters(), lr=args.learning_rate)
    
        # Normally, we'd define a loss function here, but with SambaFlow, it can be defined
        # as part of the model, which we have done in this case
    
        # Create dummy SambaTensor for graph tracing
        inputs = get_inputs(args)
    
        # The common_app_driver() handles model compilation and various other tasks, e.g.,
        # measure-performance.  Running, or training, a model must be explicitly carried out
        if args.command == "run":
            utils.trace_graph(model, inputs, optimizer, pef=args.pef, mapping=args.mapping)
            train(args, model)
        else:
            common_app_driver(args=args,
                            model=model,
                            inputs=inputs,
                            optim=optimizer,
                            name=model.__class__.__name__,
                            init_output_grads=not args.inference,
                            app_dir=utils.get_file_dir(__file__))
    
    if __name__ == '__main__':
        main(sys.argv[1:])
    Custom loss function
    import sambaflow
    import sambaflow.samba as samba
    import sambaflow.samba.optim as optim
    import sambaflow.samba.utils as utils
    from sambaflow.samba.utils.common import common_app_driver
    from sambaflow.samba.utils.argparser import parse_app_args
    from sambaflow.samba.sambaloader import SambaLoader
    
    import sys
    import argparse
    from typing import (Tuple, Callable)
    
    import numpy as np
    import torch
    import torch.nn as nn
    from torch.utils.data import DataLoader
    from torchvision import datasets, transforms
    
    class ConvNetCustomLoss(nn.Module):
        """
        Instantiate a 4-layer CNN for MNIST Image Classification.
    
        In SambaFlow, while it is possible to include a loss function in the model definition, it
        is not done here as an example of how to compute loss on the host.
    
        Typical SambaFlow usage example:
    
        model = ConvNetCustomLoss()
        samba.from_torch_model_(model)
        optimizer = ...
        inputs = ...
        if args.command == "run":
            utils.trace_graph(model, inputs, optimizer, pef=args.pef, mapping=args.mapping)
            train(args, model)
        """
    
        def __init__(self):
    
            super(ConvNetCustomLoss, self).__init__()
            self.layer1 = nn.Sequential(
                nn.Conv2d(1, 32, kernel_size=5, stride=1, padding=2),
                nn.ReLU(),
                nn.MaxPool2d(kernel_size=2, stride=2),
            )
            self.layer2 = nn.Sequential(
                nn.Conv2d(32, 64, kernel_size=5, stride=1, padding=2),
                nn.ReLU(),
                nn.MaxPool2d(kernel_size=2, stride=2),
            )
            self.drop_out = nn.Dropout()
            self.fc1 = nn.Linear(7 * 7 * 64, 1000)
            self.fc2 = nn.Linear(1000, 10)
    
        def forward(self, x: torch.Tensor):
            # Since loss isn't part of the model, we don't pass a label to forward()
            out = self.layer1(x)
            out = self.layer2(out)
            out = out.reshape(out.size(0), -1)
            out = self.drop_out(out)
            out = self.fc1(out)
            out = self.fc2(out)
            return out
    
    def add_user_args(parser: argparse.ArgumentParser) -> None:
        """
        Add user-defined arguments.
    
        Args:
            parser (argparse.ArgumentParser): SambaFlow argument parser
        """
    
        parser.add_argument(
            "-bs",
            type=int,
            default=100,
            metavar="N",
            help="input batch size for training (default: 100)",
        )
        parser.add_argument(
            "--num-epochs",
            type=int,
            default=6,
            metavar="N",
            help="number of epochs to train (default: 6)",
        )
        parser.add_argument(
            "--num-classes",
            type=int,
            default=10,
            metavar="N",
            help="number of classes in dataset (default: 10)",
        )
        parser.add_argument(
            "--learning-rate",
            type=float,
            default=0.001,
            metavar="LR",
            help="learning rate (default: 0.001)",
        )
        parser.add_argument(
            "--data-path",
            type=str,
            default="data",
            help="Download location for MNIST data",
        )
        parser.add_argument(
            "--model-path", type=str, default="model", help="Save location for model"
        )
    
    def get_inputs(args: argparse.Namespace) -> Tuple[samba.SambaTensor]:
        """
        Generates random SambaTensors in the same shape as MNIST image tensors.
    
        In order to properly compile a PEF and trace the model graph, SambaFlow requires a SambaTensor that
        is the same shape as the input Torch Tensors, allowing the graph to be optimally mapped onto an RDU.
    
        Args:
            args (argparse.Namespace): User- and system-defined command line arguments
    
        Returns:
            A SambaTensor with random values in the same shape as MNIST image tensors.
        """
    
        # Loss is computed on the host, so a dummy SambaTensor is only needed for the MNIST images
        return samba.randn(args.bs, 1, 28, 28, name="image", batch_dim=0),
    
    def prepare_dataloader(args: argparse.Namespace) -> Tuple[sambaflow.samba.sambaloader.SambaLoader, ...]:
        """
        Transforms MNIST input to tensors and creates training/test dataloaders.
    
        Downloads the MNIST dataset (if necessary); splits the data into training and test sets; transforms the
        data to tensors; then creates Torch DataLoaders over those sets.  Torch DataLoaders are wrapped in
        SambaLoaders.
    
        Input:
            args: User- and system-defined command line arguments
    
        Returns:
            A tuple of SambaLoaders over the training and test sets.
        """
    
        # Transform the raw MNIST data into PyTorch Tensors, which will be converted to SambaTensors
        transform = transforms.Compose(
            [
                transforms.ToTensor(),
                transforms.Normalize((0.1307,), (0.3081,)),
            ]
        )
    
        # Get the train & test data (images and labels) from the MNIST dataset
        train_dataset = datasets.MNIST(
            root=args.data_path,
            train=True,
            transform=transform,
            download=True,
        )
        test_dataset = datasets.MNIST(root=args.data_path, train=False, transform=transform)
    
        # Set up the train & test data loaders (input pipeline)
        train_loader = DataLoader(
            dataset=train_dataset, batch_size=args.bs, shuffle=True
        )
        test_loader = DataLoader(
            dataset=test_dataset, batch_size=args.bs, shuffle=False
        )
    
        # Create SambaLoaders
        # function_hook allows us to specify which tensor(s) should be passed along to the model
        #  -> The hook must return a list containing the same number of tensors as specified in the list of names
        #  -> Any other tensors will be filtered out, so if you need those, then...
        # return_original_batch allows us to retain the original input tensors for later processing, e.g., computing loss
        #  -> It causes the SambaLoader to also return a list of the original input tensors
        sn_train_loader = SambaLoader(dataloader=train_loader, names=["image"], function_hook=lambda t: [t[0]], return_original_batch=True)
        sn_test_loader = SambaLoader(dataloader=test_loader, names=["image"], function_hook=lambda t: [t[0]], return_original_batch=True)
    
        return sn_train_loader, sn_test_loader
    
    def train(args: argparse.Namespace, model: nn.Module, criterion: Callable) -> None:
        """
        Trains the model.
    
        Prepares and loads the data, then runs the training loop with the hyperparameters specified
        by the input arguments with a given loss function.  Calculates loss and accuracy over the course of training.
    
        Args:
            args (argparse.Namespace): User- and system-defined command line arguments
            model (nn.Module): ConvNet model
            criterion (Callable): Loss function
        """
    
        sn_train_loader, sn_test_loader = prepare_dataloader(args)
        hyperparam_dict = {"lr": args.learning_rate}
    
        total_step = len(sn_train_loader)
        loss_list = []
        acc_list = []
    
        for epoch in range(args.num_epochs):
            for i, (images, original_batch) in enumerate(sn_train_loader):
    
                # The label tensor is the second element of the original batch
                labels = original_batch[1]
    
                # Run only the forward pass on RDU and note the section_types argument
                # The first element of the returned tuple contains the raw outputs of forward()
                outputs = samba.session.run(
                    input_tensors=(images,),
                    output_tensors=model.output_tensors,
                    hyperparam_dict=hyperparam_dict,
                    section_types=["FWD"]
                )[0]
    
                # Convert SambaTensors back to Torch Tensors to carry out loss calculation
                # on the host CPU.  Be sure to set the requires_grad attribute for PyTorch.
                outputs = samba.to_torch(outputs)
                outputs.requires_grad = True
    
                # Compute loss on host CPU and store it for later tracking
                loss = criterion(outputs, labels)
    
                # Compute gradients on CPU
                loss.backward()
                loss_list.append(loss.tolist())
    
                # Run the backward pass and optimizer step on RDU and note the grad_of_outputs
                # and section_types arguments
                samba.session.run(
                    input_tensors=(images,),
                    output_tensors=model.output_tensors,
                hyperparam_dict=hyperparam_dict,
                grad_of_outputs=[samba.from_torch_tensor(outputs.grad)], # Bring the grads back from CPU to RDU
                    section_types=["BCKWD", "OPT"])
    
                # Compute and track the accuracy
                total = labels.size(0)
                _, predicted = torch.max(outputs.data, 1)
                correct = (predicted == labels).sum().item()
                acc_list.append(correct / total)
    
                if (i + 1) % 100 == 0:
                    print(
                        "Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}, Accuracy: {:.2f}%".format(
                            epoch + 1,
                            args.num_epochs,
                            i + 1,
                            total_step,
                            torch.mean(loss),
                            (correct / total) * 100,
                        )
                    )
    
    def main(argv):
    
        args = parse_app_args(argv=argv, common_parser_fn=add_user_args)
    
        # Create the CNN model
        model = ConvNetCustomLoss()
    
        # Convert model to SambaFlow (SambaTensors)
        samba.from_torch_model_(model)
    
        # Create optimizer
        # Note that SambaFlow currently supports AdamW, not Adam, as an optimizer
        optimizer = samba.optim.AdamW(model.parameters(), lr=args.learning_rate)
    
        ###################################################################
        # Define loss function here to be used in the forward pass on CPU #
        ###################################################################
        criterion = nn.CrossEntropyLoss()
    
        # Create dummy SambaTensor for graph tracing
        inputs = get_inputs(args)
    
        # The common_app_driver() handles model compilation and various other tasks, e.g.,
        # measure-performance.  Running, or training, a model must be explicitly carried out
        if args.command == "run":
            utils.trace_graph(model, inputs, optimizer, init_output_grads=not args.inference, pef=args.pef, mapping=args.mapping)
            train(args, model, criterion)
        else:
            common_app_driver(args=args,
                            model=model,
                            inputs=inputs,
                            optim=optimizer,
                            name=model.__class__.__name__,
                            init_output_grads=not args.inference,
                            app_dir=utils.get_file_dir(__file__))
    
    if __name__ == '__main__':
        main(sys.argv[1:])

Planning questions

To make the conversion process more straightforward, consider these planning questions. A condensed, annotated sketch that maps the questions to the corresponding SambaFlow calls follows the list.

  1. Where are my dataloaders?

    All models need data, and one of the easiest ways to feed in that data is with a PyTorch DataLoader. The output tensors that come from the DataLoader must be converted into SambaTensors. See Prepare dataloader.

  2. What shape are my input tensors?

    When you compile a SambaFlow model, the compute graph of your model is physically mapped onto an RDU. To perform this mapping, SambaFlow needs to know the shape of the input tensors. See Generate tensors.

  3. Where is my model defined?

    A useful feature of SambaFlow is that a loss function can be included in a model’s definition and computed in its forward() method. The loss function can then be mapped directly onto an RDU, greatly enhancing performance. See Define the model.

  4. Where is my model instantiated?

    The model must be explicitly converted to SambaFlow. Fortunately, a single SambaFlow call, samba.from_torch_model_(), does the conversion. See Tie it all together with main().

  5. Where is my loss function defined and what is it?

    A loss function can be part of a model’s definition. If your model uses a PyTorch loss function that SambaFlow supports, the function can be moved into the model, as in Define the model. If your model doesn’t use a supported loss function, you can compute the loss externally on the host CPU. See Model with an external loss function.

  6. Where is my optimizer defined and what is it?

    Unlike loss functions, optimizers can’t be added directly to a model’s definition in SambaFlow. Instead, the optimizer is passed into SambaFlow during compilation and training. See Tie it all together with main().
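
The following sketch is a condensed, annotated version of the main() function from the included-loss listing above. It maps each planning question to the SambaFlow call that addresses it; it relies on the ConvNet, add_user_args, get_inputs, and train definitions from that listing and is meant as an orientation aid, not a complete program.

    import sys

    import sambaflow.samba as samba
    import sambaflow.samba.utils as utils
    from sambaflow.samba.utils.common import common_app_driver
    from sambaflow.samba.utils.argparser import parse_app_args

    def main(argv):
        args = parse_app_args(argv=argv, common_parser_fn=add_user_args)

        # Q3/Q5: the model definition (ConvNet above) includes its loss function.
        model = ConvNet()

        # Q4: a single call converts the instantiated PyTorch model in place.
        samba.from_torch_model_(model)

        # Q6: the optimizer is defined outside the model and passed to compile and run.
        optimizer = samba.optim.AdamW(model.parameters(), lr=args.learning_rate)

        # Q2: dummy SambaTensors with the expected input shapes let SambaFlow trace
        # and map the compute graph onto an RDU.
        inputs = get_inputs(args)

        if args.command == "run":
            # Q1: prepare_dataloader() (called from train()) wraps the Torch DataLoaders
            # in SambaLoaders so that their output tensors become SambaTensors.
            utils.trace_graph(model, inputs, optimizer, pef=args.pef, mapping=args.mapping)
            train(args, model)
        else:
            common_app_driver(args=args, model=model, inputs=inputs, optim=optimizer,
                              name=model.__class__.__name__,
                              init_output_grads=not args.inference,
                              app_dir=utils.get_file_dir(__file__))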

Compile and run the model

To compile a model, you always use the following syntax:

$ python <model.py> compile --pef-name <pef_name>

Assuming you’ve saved the example code as cnn_conversion.py, run the following command.

$ python cnn_conversion.py compile --pef-name cnn_conversion.pef

To run the model, you pass in the PEF file that was generated during compilation. The syntax is:

$ python <model.py> run --pef </path/to/pef_name>

For this example, run the following command:

$ python cnn_conversion.py run --pef cnn_conversion.pef
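
The user-defined arguments from add_user_args() can also be passed on the run command line. For example, the following invocation simply makes the defaults explicit (batch size 100, 6 epochs, learning rate 0.001):

$ python cnn_conversion.py run --pef cnn_conversion.pef -bs 100 --num-epochs 6 --learning-rate 0.001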

Model conversion tips and tricks

This section offers some tips and tricks for model conversion.

  • Torch Dataloaders. If the size of your dataset is not exactly divisible by your batch size, the last batch is smaller than the batch size the PEF was compiled with (for example, a last batch of 28 when the PEF batch size is 32), and you get a PEF mismatch error. Set drop_last=True on the DataLoader to avoid that problem, as shown in the snippet after this list.

  • Data Visualization. SambaNova recommends that you don’t do data visualization directly on a SambaNova system.
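
For example, assuming the DataLoader setup from prepare_dataloader() in the listings above, dropping the incomplete final batch looks like this:

    from torch.utils.data import DataLoader

    # Drop the final, incomplete batch so that every batch matches the batch
    # size the PEF was compiled with (args.bs in the examples above).
    train_loader = DataLoader(
        dataset=train_dataset,
        batch_size=args.bs,
        shuffle=True,
        drop_last=True,  # discard the last batch if it has fewer than args.bs samples
    )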

Learn more!