SambaNova Runtime architecture

SambaNova Runtime is an AI-specific OS tailored for the development and operation of the SambaNova Reconfigurable Dataflow Architecture (RDA). The different components of Runtime support hardware management and access, resource allocation, and more.

  • SambaNova developers can use Runtime tools and logs to drill down when troubleshooting. They can also use the SNML API to check Runtime status.

  • SambaNova administrators can explicitly manage several components, most notably the SambaNova Daemon. They can also use the snconfig tool or the SNML API to interact with Runtime.

Architecture overview

SambaNova Runtime consists of a set of components that together manage tasks that an operating system usually performs.

Figure: all Runtime components

The main components are the kernel, the SambaNova Daemon, and the application stack.

  • The SambaNova Daemon (SND) is a user-level program that handles system initialization and fault management. SND includes the SambaNova Management Layer (SNML), which lets you interact with Runtime securely through the snconfig CLI or the SNML API.

  • The kernel component offers low-level APIs that extend the host's operating system with access to the RDU, and it performs privileged tasks such as managing multi-tenancy.

  • The application stack is a system-level library of APIs that lets users run ML models. Part of the application stack is the Collective Communication Library (CCL), which orchestrates data-parallel scale-out.

All components are in the sambaflow package.

Runtime component overview

After Runtime is installed successfully, administrators and developers have access to a set of tools and logs. This set of components, a subset of the full stack, lets you influence how Runtime works.

Some tools and logs are for administrators, while others help developers find causes for problems during model runs.
Table 1. Runtime components

Component | Description | See
SambaNova daemon (SND) | The SambaNova daemon (SND) runs on the DataScale host module and manages several critical parts of SambaNova operation. |
snconfig tool | The SambaNova Configuration tool (snconfig) displays, queries, configures, and manages system resources on a DataScale system. Developers can use the SNML API to perform the same tasks. | Run snconfig --help for details.
sntilestat tool | Displays the status and utilization of each tile within each Reconfigurable Dataflow Unit (RDU). | Run man sntilestat for details and examples.
SambaNova Fault Management (SNFADM) tool | The SambaNova Fault Management (SNFM) framework supports reporting, diagnosing, and analyzing the system error and fault events associated with a DataScale system. | SambaNova fault management (SNFM)
SambaNova Management Layer (SNML) API | SNML contains APIs that you can use to programmatically request information about RDU status, manage RDUs, retrieve information about the host, and perform other DataScale system tasks. Use the snconfig tool to perform these tasks interactively. | SambaNova fault management (SNFM)
SambaNova Slurm plugin | The Slurm plugin supports using Slurm to manage SambaNova hardware resources. |
SambaNova logs | Several logs are available, and you can configure their log levels. | Change runtime log levels

Architecture details

The following information is excerpted in part from the paper "RDARuntime: An OS for AI Accelerators", which SambaNova staff submitted to ROSS, a workshop held in conjunction with SC23. The full paper is available through the ACM Digital Library.

Multi-tenancy

When several users request access to hardware resources at the same time, Runtime mediates the requests.

Figure: how the Runtime stack mediates requests from multiple users

The Kernel Resource Manager (KRM) inside the kernel keeps track of hardware resources and validates resource requests. The kernel component presents all RDU resources and services through a single device file.

In most cases:

  1. An application opens the Runtime application library and requests RDU hardware.

  2. The userspace library forwards those requests to the kernel, which attempts to allocate and schedule the job.

    • If allocation is unsuccessful, the kernel returns an error.

    • If allocation succeeds, Runtime returns a success code and assigns the requested hardware resources to the application.
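The allocation flow above can be sketched as a small Python model. Everything in it is hypothetical: ToyKernelResourceManager, ToyUserspaceLibrary, the 0/1 return codes, and the choice to allocate whole RDUs are illustrations for this sketch, not SambaNova Runtime APIs or the real KRM's data structures.

```python
# Toy model of the Runtime allocation flow. All names, the whole-RDU
# granularity, and the 0/1 return codes are hypothetical illustrations.
from dataclasses import dataclass, field


class AllocationError(Exception):
    """Raised when the toy kernel cannot satisfy a resource request."""


@dataclass
class ToyKernelResourceManager:
    """Stands in for the KRM: tracks resources and validates requests."""

    total_rdus: int
    # Maps process ID -> set of RDU indices allocated to that process.
    allocations: dict = field(default_factory=dict)

    def free_rdus(self):
        used = set().union(*self.allocations.values())
        return [r for r in range(self.total_rdus) if r not in used]

    def allocate(self, pid, count):
        # Validate the request before changing any state.
        if count <= 0:
            raise AllocationError("request must ask for at least one RDU")
        free = self.free_rdus()
        if len(free) < count:
            raise AllocationError(f"requested {count} RDUs, only {len(free)} free")
        granted = set(free[:count])
        self.allocations.setdefault(pid, set()).update(granted)
        return granted


class ToyUserspaceLibrary:
    """Stands in for the userspace library that forwards requests to the kernel."""

    def __init__(self, krm, pid):
        self.krm = krm
        self.pid = pid

    def request_rdus(self, count):
        try:
            granted = self.krm.allocate(self.pid, count)
        except AllocationError as err:
            return 1, str(err)  # nonzero code: the kernel returned an error
        return 0, sorted(granted)  # zero code: success, resources assigned


krm = ToyKernelResourceManager(total_rdus=4)
result_a = ToyUserspaceLibrary(krm, pid=100).request_rdus(3)
result_b = ToyUserspaceLibrary(krm, pid=200).request_rdus(2)
print(result_a)  # (0, [0, 1, 2])
print(result_b)  # (1, 'requested 2 RDUs, only 1 free')
```

In this sketch the kernel validates the whole request before it changes any state, so a failed request leaves existing allocations untouched.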

The kernel maps into a userspace process only the resources that it has allocated to that process.

  • An application process has access only to resources that the KRM has allocated to it.

  • Because those resources are mapped into its process, the userspace library accesses the hardware directly, resulting in low latency.
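The access rule above can be modeled the same way. This is a hypothetical sketch: ToyDeviceFile, MappedView, and the Python list used as device memory are illustrations, and no real memory mapping takes place. It shows the two properties the text names: a process reaches only the resources allocated to it, and, in the sketch, once the view exists, reads and writes go straight to the backing store with no further call into the toy kernel.

```python
# Toy model of per-process mapping: a process can touch only the RDUs the
# kernel allocated to it. ToyDeviceFile and MappedView are hypothetical, and
# a Python list stands in for device memory; no real mmap is involved.


class MappedView:
    """What one process can reach after the toy kernel maps its allocation."""

    def __init__(self, pid, rdus, registers):
        self.pid = pid
        self._rdus = frozenset(rdus)
        self._registers = registers  # shared backing store, one slot per RDU

    def _check(self, rdu):
        if rdu not in self._rdus:
            raise PermissionError(f"pid {self.pid} has no mapping for RDU {rdu}")

    def write(self, rdu, value):
        # A plain method call stands in for the library's direct hardware access.
        self._check(rdu)
        self._registers[rdu] = value

    def read(self, rdu):
        self._check(rdu)
        return self._registers[rdu]


class ToyDeviceFile:
    """Single entry point that exposes every RDU, echoing the single device file."""

    def __init__(self, total_rdus):
        self.registers = [0] * total_rdus
        self.allocations = {}  # pid -> RDU indices granted by the (toy) KRM

    def grant(self, pid, rdus):
        self.allocations[pid] = set(rdus)

    def map_for(self, pid):
        # Map only what was allocated to this pid; everything else stays hidden.
        return MappedView(pid, self.allocations.get(pid, set()), self.registers)


dev = ToyDeviceFile(total_rdus=4)
dev.grant(100, {0, 1})
view = dev.map_for(100)
view.write(1, 42)
value = view.read(1)
try:
    view.write(3, 7)
    denied = None
except PermissionError as err:
    denied = str(err)
print(value)   # 42
print(denied)  # pid 100 has no mapping for RDU 3
```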

Data-parallel applications

Data-parallel training works as follows:

  1. You run several replicas of a model.

  2. Each replica independently runs the forward and backward passes on a different shard of the data.

  3. The replicas synchronize their gradients.

  4. The optimizer uses the synchronized gradients to compute updated weights.

  5. The process repeats.
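The five steps above can be sketched as a pure-Python training loop with simulated replicas. The setup is assumed for illustration only: a one-parameter linear model y = w * x, mean squared error, plain SGD, equal-sized shards, and a simple mean in place of a real all-reduce. The replicas run one after another here rather than concurrently on RDUs.

```python
# Simulated data-parallel training. Assumptions (illustrative only): model
# y = w * x, mean squared error, plain SGD, equal-sized shards, and a simple
# mean standing in for the gradient all-reduce. Replicas run sequentially here.


def local_gradient(w, shard):
    # Derivative of mean((w*x - y)**2) with respect to w, over one shard.
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)


def synchronize(gradients):
    # Stand-in for an all-reduce: every replica receives the mean gradient.
    mean = sum(gradients) / len(gradients)
    return [mean] * len(gradients)


def train_data_parallel(data, num_replicas, steps, lr):
    # Step 1: one model replica per worker, all starting from the same weight.
    weights = [0.0] * num_replicas
    # Round-robin sharding gives each replica a different slice of the data.
    shards = [data[r::num_replicas] for r in range(num_replicas)]
    for _ in range(steps):  # Step 5: repeat
        # Step 2: each replica computes its gradient on its own shard.
        grads = [local_gradient(w, s) for w, s in zip(weights, shards)]
        # Step 3: the only communication, gradient synchronization.
        synced = synchronize(grads)
        # Step 4: each replica's optimizer applies the synchronized gradient.
        weights = [w - lr * g for w, g in zip(weights, synced)]
    return weights


data = [(float(x), 3.0 * x) for x in range(1, 9)]  # true weight is 3.0
weights = train_data_parallel(data, num_replicas=4, steps=200, lr=0.01)
print(weights)  # four identical values, all very close to 3.0
```

Because every replica starts from the same weight and applies the same synchronized gradient, the replicas' weights stay identical without the weights themselves ever being exchanged.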

Communication between replicas is limited to gradient synchronization; otherwise, the replicas run independently.
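Gradient synchronization is commonly implemented as an all-reduce. How CCL performs it is not described here, so the sketch below uses ring all-reduce, a well-known algorithm, purely to illustrate what the communication step computes. Each simulated replica exchanges chunks only with its ring neighbor, and at the end every replica holds the same summed gradient.

```python
# Ring all-reduce over simulated replicas (illustration only; not CCL's
# documented algorithm). Each "send" is modeled by copying values between lists.


def ring_allreduce(buffers):
    """Return per-replica copies of the element-wise sum of all buffers.

    buffers[i] is replica i's gradient vector. Its length must be divisible
    by the number of replicas. The input lists are not modified.
    """
    n = len(buffers)
    length = len(buffers[0])
    if length % n != 0:
        raise ValueError("vector length must be divisible by the replica count")
    size = length // n
    bufs = [list(b) for b in buffers]

    def chunk(c):
        return range(c * size, (c + 1) * size)

    def ring_step(pick_chunk, combine):
        # Collect every replica's outgoing chunk first, so all sends in a
        # step happen "simultaneously", then deliver to the right neighbor.
        outgoing = []
        for i in range(n):
            c = pick_chunk(i)
            outgoing.append((i, c, [bufs[i][k] for k in chunk(c)]))
        for sender, c, values in outgoing:
            receiver = (sender + 1) % n
            for k, v in zip(chunk(c), values):
                bufs[receiver][k] = combine(bufs[receiver][k], v)

    # Phase 1, reduce-scatter: afterwards replica i holds the full sum of chunk (i + 1) % n.
    for step in range(n - 1):
        ring_step(lambda i: (i - step) % n, lambda old, new: old + new)
    # Phase 2, all-gather: pass each finished chunk around the ring.
    for step in range(n - 1):
        ring_step(lambda i: (i + 1 - step) % n, lambda old, new: new)
    return bufs


grads = [
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
    [100.0, 200.0, 300.0, 400.0, 500.0, 600.0],
]
summed = ring_allreduce(grads)
averaged = [[v / len(grads) for v in b] for b in summed]
print(summed[0])    # [111.0, 222.0, 333.0, 444.0, 555.0, 666.0]
print(averaged[0])  # [37.0, 74.0, 111.0, 148.0, 185.0, 222.0]
```

In a ring all-reduce, each replica sends 2(n - 1) chunks of size length/n, so per-replica traffic stays roughly constant as replicas are added, which is one reason the algorithm is popular for data-parallel training.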

Figure: the data-parallel flow above, with replica 0 on the left