SambaNova Runtime architecture
SambaNova Runtime is an AI-specific OS tailored for the development and operation of the SambaNova Reconfigurable Dataflow Architecture (RDA). The components of Runtime support hardware management and access, resource allocation, multi-tenancy, and fault management.
- SambaNova developers can use Runtime to drill down during troubleshooting. They can also use the SNML API to see Runtime status.
- SambaNova administrators can explicitly manage several components, most notably the SambaNova Daemon. They can also use the `snconfig` tool or the SNML API to interact with Runtime.
Architecture overview
SambaNova Runtime consists of a set of components that together manage tasks that an operating system usually performs.
The main components are the kernel, the SambaNova Daemon, and the application stack.
- The SambaNova Daemon (SND) is a user-level program that handles system initialization and fault management. SND includes the SambaNova Management Layer (SNML), which lets you interact with Runtime securely using the `snconfig` CLI or the SNML API.
- The kernel offers low-level APIs that extend the host's operating system with access to the RDU and perform privileged tasks, such as managing multi-tenancy.
- The application stack is a system-level library of APIs that allows users to run ML models. Part of the application stack is the Collective Communication Library (CCL), which orchestrates data-parallel scale-out.

All components are in the `sambaflow` package.
Runtime component overview
Administrators and developers have access to a set of tools and logs after successful installation of Runtime. This set of components, a subset of the full stack, lets you influence how Runtime works.
Some tools and logs are for administrators, while others help developers find causes of problems during model runs.

| Component | Description | See |
|---|---|---|
| SambaNova Daemon (SND) | The SambaNova Daemon (SND) runs on the DataScale host module and manages several critical pieces of the SambaNova operation. | |
| `snconfig` tool | The SambaNova Configuration tool (`snconfig`) displays, queries, configures, and manages system resources on a DataScale system. Developers can use the SNML API to perform the same tasks. | Run |
| `sntilestat` tool | Displays the status and utilization of each tile within each Reconfigurable Dataflow Unit (RDU). | Run |
| SambaNova Fault Management (SNFADM) tool | The SambaNova Fault Management (SNFM) framework supports reporting, diagnosing, and analyzing the system error and fault events associated with a DataScale system. | |
| SambaNova Management Layer (SNML) API | The SambaNova Management Layer (SNML) contains APIs that you can use to interact with Runtime programmatically. | |
| SambaNova Slurm plugin | The Slurm plugin supports using Slurm to manage SambaNova hardware resources. | |
| SambaNova logs | Several logs are available. You can configure log levels. | |
Architecture details
The following information is excerpted in part from the paper RDARuntime: An OS for AI Accelerators, which SambaNova staff submitted to ROSS, a workshop held in conjunction with SC23. The full paper is available through the ACM Digital Library.
Multi-tenancy
When several users request access to hardware resources at the same time, Runtime mediates the requests.
The Kernel Resource Manager (KRM) inside the kernel keeps track of hardware resources and validates resource requests. The kernel component presents all RDU resources and services through a single device file.
In most cases:

- An application opens the Runtime application library and requests RDU hardware.
- The userspace library forwards those requests to the kernel, which attempts to allocate and schedule the job.
- If allocation is unsuccessful, the kernel returns an error.
- If allocation succeeds, Runtime returns a success code and assigns the requested hardware resources to the application.
- The kernel maps only the resources that it has allocated to the userspace process.
- An application process has access only to resources that the KRM has allocated to it.
- A userspace library has direct access to the hardware, resulting in low latency.
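The allocation flow above can be sketched as a toy resource manager. This is only an illustration of the mediation pattern; the class and method names (`ToyResourceManager`, `request`, `visible_to`, `AllocationError`) are hypothetical and are not part of the actual KRM interface.

```python
# Illustrative sketch of KRM-style resource mediation.
# All names here are hypothetical, not SambaNova APIs.

class AllocationError(Exception):
    """Raised when a resource request cannot be satisfied."""

class ToyResourceManager:
    def __init__(self, total_rdus):
        self.free = set(range(total_rdus))  # RDUs not yet assigned
        self.owner = {}                     # RDU id -> owning process id

    def request(self, pid, count):
        # Validate the request against available hardware;
        # on failure, return an error instead of partially allocating.
        if count > len(self.free):
            raise AllocationError(f"only {len(self.free)} RDUs free")
        # On success, allocate and record ownership.
        granted = {self.free.pop() for _ in range(count)}
        for rdu in granted:
            self.owner[rdu] = pid
        return granted

    def visible_to(self, pid):
        # Only resources allocated to this process are mapped to it.
        return {rdu for rdu, p in self.owner.items() if p == pid}

mgr = ToyResourceManager(total_rdus=8)
a = mgr.request(pid=100, count=3)
b = mgr.request(pid=200, count=3)
print(len(a), len(b), len(mgr.free))  # 3 3 2
print(mgr.visible_to(100) == a)       # True
```

The key property mirrored here is isolation: each process can enumerate only the resources the manager granted it, and an oversized request fails cleanly rather than partially succeeding.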
Data Parallel applications
Data parallel means that:

- You run several replicas of a model.
- Each replica independently runs forward and backward on different shards of the data.
- The replicas synchronize their gradients.
- The optimizer uses the synchronized gradients to output a set of weights.
- The process is repeated.

As part of this process, communication is limited to gradient synchronization; otherwise, the replicas run independently.
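The loop above can be sketched in plain NumPy with a toy linear model: each "replica" computes a gradient on its own shard, the shard gradients are averaged (the only step that corresponds to communication), and a single optimizer step updates the shared weights. This is a minimal illustration of the pattern, not SambaNova's CCL implementation.

```python
# Minimal data-parallel sketch with a hypothetical linear model.
import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
X = rng.normal(size=(64, 2))
y = X @ true_w

n_replicas = 4
shards = np.array_split(np.arange(64), n_replicas)  # one data shard per replica
w = np.zeros(2)
lr = 0.1

for step in range(200):
    grads = []
    for idx in shards:
        # Each replica runs forward and backward on its own shard.
        pred = X[idx] @ w                                        # forward
        grads.append(2 * X[idx].T @ (pred - y[idx]) / len(idx))  # backward
    # Gradient synchronization: the only communication between replicas.
    g = np.mean(grads, axis=0)
    # Optimizer step on the synchronized gradient; repeat.
    w -= lr * g

print(np.round(w, 3))  # converges close to true_w = [2., -1.]
```

Because the shards are equal-sized, averaging the per-replica gradients is equivalent to the full-batch gradient, which is why the replicas stay in lockstep despite running independently between synchronization points.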