Configure SambaNova Runtime components

Administrators with superuser privileges can stop and restart the SambaNova Daemon (SND) or change runtime log levels. Administrators might perform other tasks, such as resetting RDUs, as described in Runtime Troubleshooting.

Manage SND

The SambaNova Daemon (SND) starts automatically when the host boots. When SND starts, it initializes the hardware resources and loads the driver. You can then run applications on the hardware.

All users can see SND status, as follows:

Command   Usage                     Effect
status    $ systemctl status snd    Check the SND service

Administrators can use these systemctl(1) commands to manage SND:

Command   Usage                           Effect
start     $ sudo systemctl start snd      Start the SND service
stop      $ sudo systemctl stop snd       Stop the SND service
disable   $ sudo systemctl disable snd    Disable SND service and de-register SND auto-start on boot
enable    $ sudo systemctl enable snd     Enable SND service and register SND auto-start on boot
edit      $ sudo systemctl edit snd       Edit environment configurations. Requires a service restart to take effect
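As a minimal sketch, a script can check whether SND is running before it launches an application. The helper name unit_state is hypothetical; the only external command used is the standard systemctl is-active, and the unit name snd is taken from the commands above.

```shell
#!/bin/sh
# Hypothetical helper: print the state of a systemd unit (defaults to snd).
unit_state() {
    unit="${1:-snd}"
    # `systemctl is-active` prints the unit state and exits 0 only when active.
    state="$(systemctl is-active "$unit" 2>/dev/null)"
    # If systemctl is missing or prints nothing, report "unknown".
    printf '%s\n' "${state:-unknown}"
}

case "$(unit_state snd)" in
    active) echo "SND is running" ;;
    *)      echo "SND is not running; an administrator can run: sudo systemctl start snd" ;;
esac
```

The check needs no superuser privileges, which matches the statement that all users can see SND status.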

Change Runtime Log Levels

The Runtime package writes application logs, SND logs, and kernel logs. This section describes how to change the log level for each.

Change application log levels

Application logs are sent to /var/log/sambaflow/runtime/sn.log. ERR level application messages are also printed to the console.

We support the following application log levels:

  • NOTICE

  • WARNING

  • ERR

  • CRIT

  • ALERT

  • EMERG

To change the log level, set the SF_RNT_LOG_LEVEL environment variable before calling your SambaFlow or SambaRuntime script or binary. For example:

SF_RNT_LOG_LEVEL=INFO python3 my_app.py
SF_RNT_LOG_LEVEL=INFO ./my_app.bin

If you run the application with sudo, you can change the log level only by passing the environment variable as a prefix to the command, as shown in the example above. Exported environment variables are not propagated to applications that run under sudo.
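The difference between an exported variable and a prefixed variable can be illustrated without sudo. In the sketch below, env -i is only a stand-in for the environment reset that sudo typically applies, and the show_level string stands in for an application that reads SF_RNT_LOG_LEVEL; both are assumptions made for illustration.

```shell
#!/bin/sh
# Stand-in for an application that reads SF_RNT_LOG_LEVEL.
show_level='echo "level=${SF_RNT_LOG_LEVEL:-unset}"'

# 1. Exported variable, ordinary child process: the child inherits it.
export SF_RNT_LOG_LEVEL=WARNING
inherited="$(sh -c "$show_level")"

# 2. Exported variable, child started with a cleared environment.
#    `env -i` only imitates the reset that sudo typically performs.
cleared="$(env -i /bin/sh -c "$show_level")"

# 3. Variable passed as part of the command itself survives the reset.
#    With sudo, the analogous form would be (assuming the sudoers policy allows it):
#      sudo SF_RNT_LOG_LEVEL=WARNING python3 my_app.py
prefixed="$(env -i SF_RNT_LOG_LEVEL=WARNING /bin/sh -c "$show_level")"

echo "inherited: $inherited"
echo "cleared:   $cleared"
echo "prefixed:  $prefixed"
```

Only the first and third cases report the WARNING level. The second case reports the variable as unset, which is the behavior described above for exported variables under sudo.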

Change SND log levels

You need superuser privileges to view SND logs or make changes to them.

SND logs are sent to /var/log/sambaflow/runtime/snd.log. To set SND log levels, define SF_RNT_LOG_LEVEL in the override.conf file. The override.conf file persists across SND reinstalls and system reboots, so you need to do this step only when you want to change the SND log level.

  1. Open the file for editing.

    sudo systemctl edit snd
  2. Add an environment variable, as follows:

    [Service]
    Environment="SF_RNT_LOG_LEVEL=<log_level>"

    Replace <log_level> with your desired log level. You can use the same levels as for application logs.

  3. Restart SND.

    sudo systemctl restart snd
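For automation, the same override can be generated without an interactive editor. The following is a sketch only: render_snd_override is a hypothetical helper, and the drop-in path in the comments is the standard location that systemctl edit uses for a unit named snd.service, which is an assumption about this installation.

```shell
#!/bin/sh
# Hypothetical helper: print a systemd drop-in that sets SF_RNT_LOG_LEVEL.
render_snd_override() {
    if [ -z "$1" ]; then
        echo "usage: render_snd_override <log_level>" >&2
        return 1
    fi
    printf '[Service]\nEnvironment="SF_RNT_LOG_LEVEL=%s"\n' "$1"
}

# Preview the file contents for the ERR level.
render_snd_override ERR

# To install it (assumed path; requires superuser privileges):
#   sudo mkdir -p /etc/systemd/system/snd.service.d
#   render_snd_override ERR | sudo tee /etc/systemd/system/snd.service.d/override.conf
#   sudo systemctl daemon-reload
#   sudo systemctl restart snd
```

Unlike systemctl edit, writing the file by hand does not reload the unit configuration, so the sketch includes an explicit daemon-reload before the restart.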

Change kernel log levels

Kernel logs appear in both dmesg and /var/log/kern.log. Administrators can change the kernel log level.

To load the driver with the default log level:

sudo modprobe rdu && sudo modprobe rdu_mem_map && sudo modprobe rdu_peer_mem_client && sudo systemctl start snd

To load the driver with a different log level:

sudo modprobe rdu rdu_log_level=my_level && sudo modprobe rdu_mem_map && sudo modprobe rdu_peer_mem_client && sudo systemctl start snd

For my_level, specify the integer associated with the log level that you want:

  • 127 — INFO (default)

  • 63 — WARNING

  • 31 — ERR
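Because the module parameter takes an integer rather than a name, a small lookup can reduce mistakes. This sketch uses only the three values listed above; the function names are hypothetical, and the script prints the modprobe command instead of running it.

```shell
#!/bin/sh
# Hypothetical helper: map a kernel log level name to its rdu_log_level value.
rdu_log_level_value() {
    case "$1" in
        INFO)    echo 127 ;;
        WARNING) echo 63 ;;
        ERR)     echo 31 ;;
        *) echo "unknown kernel log level: $1" >&2; return 1 ;;
    esac
}

# Print (do not run) the modprobe command for the given level name.
rdu_modprobe_command() {
    value="$(rdu_log_level_value "$1")" || return 1
    echo "sudo modprobe rdu rdu_log_level=$value"
}

rdu_modprobe_command WARNING
```

Printing the command first lets an administrator review it before loading the driver.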

Use GraphCLI to examine graphs

GraphCLI lets you stop at a specified graph state and get information from a graph that is running on the RDU. Each debug command is described in the list of debug commands below.

  1. Run an application on a DataScale host.

    • Ensure that the SNCLI_SERVER environment variable is set when you launch the app:

      $ SNCLI_SERVER="localhost:50052" python /path/to/app <args>
  2. Launch the GraphCLI client in another window:

    $ /opt/sambaflow/diag/run_sncli.py
  3. Connect the GraphCLI client to a graph. The graph is generated by the compiler when you compile the model.

    $ connect graph localhost:50052
  4. Get graph information or disconnect, for example:

    $ show
    $ dump symbol <symbol name> vsnode <ID>
    $ disconnect

GraphCLI debug commands

Command                                       Description
connect                                       Connect to a remote target
disconnect                                    Disconnect from a remote target
exit                                          Exit from Graph CLI
dump symbol <symbol name> vsnode <ID>         Dump graph symbol contents on a particular vsnode
show                                          Show graph information, including symbols, arguments, host functions, resources, profile
break section <section_id> state <state_id>   Add a breakpoint at the specified section:state
delete section <section_id> state <state_id>  Delete a breakpoint at the specified section:state
breakpoints                                   List all breakpoints
next                                          Go to the next graph FSM state
continue                                      Continue graph FSM execution
list_step                                     List all valid section:state combinations in the current run
?                                             Context-sensitive help
enter / space                                 Auto-completion
up / down                                     Move through commands in history
CTRL-C                                        Delete and abort the current command
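A breakpoint session might combine these commands as follows. This is a sketch, not a captured session: the ordering is an assumption, the placeholders must be replaced with values that list_step reports, and no output is shown because the output format is not described here.

```
$ connect graph localhost:50052
$ list_step
$ break section <section_id> state <state_id>
$ breakpoints
$ continue
$ show
$ next
$ delete section <section_id> state <state_id>
$ continue
$ disconnect
$ exit
```

Listing the valid section:state combinations first avoids setting a breakpoint on a state that the current run never reaches.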