Foundation Services Rack administration

1. Get started with DataScale Foundation Services Rack administration

This SambaNova DataScale® Foundation Services Rack administration document targets Foundation Services Rack v1.

This page gets you started:

  • Learn about SambaNova support, SambaNova documentation, and other resources.

  • Get an overview of the Foundation Services Rack components.

See the Foundation Services Rack hardware installation documentation for details on hardware installation requirements and tasks.

1.1. SambaNova support

SambaNova customers who have valid support contracts can contact support and obtain product support documentation through the SambaNova support portal at https://support.sambanova.ai.

1.2. SambaNova documentation

As part of hardware installation, you might need SambaNova documentation, SambaNova KBs, and third-party documentation.

1.3. Third-party documentation

For operational issues with the third-party components in the Foundation Services Rack, see the following vendor-specific product documentation. If you need additional support or have questions related to troubleshooting, open a support case through SambaNova Support. See KB article #1017, "SambaNova Systems Support Best Practices," at https://support.sambanova.ai.

Do not open a support case with the product vendor. Direct all support requests to SambaNova Support first.

1.4. Overview of Foundation Services Rack hardware

The Foundation Services Rack is a self-contained standard 42 rack unit (RU) datacenter rack. The Foundation Services Rack provides a centralized networking location for intra-connectivity of multiple SambaNova DataScale® racks and uplink connectivity to the customer’s network. The centralized networking that the Foundation Services Rack provides ensures a known level of performance for the high-speed data network and reduces the number of connections that the customer needs for the management and access networks. Network switches are installed at the top of the rack.

Foundation Services Rack components (front view)


Table 1. Foundation Services Rack components

No.  Component
1    Lantronix® serial console server
2    Juniper® QFX5130 Ethernet (fan side)
3    Juniper EX4300 behind (fan side)

2. Network administration

This page covers network administration for the Foundation Services Rack and includes the following:

  • Pointers to third-party documents for the network devices.

  • Instructions for changing passwords for the network devices.

Most users do not significantly change the configuration of the serial console server or the Juniper switches. These devices come pre-configured for each customer based on the information supplied in the site survey that was completed prior to shipment. This topic discusses only tasks that you’re likely to perform and includes example IP addresses.

For more information, see the vendor documentation for each device, as described under Third-party documentation.

2.1. Default usernames and passwords

The following table shows several components in the Foundation Services Rack that have default passwords for users with administrative/root credentials.

SambaNova highly recommends that you change these default passwords as soon as possible. See Section 2.2.
Table 2. Default usernames and passwords

Component                                             Username   Default password
Lantronix serial console server                       sysadmin   Changeme
Juniper QFX5130 high-bandwidth Ethernet data switch   root       Changeme
Juniper EX4300 access switch                          root       Changeme
Vertiv™ PDU                                           admin      Changeme

SambaNova highly recommends that you change the default passwords at first login.
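One way to keep track of which default passwords still need to be changed is a small checklist script. The following is a minimal sketch in Python, not a SambaNova tool: the names DefaultAccount, DEFAULT_ACCOUNTS, and pending_password_changes are hypothetical, and the account list simply mirrors Table 2. The script stores no passwords and does not contact any device.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DefaultAccount:
    """One component account that ships with a default password (hypothetical helper type)."""
    component: str
    username: str


# Accounts listed in Table 2. No passwords are stored here.
DEFAULT_ACCOUNTS = [
    DefaultAccount("Lantronix serial console server", "sysadmin"),
    DefaultAccount("Juniper QFX5130 data switch", "root"),
    DefaultAccount("Juniper EX4300 access switch", "root"),
    DefaultAccount("Vertiv PDU", "admin"),
]


def pending_password_changes(changed_components):
    """Return the accounts whose component is not yet marked as changed.

    changed_components is any iterable of component names, exactly as
    spelled in DEFAULT_ACCOUNTS.
    """
    done = set(changed_components)
    return [acct for acct in DEFAULT_ACCOUNTS if acct.component not in done]
```

For example, calling pending_password_changes(["Juniper QFX5130 data switch", "Juniper EX4300 access switch"]) returns the entries for the Lantronix serial console server and the Vertiv PDU, which are the accounts that still use the default password in that scenario.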

2.2. How to change passwords for switches

This section gives instructions for changing the default passwords on the switches and the serial console server. For the default credentials, see Section 2.1.

2.2.1. Juniper EX4300 access switch and QFX5130 data switch

To change the root password on the Juniper EX4300 access switch or the QFX5130 data switch:

  1. Run the following commands:

    $ ssh root@<Juniper_switch_IP_address>
    root@:RE:0% cli
    root> configure
    root# set system root-authentication plain-text-password
    New password: <type_password>
    Retype new password: <type_password>
    root# commit

  2. Log out of the switch by entering the exit command three times (to exit configuration mode, operational mode, and the shell).

  3. Log back in with the new password.

2.2.2. Lantronix SLC8000 serial console server

To change the sysadmin password on the Lantronix SLC8000 serial console server:

Run the following commands:

$ ssh sysadmin@<Lantronix_IP_address>
> set localusers password sysadmin

2.3. Patch releases for network devices

SambaNova periodically provides patch releases for these network devices. Customers can download these patches from the SambaNova ext-infra-patch repository. See KB article #1062, "Listing and downloading available SambaNova firmware," for details.

Patch release notes explain any steps that differ from the standard steps described in the product administration documentation.

SambaNova does not qualify all firmware releases from device manufacturers. Apply only SambaNova-approved firmware updates.

3. Foundation Services Rack power management

For proper operation of the Foundation Services Rack and to prevent issues, be sure you power on and power off the system appropriately and in the correct sequence, as described on this page.

3.1. Warnings and general notes

The following notices apply to the Foundation Services Rack.

Some components within the rack work at high voltage. To prevent personal injury and voiding of the warranty, do not attempt to service components except where noted.
To protect the Foundation Services Rack from interference and to prevent damage to its components, keep the front and rear rack doors closed during standard operation.
To prevent Foundation Services Rack components from overheating, keep the front and rear of the rack clear of obstructions to allow proper airflow.
Before powering on the Foundation Services Rack, read the Foundation Services Rack hardware installation documentation (at https://docs.sambanova.ai) to ensure that you understand any known issues or limitations.
Do not power off or reboot the Foundation Services Rack components during any firmware update procedure. Doing so might damage the Foundation Services Rack components, and damaged components might not be recoverable. Perform a shutdown or reboot only after a firmware update has been completed.
When the PDUs are physically connected to the datacenter’s power receptacles, power is not immediately applied to the rack components because the breakers on the PDUs are turned off. You must manually turn on these breakers to begin feeding power to the Foundation Services Rack components. After the breakers are on, all Foundation Services Rack components begin to power on. The fans of these components initially run at full speed but eventually ramp down after the BMCs finish their boot sequence.
Some devices in the Foundation Services Rack take longer than others to complete their power-on boot sequence. Until all devices have fully booted, the slower devices might prevent access to devices that boot faster.

3.2. Power on the Foundation Services Rack

  1. To power on the rack, turn on the six circuit breakers for each PDU.

    When the PDUs are plugged into the datacenter power and you close the circuit breakers, power is automatically applied to the Foundation Services Rack components.

Figure 1 shows a PDU circuit breaker group, with breaker switch 6 circled. The circuit breakers on each PDU are grouped in banks of three.

Figure 1. Circuit breakers on PDU

The networking equipment in the Foundation Services Rack boots automatically when power is applied. There is no need to manually power on these devices.
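Because some devices take longer than others to finish booting, it can help to wait until each device accepts connections before you try to log in. The following is a minimal sketch in Python that polls a TCP port until it answers or a deadline passes. The function name wait_for_port, the choice of SSH port 22, and the default timing values are assumptions for illustration; an open port shows only that the device is reachable, not that it has finished every startup task.

```python
import socket
import time


def wait_for_port(host, port=22, timeout_s=600.0, interval_s=5.0, connect_timeout_s=3.0):
    """Poll host:port until a TCP connection succeeds.

    Returns True as soon as a connection is accepted, or False if no
    connection has succeeded by the time timeout_s seconds have elapsed.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # Open and immediately close a connection; success means the port is up.
            with socket.create_connection((host, port), timeout=connect_timeout_s):
                return True
        except OSError:
            # Refused, unreachable, or timed out: the device is not ready yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Example usage (192.0.2.10 is a placeholder documentation address, not a real device):
#   if wait_for_port("192.0.2.10"):
#       print("Device is accepting SSH connections")
```

Checking the Juniper EX4300 access switch first is a reasonable order, because the other devices are reached through the access network.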

SambaNova uses networking equipment from other suppliers. See Third-party documentation.

3.3. Gracefully shutting down the Foundation Services Rack

The switches and the serial console server are stateless devices and should not be harmed by unexpected shutdowns. However, to prevent the devices from logging unexpected power-downs and reporting them when you power back up, it’s best to power off each device from its GUI or CLI, if available.

Power off the Juniper EX4300 access switch last. Access to some of the other devices requires that this switch is working.

  1. Shut down the Juniper QFX5130 high-bandwidth data switch and the Lantronix SLC8000 serial console server.

  2. Shut down the Juniper EX4300 access switch last, because that switch provides the network access to the system via the 1GbE network.

    See the product-specific documentation listed under Third-party documentation for information on how to shut down each of these network devices.
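If you script or document the shutdown sequence above, the ordering rule can be expressed as a small helper that always moves the access switch to the end of the list. This is a minimal sketch in Python: the function name shutdown_order and the device name strings are hypothetical labels, and the helper only computes an order. It does not shut anything down.

```python
ACCESS_SWITCH = "Juniper EX4300 access switch"


def shutdown_order(devices, last=ACCESS_SWITCH):
    """Return the devices in their given order, with `last` moved to the end.

    Raises ValueError if `last` is not in the list, so that a missing
    access switch is noticed instead of silently ignored.
    """
    if last not in devices:
        raise ValueError(f"{last!r} is not in the device list")
    return [d for d in devices if d != last] + [last]


# Example usage:
#   shutdown_order([
#       "Juniper EX4300 access switch",
#       "Juniper QFX5130 data switch",
#       "Lantronix SLC8000 serial console server",
#   ])
#   places the access switch after the other two devices.
```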

After shutting down these switches, you can no longer access the PDUs over the network to cycle outlets, because their network switch is down. To cycle power, you must manually switch the relevant breakers off and back on at the physical PDU. Alternatively, you can access the PDUs through their serial ports and use their CLIs.

4. Monitor and debug the Foundation Services Rack

The Foundation Services Rack supports standard methods for monitoring and triaging the system. This page includes some tasks you can perform, such as examining log files, and also explains how to collect diagnostic materials for use with SambaNova support.

If you cannot resolve the issues yourself, create a support case and include diagnostic materials.

4.1. Set up SNMP alerts

To configure SNMP alerts for non-SambaNova components in the Foundation Services Rack, see the vendor-specific documentation.

4.2. Debug third-party components

For operational issues with the third-party components in the Foundation Services Rack, see the vendor-specific documentation. For issues that require additional support or for questions related to troubleshooting, open a support case through SambaNova Support. See KB article #1017, "SambaNova Systems Support Best Practices," at https://support.sambanova.ai.

Do not open a case directly with the product vendor.

4.3. Collect diagnostic materials for SambaNova Support

When you open a support case, provide details on the issue that has occurred, and initial diagnostic materials. For collecting diagnostic materials, see the following KB articles in the SambaNova Support portal:

  • Ethernet Data Switch Diagnostic Collection: KB article #1053

  • Access Switch Diagnostic Collection: KB article #1053

  • Serial Console Server Diagnostic Collection: KB article #1121

  • PDU Diagnostic Collection: KB article #1120

5. Back up and restore components

Use your site-specific guidelines and tools for backing up and restoring components of the Foundation Services Rack.

If you change the standard configuration of the networking equipment that is shipped to you, save the configuration changes you make to the devices. For details, see the KB articles listed below. Customers can access KB articles in the SambaNova Support portal at https://support.sambanova.ai.

5.1. Recover the Juniper access and data switch

For the process to recover the Juniper access switch and data switch, see the following KB articles:

  • Juniper Switch Password Recovery: KB article #1056

  • Juniper Switch Factory Reset Recovery: KB article #1056

  • Juniper Switch Saving Running Configuration: KB article #1056

5.2. Recover the Lantronix serial console server

For the process to recover the Lantronix serial console server, including recovering the sysadmin password, see the following KB articles:

  • Lantronix Serial Console Server Password Recovery: KB article #1059

  • Lantronix Serial Console Server Factory Reset Recovery: KB article #1059

  • Lantronix Serial Console Server Saving Running Configuration: KB article #1059

5.3. Upload recovery configuration files

For the process to upload configuration files used as part of the recovery process for some of these components, see the following KB articles:

  • Uploading Configuration Files for Recovery: KB article #1055

  • Listing and Downloading Configuration Files for Recovery: KB article #1044

For questions concerning any of these recovery KB articles or for anything that is not covered here, open a support case through the SambaNova Support portal (https://support.sambanova.ai).