SambaStack Monitoring and Observability

Monitoring stack overview

SambaStack emits two primary telemetry surfaces to help you observe and operate your deployments:

Metrics (Prometheus) – Router-level metrics such as traffic, latency, queueing, and worker state. See Metrics.
Logs (Logging Events / Manifest Events) – Detailed per-request execution events from the model runtime and other services. See Logs.
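
As a minimal sketch of consuming the Prometheus metrics surface, the Python below builds an instant query against the standard Prometheus HTTP API endpoint `/api/v1/query` and flattens the JSON vector response into (labels, value) pairs. The server address `http://prometheus.example:9090` and the metric name `sambastack_router_requests_total` are hypothetical placeholders, not names confirmed by SambaStack. Check the Metrics page for the metric names your release actually exports.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical values: replace with your Prometheus address and the metric
# names documented on the Metrics page for your SambaStack release.
PROMETHEUS_URL = "http://prometheus.example:9090"
EXAMPLE_QUERY = "sum by (model) (rate(sambastack_router_requests_total[5m]))"

Sample = tuple[dict[str, str], float]


def build_query_url(base_url: str, promql: str) -> str:
    """Return the URL for a Prometheus instant query (GET /api/v1/query)."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_vector(response_body: str) -> list[Sample]:
    """Flatten an instant-vector response body into (labels, value) pairs."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector result, got {data['resultType']}")
    # Each element looks like {"metric": {labels}, "value": [timestamp, "value"]};
    # Prometheus encodes the sample value as a string.
    return [(item["metric"], float(item["value"][1])) for item in data["result"]]


def instant_query(base_url: str, promql: str) -> list[Sample]:
    """Run an instant query against a reachable Prometheus server."""
    with urlopen(build_query_url(base_url, promql), timeout=10) as resp:
        return parse_vector(resp.read().decode("utf-8"))
```

For example, `instant_query(PROMETHEUS_URL, EXAMPLE_QUERY)` would return one (labels, requests-per-second) pair per `model` label value, provided a counter with that hypothetical name exists. Keeping URL building and response parsing separate from the network call makes both easy to test offline.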

Telemetry data generally consists of three types: Metrics (numeric time series for aggregation and alerting), Logs (discrete events for debugging and forensics), and Traces (request path records for latency analysis and root cause investigation).
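
To illustrate how trace data supports latency analysis, the sketch below models a trace as a list of spans and computes each stage's self time: a span's duration minus the time covered by its direct children. This points at the stage where a request actually spent its time. The `Span` fields and stage names such as `router.queue` are hypothetical and are not taken from SambaStack, and the calculation assumes that the children of any one span do not overlap in time.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    """One timed step of a request; field names are illustrative."""
    span_id: str
    parent_id: Optional[str]  # None marks the root span of the request
    name: str
    start_ms: float
    end_ms: float

    @property
    def duration_ms(self) -> float:
        return self.end_ms - self.start_ms


def self_time_by_stage(spans: list[Span]) -> dict[str, float]:
    """Sum each stage's exclusive (self) time in milliseconds across a trace.

    Self time is a span's duration minus the durations of its direct children,
    which assumes the children of any one span do not overlap in time.
    """
    child_total: dict[str, float] = {}
    for span in spans:
        if span.parent_id is not None:
            child_total[span.parent_id] = child_total.get(span.parent_id, 0.0) + span.duration_ms
    stages: dict[str, float] = {}
    for span in spans:
        own = span.duration_ms - child_total.get(span.span_id, 0.0)
        stages[span.name] = stages.get(span.name, 0.0) + own
    return stages


# Illustrative trace: a request that waited in a queue, then executed.
trace = [
    Span("1", None, "request", 0.0, 100.0),
    Span("2", "1", "router.queue", 0.0, 30.0),
    Span("3", "1", "model.execute", 30.0, 95.0),
]
```

For this illustrative trace, `self_time_by_stage(trace)` returns `{"request": 5.0, "router.queue": 30.0, "model.execute": 65.0}`, so most of the 100 ms went to execution and about a third to queueing. Metrics alone would show the 100 ms total but not this breakdown.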


SambaStack provides data and metadata that help operators observe and troubleshoot AI inference workloads running on SambaNova racks. The monitoring stack collects, stores, and visualizes:

System and application logs (control plane and data plane)
Audit events and access traces
Usage metrics (QPS, latency, queue time, memory utilization)
User activity (active users, sessions)
Health and availability signals (node status, pod status, model health)
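
Usage metrics such as QPS are commonly derived from monotonically increasing request counters, and latency or queue time is usually summarized as percentiles. The sketch below assumes you already have timestamped samples of a hypothetical cumulative request counter and a list of per-request latency or queue-time values. It computes average QPS over the sampled window, treating any drop in the counter as a reset in the way Prometheus `rate()` and `increase()` do (but without their window extrapolation), and a nearest-rank percentile. All names and values are illustrative.

```python
import math


def average_qps(samples: list[tuple[float, float]]) -> float:
    """Average queries per second from (unix_seconds, cumulative_count) samples.

    A value lower than its predecessor is treated as a counter reset (for
    example, a process restart), so the new value counts as requests since
    the reset.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    samples = sorted(samples)
    increase = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        increase += curr - prev if curr >= prev else curr
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("samples must span a positive time window")
    return increase / elapsed


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile (0 < pct <= 100) of latency or queue-time samples."""
    if not values or not 0 < pct <= 100:
        raise ValueError("need at least one sample and 0 < pct <= 100")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]
```

For example, `average_qps([(0.0, 100.0), (10.0, 150.0), (20.0, 30.0)])` counts 50 requests before the reset and 30 after it, giving 80 requests over 20 seconds, or 4.0 QPS.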

See the Reference Architectures section for details on the monitoring stack's architecture and deployment.
