- Metrics (Prometheus) – Router-level metrics such as traffic, latency, queueing, and worker state. See Metrics.
- Logs (Logging Events / Manifest Events) – Detailed per-request execution events from the model runtime and other services. See Logs.
Telemetry data generally consists of three types: Metrics (numeric time series for aggregation and alerting), Logs (discrete events for debugging and forensics), and Traces (request path records for latency analysis and root cause investigation).
Monitoring stack overview
SambaStack includes data and metadata designed to help operators observe and troubleshoot all aspects of their AI inference workloads running on SambaNova racks. The monitoring stack is responsible for collecting, storing, and visualizing:
- System and application logs (control plane and data plane)
- Audit events and access traces
- Usage metrics (QPS, latency, queue time, memory utilization)
- User activity (active users, sessions)
- Health and availability signals (node status, pod status, model health)
See the Reference architecture section for monitoring stack architecture and deployment details.
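As an illustration of how the usage metrics listed above might be read programmatically, the following is a minimal sketch that builds a request against the standard Prometheus HTTP API instant-query endpoint (`/api/v1/query`) and parses an instant-vector response. The base URL, the metric name `router_requests_total`, and the `model` label are hypothetical placeholders, not names confirmed by SambaStack; substitute the metrics your deployment actually exposes. The sample response is hand-written to match the Prometheus response format so the sketch runs without a live server.

```python
import json
from urllib.parse import urlencode

# Hypothetical values: replace with your Prometheus address and real metric names.
PROM_URL = "http://prometheus.example:9090"
QPS_QUERY = "sum by (model) (rate(router_requests_total[5m]))"


def build_instant_query_url(base_url: str, promql: str) -> str:
    """Build the URL for a Prometheus instant query (GET /api/v1/query)."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_instant_vector(body: str) -> dict:
    """Map each series' label set to its sample value from an instant-vector response."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error', 'unknown error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector result, got {data['resultType']}")
    values = {}
    for series in data["result"]:
        # Join the labels into a stable, sorted key such as "model=model-a".
        key = ",".join(f"{k}={v}" for k, v in sorted(series["metric"].items()))
        _timestamp, raw_value = series["value"]  # Prometheus sends values as strings
        values[key] = float(raw_value)
    return values


# Hand-written sample in the Prometheus response format (not real data).
SAMPLE_RESPONSE = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"model": "model-a"}, "value": [1700000000, "12.5"]},
            {"metric": {"model": "model-b"}, "value": [1700000000, "3"]},
        ],
    },
})

url = build_instant_query_url(PROM_URL, QPS_QUERY)
qps_by_model = parse_instant_vector(SAMPLE_RESPONSE)
for labels, qps in sorted(qps_by_model.items()):
    print(f"{labels}: {qps:.1f} requests/s")
```

In a live setup, the URL returned by `build_instant_query_url` would be fetched with an HTTP client and the response body passed to `parse_instant_vector`. Keeping URL construction and parsing separate from the network call makes both easy to test offline.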
