SambaStack emits two primary telemetry surfaces to help you observe and operate your deployments:
  • Metrics (Prometheus) – Router-level metrics such as traffic, latency, queueing, and worker state. See Metrics.
  • Logs (Logging Events / Manifest Events) – Detailed per-request execution events from the model runtime and other services. See Logs.
Telemetry data generally falls into three types: Metrics (numeric time series used for aggregation and alerting), Logs (discrete events used for debugging and forensics), and Traces (records of a request's path through services, used for latency analysis and root-cause investigation). The two SambaStack surfaces above cover Metrics and Logs.
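The difference between the three telemetry types is easiest to see in the shape of a single record. The sketch below is illustrative only: the class and field names are assumptions and do not describe SambaStack's actual schemas. It also shows the kind of question traces answer, namely which step of a request took the longest.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MetricSample:
    """A numeric time-series point: aggregated over time and used for alerting."""
    name: str
    labels: dict[str, str]
    value: float
    timestamp: float  # Unix seconds


@dataclass
class LogEvent:
    """A discrete event: searched when debugging or reconstructing an incident."""
    timestamp: float
    level: str
    message: str
    fields: dict[str, str] = field(default_factory=dict)


@dataclass
class Span:
    """One step of a request's path; spans sharing a trace_id form one trace."""
    trace_id: str
    span_id: str
    parent_id: Optional[str]
    name: str
    start: float
    end: float

    @property
    def duration(self) -> float:
        return self.end - self.start


def slowest_child(spans: list[Span], parent_id: str) -> Span:
    """Latency analysis: find which direct child step of a span took longest."""
    children = [s for s in spans if s.parent_id == parent_id]
    return max(children, key=lambda s: s.duration)


# Hypothetical trace: a routed request that waited in a queue, then ran on a model.
trace = [
    Span("t1", "a", None, "router", 0.0, 1.2),
    Span("t1", "b", "a", "queue", 0.0, 0.3),
    Span("t1", "c", "a", "model", 0.3, 1.1),
]
print(slowest_child(trace, "a").name)  # model
```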

Monitoring stack overview

SambaStack includes data and metadata designed to help operators observe and troubleshoot all aspects of their AI inference workloads running on SambaNova racks. The monitoring stack is responsible for collecting, storing, and visualizing:
  • System and application logs (control plane and data plane)
  • Audit events and access traces
  • Usage metrics (QPS, latency, queue time, memory utilization)
  • User activity (active users, sessions)
  • Health and availability signals (node status, pod status, model health)
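Usage metrics such as QPS, latency, and queue time are aggregates over many individual requests. As a minimal sketch of that arithmetic, assuming a hypothetical per-request record with arrival, start, and end timestamps (these field names are not SambaStack's), one time window can be summarized as follows. In practice the metrics backend usually computes such aggregates for you; the sketch only makes the definitions concrete, and it assumes the window contains at least one request.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class RequestRecord:
    """Hypothetical per-request record; field names are illustrative only."""
    arrival: float  # Unix seconds when the request was received
    start: float    # Unix seconds when a worker began executing it
    end: float      # Unix seconds when the response finished

    @property
    def queue_time(self) -> float:
        return self.start - self.arrival

    @property
    def latency(self) -> float:
        return self.end - self.arrival


def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile for 0 < p <= 100."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(records: list[RequestRecord], window_s: float) -> dict[str, float]:
    """Aggregate one (non-empty) time window of request records into usage metrics."""
    latencies = [r.latency for r in records]
    return {
        "qps": len(records) / window_s,
        "latency_p50_s": percentile(latencies, 50),
        "latency_p95_s": percentile(latencies, 95),
        "queue_time_max_s": max(r.queue_time for r in records),
    }


window = [
    RequestRecord(0.0, 0.0, 0.1),
    RequestRecord(0.0, 0.05, 0.2),
    RequestRecord(0.0, 0.1, 0.3),
    RequestRecord(0.0, 0.0, 0.4),
]
print(summarize(window, window_s=2.0))
# {'qps': 2.0, 'latency_p50_s': 0.2, 'latency_p95_s': 0.4, 'queue_time_max_s': 0.1}
```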

Reference architecture

The reference architecture described here is SambaNova's suggested implementation; it is entirely optional. It provides a default monitoring stack built on widely used open-source tools, but many customers already have mature monitoring solutions. Because the SambaStack monitoring architecture is modular, you can:
  • Adopt the full stack as provided, or
  • Swap individual components with equivalents from your existing observability platform (Splunk, Datadog, Elasticsearch, New Relic, etc.).
Reference architectures are built from numerous third-party products, and there is no guarantee that they will be updated in lockstep with version or command-syntax changes in those products. Issues not directly related to SambaStack should be directed to the vendor of the affected component.

Components

SambaStack’s reference monitoring stack uses four primary components:
Component | Tool | Description
--- | --- | ---
Log Forwarder and Processor | Fluent Bit | Collects logs from Kubernetes (pods, nodes, system services). Parses, enriches, and forwards logs to a log backend (e.g., OpenSearch or your existing log platform).
Log Storage and Search | OpenSearch | Stores logs, audit trails, and structured events at scale. Provides search, filtering, aggregation, and dashboards for log data. Acts as the canonical source of truth for log and audit history in the reference architecture.
Metrics Collection and Alerting | Prometheus | Scrapes metrics from SambaStack services, Kubernetes components, and node exporters. Stores time-series metrics for performance, capacity, and health monitoring. Serves as the primary source for alerting: Prometheus evaluates the alerting rules, and Alertmanager routes the resulting notifications.
Dashboards and Visualization | Grafana | Connects to Prometheus and OpenSearch (or your equivalents). Provides pre-built dashboards and can integrate with your SSO/IdP for role-based access to monitoring views.
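Dashboards and alerts ultimately issue PromQL queries against Prometheus. The sketch below uses Prometheus's documented HTTP API instant-query endpoint (`/api/v1/query`) with only the Python standard library. The server address is a placeholder assumption; the example query uses `up`, the per-target health series that Prometheus records for every scrape target, so `up == 0` lists targets Prometheus currently cannot scrape.

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request

# Assumption: replace with the address of your Prometheus server.
PROMETHEUS_URL = "http://prometheus.example.internal:9090"


def build_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def parse_vector(body: str) -> list[tuple[dict[str, str], float]]:
    """Turn an instant-vector JSON response into (labels, value) pairs."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector result, got {data['resultType']}")
    # Each sample's "value" is [unix_timestamp, "value-as-string"].
    return [(item["metric"], float(item["value"][1])) for item in data["result"]]


def instant_query(promql: str, base_url: str = PROMETHEUS_URL) -> list[tuple[dict[str, str], float]]:
    """Run a query; HTTP error statuses surface as urllib.error.HTTPError."""
    with urllib.request.urlopen(build_query_url(base_url, promql), timeout=10) as resp:
        return parse_vector(resp.read().decode("utf-8"))


# Example (requires a reachable Prometheus server):
# for labels, value in instant_query("up == 0"):
#     print(labels.get("job"), labels.get("instance"), value)
```

Keeping URL building and response parsing separate from the network call makes the same parsing logic reusable with any HTTP client or compatible backend.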

Design principles

The reference architecture is designed around a few core principles:
  • Modular integration – Each component exposes well-defined interfaces (e.g., Fluent Bit outputs, Prometheus remote_write, Grafana data sources). You can replace any component with an equivalent that provides the same interface.
  • Kubernetes-native – All components are designed to run on, integrate with, or observe your Kubernetes cluster(s) where SambaStack workloads are deployed.
  • Bring-your-own-stack friendly – If you already have:
    • A centralized log platform → point Fluent Bit's outputs at it.
    • A metrics/TSDB system → keep Prometheus as an intermediate collector that forwards to it (e.g., through remote_write) or replace Prometheus with your own collector.
    • An existing visualization layer → connect it directly to OpenSearch/Prometheus or replace Grafana entirely.
  • Security and compliance ready – Logging, metrics, and audit data can be integrated with your existing SIEM and compliance tooling.
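The modular-integration principle amounts to coding against an interface rather than a specific backend. The sketch below illustrates that idea in Python with a hypothetical `LogSink` protocol; it is an analogy for how a forwarder's output stage can be swapped, not Fluent Bit's plugin API, and the added `source` field is a made-up enrichment for illustration.

```python
from __future__ import annotations

import json
from typing import Protocol


class LogSink(Protocol):
    """Hypothetical interface: anything that accepts a batch of structured records."""

    def send(self, records: list[dict]) -> None: ...


class StdoutSink:
    """Stand-in backend that writes one JSON line per record to standard output."""

    def send(self, records: list[dict]) -> None:
        for record in records:
            print(json.dumps(record, sort_keys=True))


class InMemorySink:
    """Stand-in backend that buffers records, e.g. for tests."""

    def __init__(self) -> None:
        self.received: list[dict] = []

    def send(self, records: list[dict]) -> None:
        self.received.extend(records)


def forward(records: list[dict], sink: LogSink) -> int:
    """Enrich records, then hand them to whichever sink is configured."""
    enriched = [{**r, "source": "sambastack"}  # hypothetical enrichment field
                for r in records]
    sink.send(enriched)
    return len(enriched)


forward([{"log": "worker ready"}], StdoutSink())
# {"log": "worker ready", "source": "sambastack"}
```

Because `forward` depends only on the `send` method, replacing one backend with another requires no change to the pipeline itself, which is the property the component-substitution guidance relies on.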

Component substitution

You can replace any reference architecture component with an equivalent from your existing observability platform.
Component | Replaceable with | Requirements
--- | --- | ---
Log Storage and Search (OpenSearch) | Elasticsearch, Splunk, Loki, Datadog Logs, or your SIEM | Must accept logs over a protocol supported by Fluent Bit (HTTP, gRPC, Kafka, etc.) and meet your retention and compliance needs.
Log Forwarder and Processor (Fluent Bit) | Fluentd, Vector, Logstash, Datadog Agent | Must support Kubernetes log collection and be able to send logs to your selected log storage platform.
Metrics Collection (Prometheus) | Managed Prometheus services, Datadog/New Relic agents, internal TSDBs | Must be able to scrape /metrics endpoints or accept Prometheus remote_write.
Dashboards and Visualization (Grafana) | Datadog dashboards, Kibana, custom internal tooling | Must integrate with both metrics and log sources to offer equivalent visibility.
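A replacement metrics collector has to understand what a `/metrics` endpoint returns: the Prometheus text exposition format. The minimal parser below is a sketch for checking or exploring such output, not a complete implementation. It skips `# HELP`/`# TYPE` comments, does not unescape label values, and assumes label values contain no `}` character. The metric names in the example are hypothetical, not SambaStack's actual metrics.

```python
from __future__ import annotations

import re

# metric_name{label="value",...} value [timestamp]
_SAMPLE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?'
    r'\s+(?P<value>\S+)'
    r'(?:\s+(?P<ts>-?\d+))?\s*$'
)
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_exposition(text: str) -> list[tuple[str, dict[str, str], float]]:
    """Return (metric name, labels, value) for every sample line."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # blank lines, HELP and TYPE comments
            continue
        match = _SAMPLE.match(line)
        if match is None:
            raise ValueError(f"not a valid sample line: {line!r}")
        labels = dict(_LABEL.findall(match.group("labels") or ""))
        # float() also accepts the special values +Inf, -Inf and NaN.
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


example = """\
# HELP requests_total Total requests handled (hypothetical metric).
# TYPE requests_total counter
requests_total{route="/v1/chat",code="200"} 1027
queue_depth 3
"""
for name, labels, value in parse_exposition(example):
    print(name, labels, value)
```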