RDU manifest events
RDU manifest events are structured logs emitted per request by the model runtime. They contain token counts, high-level latencies, and a set of detailed timing fields. These events are typically indexed into a log index (for example, an OpenSearch index) and can be filtered by fields such as model, tenant, pod, and time range.
RDU manifest fields
| Field | Category | Key | Description |
|---|---|---|---|
| Prompt tokens | Tokens | prompt_tokens_count | Number of input tokens in the prompt. |
| Completion tokens | Tokens | completion_tokens_count | Number of output tokens generated. |
| Total latency | Latency | total_latency | End-to-end time from request start to last token (includes queue + compute). |
| Time to first token (TTFT) | Latency | time_to_first_token | Time from request submission to first token. |
| Completion tokens per second | Throughput | completion_tokens_per_sec | Effective throughput over the entire completion. |
| Tokens/sec after first token | Throughput | completion_tokens_after_first_per_sec | Decode throughput after first token (steady-state). |
| Acceptance rate | Spec decoding | acceptance_rate | Acceptance rate for speculative decoding. |
| Decode queue time | Internal timing | decode_queue_time | Time spent in decode-related queues (e.g. continuous batching queues). |
| Tensor transfer time | Internal timing | tensor_transfer_time | Time spent transferring tensors between components. |
| Cache transfer time | Internal timing | cache_transfer_time | Time spent transferring cache (e.g. KV cache). |
Additional internal timing fields (~15) are available for fine-grained analysis of execution stages. Contact your SambaNova representative for details.
These fields belong to logging events and are subject to schema evolution.
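To illustrate how the latency and token fields relate, here is a minimal sketch that derives steady-state decode throughput from a manifest event. The field keys match the table above; the event values and the helper function are hypothetical, and the formula assumes decode time is total latency minus time to first token.

```python
import json

# Hypothetical RDU manifest event. Field keys follow the table above;
# the values are invented for illustration only.
event = json.loads("""{
  "model": "example-model",
  "prompt_tokens_count": 512,
  "completion_tokens_count": 128,
  "total_latency": 2.50,
  "time_to_first_token": 0.40
}""")

def decode_tokens_per_sec(evt):
    """Approximate tokens/sec after the first token.

    Assumes decode time = total_latency - time_to_first_token, and that
    the first token is excluded from the steady-state count.
    """
    decode_time = evt["total_latency"] - evt["time_to_first_token"]
    return (evt["completion_tokens_count"] - 1) / decode_time

print(round(decode_tokens_per_sec(event), 1))  # -> 60.5
```

In practice, compare this derived value against the reported completion_tokens_after_first_per_sec field to sanity-check an event.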
Example queries
Examples assume a log backend that supports a query language (e.g., OpenSearch or Loki) and timestamps on each event.

p95 total latency per model (last 15 minutes)
- Filter: model:"<model_name>" AND @timestamp:[now-15m TO now]
- Aggregate: percentile 95 on total_latency, grouped by model.

TTFT and total latency distributions per model
- Group by model, compute p50/p90 for both time_to_first_token and total_latency.

High decode queue time
- Filter: decode_queue_time > <threshold>
- Group by model or tenant to identify where queueing is highest.

Acceptance rate over time
- Filter: model:"<model_name>"
- Aggregate: average acceptance_rate over time.
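As a concrete example, the first query above can be expressed as an OpenSearch aggregation body. This is a sketch, assuming an OpenSearch-style backend with a keyword-mapped model field and an @timestamp field; adjust index and field names to your schema.

```python
import json

# Sketch of an OpenSearch query body: p95 total_latency per model,
# restricted to the last 15 minutes. Field names are assumptions
# based on the manifest fields table; verify against your index mapping.
query = {
    "size": 0,  # aggregations only, no raw hits
    "query": {
        "bool": {
            "filter": [
                {"range": {"@timestamp": {"gte": "now-15m", "lte": "now"}}}
            ]
        }
    },
    "aggs": {
        "by_model": {
            "terms": {"field": "model"},
            "aggs": {
                "p95_latency": {
                    "percentiles": {"field": "total_latency", "percents": [95]}
                }
            }
        }
    },
}

print(json.dumps(query, indent=2))
```

Posting this body to the index's _search endpoint returns one p95 value per model bucket; swapping the percentiles aggregation for avg over acceptance_rate yields the last query in the list above.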
Related topics
- Monitoring and Observability – High-level telemetry breakdown and hierarchy.
- Metrics – Router-level metrics reference.
