Inference router metrics
Inference router metrics describe queues, scheduling, and request lifecycle in the core inference layer.
Inference router metrics table
| Metric | Category | Prometheus Name | Description | Granularity |
|---|---|---|---|---|
| Queue length | Queue | queue_length | Number of requests currently queued in the router. | Per model, QoS, and/or user |
| Max queue wait time | Queue | queue_max_wait_seconds | Maximum age (seconds) of any request currently in the queue. | Per model, QoS |
| Customer queue length | Queue | customer_queue_length | Queue length per customer per model. | Per user, model |
| Submitted requests | Traffic | submitted_total | Total number of requests submitted to the router. | Per model, QoS, user, status |
| Completed requests | Traffic | completed_total | Total number of completed requests, labeled with completion status (success, error, etc.). | Per model, QoS, user, status |
| Response codes | Traffic | response_code_total | Count of HTTP responses by status code. | Per HTTP code, route, user |
| Response latency | Latency | response_duration_ms | End-to-end response latency in milliseconds (often as a histogram or summary). | Per model, QoS, customer |
| Connection state | Workers | connection_state_ratio | Fraction of workers in each state (idle, busy, draining, unhealthy, etc.). | Per worker state, model, pool |
| Active users | Adoption | active_users | Number of active users observed by the router. | Global and/or per user |
Metric names and label sets may evolve over time; refer to the release notes for changes to the metric schema.
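The Prometheus names above can be queried directly once the router is scraped. The following is a minimal sketch in Python, using the `requests` library against the standard Prometheus HTTP API, of how the queue and traffic metrics might be inspected. The server URL and the exact label names (`model`, `status`) are assumptions for illustration, not part of the documented schema; adjust them to match the labels your deployment actually exports.

```python
# Minimal sketch: query a Prometheus server for a few router metrics.
# The endpoint URL and label names ("model", "status") are assumed here.
import requests

PROMETHEUS_URL = "http://localhost:9090"  # assumed Prometheus endpoint


def instant_query(promql: str) -> list[dict]:
    """Run an instant query against the Prometheus HTTP API and return the result vector."""
    resp = requests.get(
        f"{PROMETHEUS_URL}/api/v1/query",
        params={"query": promql},
        timeout=10,
    )
    resp.raise_for_status()
    body = resp.json()
    if body.get("status") != "success":
        raise RuntimeError(f"query failed: {body}")
    return body["data"]["result"]


if __name__ == "__main__":
    # Current queue depth per model (gauge; "model" label is an assumption).
    for sample in instant_query("sum by (model) (queue_length)"):
        print(sample["metric"].get("model", "<none>"), sample["value"][1])

    # Error rate over the last 5 minutes, assuming a "status" label on completed_total.
    error_rate = (
        'sum(rate(completed_total{status="error"}[5m])) '
        "/ sum(rate(completed_total[5m]))"
    )
    for sample in instant_query(error_rate):
        print("error rate:", sample["value"][1])
```

Counters such as submitted_total and completed_total are normally wrapped in rate() over a time window, while gauges such as queue_length and queue_max_wait_seconds can be read directly.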
Related topics
- Monitoring and Observability – Conceptual overview and hierarchy.
- Logs – Log/manifest event schema and usage.
