Stage latency report

A section is split into stages. This report provides per-stage diagnostics: it lists the time taken by each stage. A stage often corresponds to an ML graph operator, but not always. A stage may be an intermediate buffer inserted by the compiler mid-end or backend, or multiple operators may be fused into one stage. Because stages execute as a pipeline, the stage with the longest latency is often the critical stage.
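To see why the longest stage tends to dominate, here is a minimal sketch of an idealized linear pipeline. It assumes each stage handles one iteration at a time and never blocks waiting for buffer space; the stage names and cycle counts are hypothetical, not values from a real report.

```python
def pipeline_cycles(stage_latencies, iterations):
    """Estimate total cycles for an idealized linear pipeline.

    The first iteration must pass through every stage (the fill time).
    After that, one iteration completes every max(stage_latencies)
    cycles, because the slowest stage gates all the others.
    """
    if iterations < 1 or not stage_latencies:
        raise ValueError("need at least one stage and one iteration")
    fill = sum(stage_latencies)
    bottleneck = max(stage_latencies)
    return fill + (iterations - 1) * bottleneck


# Hypothetical per-stage latencies, in cycles.
stages = {"load": 800, "gemm": 2400, "relu": 300, "store": 900}
total = pipeline_cycles(list(stages.values()), iterations=1000)
critical = max(stages, key=stages.get)
print(total, critical)
```

In this model, shortening any stage other than the slowest one barely changes the total once the iteration count is large, which is why the longest stage is the first place to look.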

The stage latency report helps you identify stage-level bottlenecks and find the slowest stages in a section.

Find the report

The report is available:

In .XLSX format at reports/collated_report.xlsx in your output folder, in the 'Stage Latency' worksheet. See View the tabular report.
As a standalone CSV at reports/stage_report.csv in your output folder.
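If you prefer to script against the CSV, a minimal loader could look like the sketch below. The header names (`section id`, `stage id`, `measured latency`) are assumptions taken from the column list in this report; check the first line of your own stage_report.csv and adjust them if your version spells them differently.

```python
import csv
import io

# Assumed header names (hypothetical); adjust to match your CSV.
SECTION_COL = "section id"
STAGE_COL = "stage id"
LATENCY_COL = "measured latency"


def load_stage_rows(csv_file):
    """Read stage rows from a stage report CSV into a list of dicts.

    Numeric columns are converted. Rows with an empty latency are kept
    with latency set to None so they can still be inspected.
    """
    rows = []
    for record in csv.DictReader(csv_file):
        latency = record[LATENCY_COL].strip()
        rows.append({
            "section": int(record[SECTION_COL]),
            "stage": int(record[STAGE_COL]),
            "latency": float(latency) if latency else None,
            "raw": record,  # keep every original column for later use
        })
    return rows


# Hypothetical sample data; in practice pass
# open("reports/stage_report.csv", newline="") instead.
sample = io.StringIO(
    "section id,stage id,measured latency\n"
    "0,1,1200\n"
    "0,2,3400\n"
    "0,-1,\n"
)
rows = load_stage_rows(sample)
print(len(rows), rows[1]["latency"])
```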

Read the output data

The report returns the following information.


Column Name	Description
section id	Unique ID associated with each section.
stage depth	Number of stages between this stage and the start of the section.
stage id	Unique ID associated with each stage. If set to -1, this row represents either an inter-stage buffer that is not assigned a stage_id, or a bug or oversight in the stage_id assignment code.
related stage ids	Related stage ids are displayed if the stage id is set to -1, or if any stage latency counter associated with the stage has a different stage id.
non-buffer template names	List of all templates in the stage that are not buffers.
mac id	List of mac ids associated with the stage template information.
nodes (kNames)	The kNames (the name at the lower stack) for each of the templates in the stage (including buffers).
nodes (NodeName)	Node names (the name at the lower stack) for each of the templates in the stage (including buffers).
measured latency	Measured latency (in cycles) of a stage, calculated as the instrumentation counter reading divided by the number of iterations.
all measured latencies	If more than one instrumentation counter has the same stage id, the report groups those counters and shows only the lowest measured stage latency in the `measured_latency` column. Every other measured latency for that stage id is reported in this column.
tile id	Unique ID associated with each tile.
chip id	Unique ID associated with each chip.
event name	Event name associated with the instrumentation counter.
in → out buffers	List of all input buffers in the stage
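The relationship between the `measured latency` and `all measured latencies` columns can be sketched as follows. This is an illustration of the rules described in the table, not SambaTune's own code; the `(stage_id, total_cycles)` counter tuples and their values are hypothetical.

```python
from collections import defaultdict


def bucket_counters(counters, iterations):
    """Group instrumentation counters by stage id.

    counters: iterable of (stage_id, total_cycles) pairs, where
    total_cycles is the raw counter reading over all iterations.
    Returns {stage_id: (measured_latency, other_latencies)}, where
    measured_latency is the lowest per-iteration latency and
    other_latencies lists every remaining reading for that stage.
    """
    by_stage = defaultdict(list)
    for stage_id, total_cycles in counters:
        # Per-iteration latency: counter reading / number of iterations.
        by_stage[stage_id].append(total_cycles / iterations)

    result = {}
    for stage_id, latencies in by_stage.items():
        latencies.sort()
        result[stage_id] = (latencies[0], latencies[1:])
    return result


# Hypothetical readings: stage 7 has two counters, stage 8 has one.
readings = [(7, 250_000), (7, 310_000), (8, 90_000)]
print(bucket_counters(readings, iterations=100))
```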


Interpret the data

The stage latency bar charts can help you identify the longest stage, which is often, though not always, the critical stage in the section. Once you have found it, you can troubleshoot the bottleneck.
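If you are working from the CSV rather than the charts, the longest stage in each section can be picked out with a few lines of Python. This is a sketch assuming rows shaped as hypothetical `(section_id, stage_id, measured_latency)` tuples; as the text notes, the result is a candidate for the critical stage, not proof of it.

```python
def longest_stage_per_section(rows):
    """Return {section_id: (stage_id, latency)} for the slowest stage.

    rows: iterable of (section_id, stage_id, measured_latency) tuples.
    Rows without a latency (None) are skipped.
    """
    longest = {}
    for section_id, stage_id, latency in rows:
        if latency is None:
            continue
        best = longest.get(section_id)
        if best is None or latency > best[1]:
            longest[section_id] = (stage_id, latency)
    return longest


# Hypothetical rows from two sections.
rows = [
    (0, 1, 1200.0), (0, 2, 3400.0), (0, -1, None),
    (1, 5, 800.0), (1, 6, 650.0),
]
print(longest_stage_per_section(rows))
```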

This section will have more information in a future release.

View the XLSX report

When you view the report, you can select the tabs at the bottom to drill down. Pay attention to the color coding:

A row is formatted orange if its stage id is missing. A stage id is considered missing if its value is -1.
If the measured latency is within 10% of the critical latency, the measured latency cell is formatted yellow. The critical latency is calculated as: (rdu_clock_speed * microbatch_size) / section_throughput
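The color rules can be approximated as shown below. This sketch makes two assumptions that the report does not spell out: that rdu_clock_speed is in cycles per second and section_throughput is in samples per second (so critical latency comes out in cycles), and that "within 10%" means an absolute relative difference of at most 10%. The clock speed, micro-batch size, and throughput values are hypothetical.

```python
def critical_latency(rdu_clock_speed, microbatch_size, section_throughput):
    """Critical latency in cycles, using the formula from the report.

    Assumed units: rdu_clock_speed in Hz (cycles per second) and
    section_throughput in samples per second.
    """
    return (rdu_clock_speed * microbatch_size) / section_throughput


def highlight(stage_id, measured, critical, tolerance=0.10):
    """Approximate the XLSX highlighting for one row.

    row_orange: the stage id is missing (-1), so the whole row is orange.
    cell_yellow: the measured latency is within `tolerance` of the
    critical latency, so the measured latency cell is yellow.
    """
    return {
        "row_orange": stage_id == -1,
        "cell_yellow": measured is not None
        and abs(measured - critical) <= tolerance * critical,
    }


# Hypothetical values: 1.6 GHz clock, micro-batch of 16, 8000 samples/s.
crit = critical_latency(1.6e9, 16, 8000)
print(crit, highlight(4, 3_300_000, crit))
```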

View the tabular report

The tabular form of the stage latencies allows you to sort, search and filter aspects of stage latency. For example, you can sort the latencies in descending order or look for stages with latencies greater than a certain threshold value. Here’s a screenshot of an example in the GUI client.
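The same descending sort and threshold filter can also be scripted outside the GUI. Here is a minimal sketch over hypothetical row dicts with `stage` and `latency` keys, such as rows you might build from reports/stage_report.csv; rows without a measured latency are dropped before sorting.

```python
def slowest_stages(rows, threshold=None, top=None):
    """Sort stage rows by measured latency, slowest first.

    rows: list of dicts with "stage" and "latency" keys (latency may be None).
    threshold: if given, keep only stages with latency > threshold.
    top: if given, return at most this many rows.
    """
    timed = [r for r in rows if r["latency"] is not None]
    if threshold is not None:
        timed = [r for r in timed if r["latency"] > threshold]
    timed.sort(key=lambda r: r["latency"], reverse=True)
    return timed[:top] if top is not None else timed


# Hypothetical stage rows.
rows = [
    {"stage": 1, "latency": 1200.0},
    {"stage": 2, "latency": 3400.0},
    {"stage": -1, "latency": None},
    {"stage": 3, "latency": 2100.0},
]
print([r["stage"] for r in slowest_stages(rows, threshold=1500)])
```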

Example screenshots

The SambaTune Web UI allows you to explore the stage latency for each section executed on the RDU.