Section report

An ML graph is split into sections, and each section runs on an RDU. This report provides per-section diagnostics: the time taken by each section, the resources (PCUs, PMUs) each section uses, and the bandwidth of data transferred in and out of each section (DDR, PCIe).

Locate the report

The report is written in CSV format to /reports/section_report.csv in your output directory.
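As a minimal sketch of reading the report programmatically, the snippet below loads the CSV with Python's standard csv module. The function name load_section_report and its output_dir argument are hypothetical, and the sketch assumes the first row of the file is a header row.

```python
import csv
from pathlib import Path


def load_section_report(output_dir):
    """Read reports/section_report.csv under output_dir into a list of dicts.

    output_dir is whatever output directory you used; the name is a placeholder.
    """
    report_path = Path(output_dir) / "reports" / "section_report.csv"
    with report_path.open(newline="") as handle:
        # DictReader keys every row by the header row of the CSV
        # (assumed here to be the first line of the file).
        return list(csv.DictReader(handle))
```

Each returned row is a dictionary keyed by column name, with every value kept as a string.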

Understand the output data

The complete report has multiple fields, each of which is explained below.

| Column name | Meaning |
|---|---|
| section_id | ID of the current section |
| chip_id | ID of the RDU on which the section executed |
| partition_id | The (section_id, chip_id) tuple |
| section type | Type of the section: Forward, Backward or Optimizer |
| measured throughput, samples/s | Throughput of the section, measured while running the section |
| measured latency (cycles) | Latency of the section in cycles, measured while running the section |
| measured latency (sec) | Latency of the section in seconds, measured while running the section |
| measured PMU count | Actual number of PMUs allocated to the section |
| measured PMU utilization | PMUs used by the section as a percentage of the PMUs available in the RDU |
| measured PCU count | Actual number of PCUs allocated to the section |
| measured PCU utilization | PCUs used by the section as a percentage of the PCUs available in the RDU |

Interpret the data

Reading the section report can help you identify performance hotspots within and across sections. For example, if Section 4 shows the longest latency, examine the other metrics for Section 4 and investigate whether it was DDR bandwidth bound, resource bound, or simply too congested.
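The hotspot search described above can be sketched as a sort on the latency column. This is an illustration only: the helper names to_float and slowest_sections are hypothetical, the rows are assumed to be dictionaries keyed by the column names listed in this document, and the sample values are copied from the example report in this document.

```python
def to_float(text):
    """Parse a numeric cell, tolerating thousands separators such as 1,248,679.50."""
    return float(text.replace(",", ""))


def slowest_sections(rows, latency_column="measured latency (sec)"):
    """Return per-section rows sorted from longest to shortest latency."""
    # Keep only rows with a documented section type, so that a summary
    # row such as End2End is not ranked against individual sections.
    per_section = [
        row for row in rows
        if row.get("section type") in {"Forward", "Backward", "Optimizer"}
    ]
    return sorted(
        per_section,
        key=lambda row: to_float(row[latency_column]),
        reverse=True,
    )


# Values copied from the example report in this document.
example_rows = [
    {"section_id": "0", "section type": "Forward", "measured latency (sec)": "0.000820"},
    {"section_id": "1", "section type": "Backward", "measured latency (sec)": "0.000961"},
    {"section_id": "2", "section type": "Optimizer", "measured latency (sec)": "0.000299"},
]

for row in slowest_sections(example_rows):
    print(row["section_id"], row["section type"], row["measured latency (sec)"])
```

With the example values, the Backward section (section_id 1) ranks first, so it would be the first candidate to check for bandwidth or resource limits.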

Example output

Here is an example report:

| section_id | chip_id | partition_id | section type | measured throughput, samples/s | measured latency (cycles) | measured latency (sec) | measured PMU count | measured PMU utilization | measured PCU count | measured PCU utilization |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0, 0 | Forward | 1,248,679.50 | 0 | 0.000820 | 32 | 0.05 | 32 | 0.05 |
| 1 | 0 | 1, 0 | Backward | 1,065,453.58 | 0 | 0.000961 | 119 | 0.19 | 160 | 0.25 |
| 2 | 0 | 2, 0 | Optimizer | 3,422,697.17 | 0 | 0.000299 | 53 | 0.08 | 32 | 0.05 |
| End2End | | | | 118,113.67 | 0 | 0.008670 | | | | |
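One way to sanity-check a row is to multiply its throughput (samples/s) by its latency (seconds), which gives the number of samples processed in one run of that section. The sketch below applies this to the example values. The function name samples_per_run is hypothetical, and reading the result as a batch size is an assumption, not something the report states.

```python
# (label, measured throughput in samples/s, measured latency in seconds),
# copied from the example report.
example = [
    ("Forward", 1_248_679.50, 0.000820),
    ("Backward", 1_065_453.58, 0.000961),
    ("Optimizer", 3_422_697.17, 0.000299),
    ("End2End", 118_113.67, 0.008670),
]


def samples_per_run(throughput, latency_sec):
    """samples/s multiplied by seconds gives samples processed in one run."""
    return throughput * latency_sec


for label, throughput, latency_sec in example:
    print(f"{label:<10} {samples_per_run(throughput, latency_sec):8.1f}")
```

For every row in this example the product comes out close to 1024, which is consistent with throughput and latency describing the same fixed number of samples per run. A row that deviates noticeably from the others would be worth a closer look.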

Example screenshots

The SambaTune GUI shows the latency, PCU and PMU usage, and DDR and PCIe bandwidth for every section executed on the RDU. Here are examples of reports available in the GUI:

[Screenshot: Section Latencies]
[Screenshot: Section Report]