SambaTune Release Notes

For release-specific details about the code examples, see the README and configuration files in /opt/sambaflow/sambatune/configs.

  • The README.md file discusses compatible architectures and modes.

  • A preinstalled sambanova-runtime package is required to run sambatune.

  • Each configuration file lists the required application-related packages.

Release 1.19

Features

  • Deprecated the following reports:

    • Bandwidth report

    • Paths report

    • Stalls report

  • Removed some package dependencies. Before installing the sambanova-sambatune package, verify that the SambaFlow and SambaNova Runtime packages are installed on the machine.

Bug Fixes

  • Fixed the error "User requested sambatune run on "sn30" architecture whereas attached RDU is "SN30" architecture". The architecture can now be specified in either upper case or lower case.
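The case-insensitive behavior described in this fix can be illustrated with a minimal sketch. The function name arch_matches is hypothetical and is not part of SambaTune; the sketch only shows one way to compare a requested architecture name with the attached one regardless of letter case.

```python
def arch_matches(requested: str, attached: str) -> bool:
    """Return True when two architecture names differ only by letter case.

    Hypothetical helper for illustration; not a SambaTune API.
    """
    # casefold() is a more aggressive form of lower() intended for
    # caseless comparison; strip() ignores surrounding whitespace.
    return requested.strip().casefold() == attached.strip().casefold()


print(arch_matches("sn30", "SN30"))  # True
print(arch_matches("sn20", "SN30"))  # False
```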

Release 1.18

Features

  • Added tensor parallel test case.

  • Added display of section type details to the full stack tracing tool.

  • Improved the logger to dump more information for debugging.

  • Improved the instrumentation workflows.

Bug Fixes

  • Fixed O/E ratio in full stack tracing.

Release 1.17 (2023-10-20)

Features

  • Added GPT13B examples for SN20 and SN30.

  • Added a README to the config folder that lists the number of sockets, supported modes, and supported architectures for each config.

Bug Fixes

  • Fixed some corner cases where the full stack tracing report was not generated.

  • Updated the SambaTune documentation link to the latest version.

Release 1.16 (2023-07-14)

Features

  • Added full stack tracing capability. See Stack Tracing report.

  • Added the ability to profile training apps using the actual run command. See unet_e2e.yaml example below.

  • Added the ability to specify separate compile and run apps in YAML input files. See unet_e2e.yaml example below.

  • Improvements to reports:

    • Improved collated report output to consolidate with other reports.

    • Improved section report output to support hypersection.

    • Improved DDR and PCIe reports to support hypersection.

    • Improved performance insights to support hypersection.

Bug Fixes

  • Fixed model sweep and samba sweep failures that occurred when mac_resources.json was output empty or the architecture was not specified.

Deprecations

  • Deprecated --modes run for linear_net_compare.yaml.

unet_e2e.yaml example

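This unet_e2e.yaml example illustrates some of the changes made in release 1.16: a separate compile app (app, compile-args) and run app (run-app, run-args), with training profiled through the actual run command.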

# This example assumes that the sambaflow-apps-datascale-image-segmentation package is installed.
app: /opt/sambaflow/apps/image/segmentation/compile.py

compile-args: >
  --batch-size 64
  --in-channels=3
  --in-width=256
  --in-height=256
  --enable-conv-tiling
  --mac-v2
  --mac-human-decision $UNET_HD

run-app: /opt/sambaflow/apps/image/segmentation/hook.py

run-args: >
  --data-transform-config /opt/sambaflow/apps/image/segmentation/segmentation/datasets/data_transforms_config.yaml
  --num-workers 4
  --mode train
  --in-channels=3
  --in-width=256
  --in-height=256
  --init-features 32
  --batch-size 64
  --epochs 1
  --data-dir $DATA_DIR
  --data-cache-dir $DATA_CACHE_DIR
  --log-dir $LOG_DIR
  --samba-runtime-profile
  --run-benchmark
  --benchmark-steps 10
  --benchmark-warmup-steps 2

env:
  UNET_HD: UNET_HD
  DATA_DIR: DATA_DIR
  DATA_CACHE_DIR: DATA_CACHE_DIR
  LOG_DIR: LOG_DIR
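
Because the example references environment variables such as $DATA_DIR in its argument strings and declares them under env, a quick consistency check before a run can catch typos. The following is a minimal sketch, not part of SambaTune: the function check_config, the EXPECTED_KEYS tuple, and the rule that every $VAR must be declared under env are all assumptions made for illustration. The config is written as a Python dictionary that mirrors the YAML keys above, so no YAML parser is needed.

```python
import re

# Keys used by the unet_e2e.yaml example (assumed, not an official schema).
EXPECTED_KEYS = ("app", "compile-args", "run-app", "run-args")

# Matches shell-style variable references such as $DATA_DIR.
VAR_PATTERN = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*)")


def check_config(config: dict) -> list[str]:
    """Return a list of human-readable problems found in the config."""
    problems = []
    for key in EXPECTED_KEYS:
        if key not in config:
            problems.append(f"missing key: {key}")
    declared = set(config.get("env") or {})
    for key in ("compile-args", "run-args"):
        for name in VAR_PATTERN.findall(config.get(key, "")):
            if name not in declared:
                problems.append(
                    f"{key} references ${name}, which is not declared under env"
                )
    return problems


# Shortened version of the example above, with UNET_HD left out of env.
config = {
    "app": "/opt/sambaflow/apps/image/segmentation/compile.py",
    "compile-args": "--batch-size 64 --mac-v2 --mac-human-decision $UNET_HD",
    "run-app": "/opt/sambaflow/apps/image/segmentation/hook.py",
    "run-args": "--num-workers 4 --data-dir $DATA_DIR --log-dir $LOG_DIR",
    "env": {"DATA_DIR": "DATA_DIR", "LOG_DIR": "LOG_DIR"},
}

for problem in check_config(config):
    print(problem)
```

In this shortened config, the check reports that compile-args references $UNET_HD without a matching env entry; adding UNET_HD under env, as in the full example, makes the list of problems empty.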