Transition to DataScale SN30

The DataScale SN30 system offers significantly improved performance over the DataScale SN10 system. Because the two systems are architecturally different, you will likely have to recompile, and possibly retrain, a model that was compiled on an SN10 system. This topic gives some guidance.

A PEF built on SN10 is not expected to run unmodified on an SN30.

General RDU difference information

Here are the RDU differences between SN10 and SN30.

SN10                                                       SN30
8 RDUs per node                                            8 RDUs per node
4 tiles per RDU                                            8 tiles per RDU
Default compile yields a PEF that uses 1 RDU (4 tiles)     Default compile yields a PEF that uses 1 RDU (8 tiles)
By default you run 1 copy of the model on 1 RDU            By default you run 2 copies of the model, one on each "half" of the RDU (using tensor parallel execution)
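The per-node totals implied by the table can be checked with simple arithmetic. The following Python sketch is illustrative only; the numbers come from the comparison table above, not from any SambaNova API:

```python
# Illustrative arithmetic only: RDU and tile counts are taken from the
# SN10/SN30 comparison table, not queried from any SambaNova API.

SYSTEMS = {
    "SN10": {"rdus_per_node": 8, "tiles_per_rdu": 4},
    "SN30": {"rdus_per_node": 8, "tiles_per_rdu": 8},
}

def tiles_per_node(system: str) -> int:
    """Total RDU tiles available on one node of the given system."""
    cfg = SYSTEMS[system]
    return cfg["rdus_per_node"] * cfg["tiles_per_rdu"]

print(tiles_per_node("SN10"))  # 32 tiles per SN10 node
print(tiles_per_node("SN30"))  # 64 tiles per SN30 node
```

The doubled tile count per RDU is why a default SN30 compile produces an 8-tile PEF where SN10 produced a 4-tile PEF.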

Compiler impacts

RDU differences mean that the compiler optimizes the PEF file differently. Here’s what you need to know:

  • On both SN10 and SN30 you can explicitly specify the number of tiles with the --num-tiles option, for example, --num-tiles=4.

    • If you compile with --num-tiles=4 on an SN10 system, you can run 8 data-parallel instances on a node.

    • If you compile with --num-tiles=4 on an SN30 system, you can run 16 data-parallel instances on a node.

  • If you specify --num-chips=1 on either SN10 or SN30, you get 4 tiles.

  • Because SN30 uses tensor parallel execution, both compile and run operations require that the batch size be an even number. The results are reduced using data parallel in the PEF. This is the default behavior; it is equivalent to specifying --tensor-parallel=batch.

  • When you migrate a model from SN10 to SN30, it is not unusual to need different human decision files and different compiler configuration files.
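To make the instance counts above concrete, here is a small hypothetical helper (not part of SambaFlow or any SambaNova tool) that derives the number of data-parallel instances per node from a --num-tiles value and applies the SN30 even-batch-size rule:

```python
# Hypothetical helper, not part of SambaFlow: derives per-node
# data-parallel instance counts from the figures stated in this topic.

TILES_PER_NODE = {"SN10": 8 * 4, "SN30": 8 * 8}  # 8 RDUs per node on both

def data_parallel_instances(system: str, num_tiles: int) -> int:
    """How many PEFs compiled with --num-tiles=<num_tiles> fit on one node."""
    return TILES_PER_NODE[system] // num_tiles

def check_sn30_batch_size(batch_size: int) -> None:
    """SN30 tensor parallel execution requires an even batch size."""
    if batch_size % 2 != 0:
        raise ValueError(f"batch size {batch_size} must be even on SN30")

print(data_parallel_instances("SN10", 4))  # 8 instances, as stated above
print(data_parallel_instances("SN30", 4))  # 16 instances, as stated above
check_sn30_batch_size(64)  # passes; an odd value would raise ValueError
```

The doubling from 8 to 16 instances follows directly from SN30 having twice as many tiles per RDU.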

PEF information

For information about the resource requirements of your PEF, for example, how many tiles it requires, use the /opt/sambaflow/slurm/python/slurmfeeder utility.