Transition to DataScale SN30
The DataScale SN30 system offers significantly improved performance over the DataScale SN10 system. Because the hardware is different, you will likely have to recompile and retrain a model that was compiled on an SN10 system. This topic gives some guidance.
A PEF built on SN10 is not expected to run unmodified on an SN30.
General RDU difference information
Here are the RDU differences between SN10 and SN30.
SN10 | SN30 |
---|---|
8 RDUs | 8 RDUs |
4 tiles per RDU | 8 tiles per RDU |
Default compile yields a PEF that uses 1 RDU (4 tiles) | Default compile yields a PEF that uses 1 RDU (8 tiles) |
By default you run using 1 copy of the model on 1 RDU | By default you run 2 copies of the model, one on each "half" of the RDU (using tensor parallel execution). |
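
The figures in the table can be captured in a small lookup for capacity planning. The following is a minimal Python sketch that uses only the numbers from the table and assumes the "8 RDUs" row describes a single node. The names `RduSpec` and `SYSTEMS` are hypothetical and are not part of any SambaNova API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RduSpec:
    """Figures copied from the SN10/SN30 comparison table (hypothetical helper)."""
    rdus_per_node: int          # assumption: "8 RDUs" means 8 RDUs per node
    tiles_per_rdu: int
    default_model_copies: int   # model copies run on one RDU by default

    @property
    def tiles_per_node(self) -> int:
        # Total tiles available on one node.
        return self.rdus_per_node * self.tiles_per_rdu


SYSTEMS = {
    "SN10": RduSpec(rdus_per_node=8, tiles_per_rdu=4, default_model_copies=1),
    "SN30": RduSpec(rdus_per_node=8, tiles_per_rdu=8, default_model_copies=2),
}

for name, spec in SYSTEMS.items():
    print(name, "tiles per node:", spec.tiles_per_node)
```

Under these assumptions an SN30 node exposes twice as many tiles as an SN10 node, which is why a PEF sized for one system does not map directly onto the other.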
Compiler impacts
RDU differences mean that the compiler optimizes the PEF file differently. Here’s what you need to know:
- On both SN10 and SN30 you can explicitly specify the number of tiles with `num-tiles`, for example, `--num-tiles=4`.
  - If you compile with `--num-tiles=4` on an SN10 system, you can run 8 instances of data-parallel on a node.
  - If you compile with `--num-tiles=4` on an SN30 system, you can run 16 instances of data-parallel on a node.
- If you specify `--num-chips=1` on SN10 or SN30 you get 4 tiles.
- Because SN30 uses tensor parallel, both compile and run operations require that the batch size be an even number. The results are reduced using data parallel in the PEF. This is the default; it is equivalent to `--tensor-parallel=batch`.
- It is not unusual to need different human decision files and different compiler configuration files when migrating your model from SN10 to SN30.
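
The instance counts and the even-batch requirement above follow from simple arithmetic. The sketch below illustrates it in Python, assuming 8 RDUs per node and the tile counts listed earlier in this topic. The function names are hypothetical helpers for planning, not SambaFlow APIs, and the batch check models only the default tensor parallel batch mode described above.

```python
RDUS_PER_NODE = 8                        # assumption: 8 RDUs per node on both systems
TILES_PER_RDU = {"SN10": 4, "SN30": 8}   # from the RDU comparison in this topic


def data_parallel_instances(system: str, num_tiles: int) -> int:
    """Number of data-parallel instances that fit on one node (hypothetical helper)."""
    if num_tiles <= 0:
        raise ValueError("num_tiles must be positive")
    total_tiles = RDUS_PER_NODE * TILES_PER_RDU[system]
    return total_tiles // num_tiles


def check_sn30_batch_size(batch_size: int) -> None:
    """Reject odd batch sizes, as required by the default tensor parallel mode on SN30."""
    if batch_size <= 0 or batch_size % 2 != 0:
        raise ValueError(f"batch size {batch_size} must be a positive even number")


print(data_parallel_instances("SN10", 4))  # 32 tiles / 4 tiles per instance
print(data_parallel_instances("SN30", 4))  # 64 tiles / 4 tiles per instance
check_sn30_batch_size(32)                  # passes silently
```

A check like this can run before a compile job is submitted, so that an odd batch size carried over from an SN10 configuration is caught early.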