
v1.6: Fast DDP, Torch Autocast, SynapseAI v1.10 and various model optimizations

@regisss regisss released this 26 Jun 09:41

Fast DDP

A new distribution strategy is introduced. It is lighter, simpler and usually faster than Torch DDP. You can enable it in your runs with --distribution_strategy fast_ddp.
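For instance, a multi-card run could be launched as follows; the script name and the other arguments here are illustrative placeholders, only `--distribution_strategy fast_ddp` comes from this release:

```shell
# Illustrative command: run_glue.py and the model/training arguments are placeholders
python gaudi_spawn.py --world_size 8 --use_mpi run_glue.py \
  --model_name_or_path bert-base-uncased \
  --do_train \
  --distribution_strategy fast_ddp
```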

Torch Autocast

It is now possible to use Torch Autocast as the mixed-precision backend. You can easily enable it in your runs with --bf16 (i.e. exactly like in Transformers).
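As a sketch, enabling bf16 mixed precision only requires the flag below; the script name and the other arguments are illustrative placeholders:

```shell
# Illustrative command: only --bf16 is specific to this feature
python run_glue.py \
  --model_name_or_path bert-base-uncased \
  --do_train \
  --bf16
```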

SynapseAI v1.10

This release is fully compatible with SynapseAI v1.10.0.

HPU graphs for training

You can now use HPU graphs for training your models.

Check out the documentation for more information.
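An illustrative invocation is shown below; the flag name `--use_hpu_graphs_for_training` is an assumption, so check the documentation for the exact option and its current limitations:

```shell
# Illustrative command: the flag name is an assumption, see the documentation
python run_glue.py \
  --model_name_or_path bert-base-uncased \
  --do_train \
  --use_hpu_graphs_for_training
```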

Various model optimizations

Asynchronous data copy

You can now enable asynchronous data copy between the host and devices during training using --non_blocking_data_copy.

  • Enable asynchronous data copy to get a better performance #211 @jychen-habana

Check out the documentation for more information.
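A minimal usage sketch, with the script name and other arguments as illustrative placeholders:

```shell
# Illustrative command: only --non_blocking_data_copy is specific to this feature
python run_glue.py \
  --model_name_or_path bert-base-uncased \
  --do_train \
  --non_blocking_data_copy
```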

Profiling

It is now possible to profile training runs that rely on GaudiTrainer. To do so, pass --profiling_steps N and --profiling_warmup_steps K.
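For example, to skip a few warmup steps and then record a handful of steps (the values and the script name below are illustrative):

```shell
# Illustrative command: profile 5 steps after 2 warmup steps
python run_glue.py \
  --model_name_or_path bert-base-uncased \
  --do_train \
  --profiling_warmup_steps 2 \
  --profiling_steps 5
```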

Adjusted throughput calculation

You can now let the GaudiTrainer compute the real throughput of your run (i.e. not counting the time spent while logging, evaluating and saving the model) with --adjust_throughput.

  • Added an option to remove save checkpoint time from throughput calculation #237 @libinta
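The idea behind the adjusted throughput can be sketched as follows; this is a minimal illustration of the computation, not the actual GaudiTrainer implementation:

```python
def adjusted_throughput(
    num_samples: int,
    total_time: float,
    logging_time: float,
    eval_time: float,
    save_time: float,
) -> float:
    """Samples per second, excluding time spent logging, evaluating and saving.

    Minimal sketch of the idea behind --adjust_throughput; the real
    GaudiTrainer implementation differs.
    """
    train_time = total_time - (logging_time + eval_time + save_time)
    return num_samples / train_time

# 10,000 samples in 110 s wall clock, of which 10 s went to logging/eval/saving
print(adjusted_throughput(10_000, 110.0, 2.0, 5.0, 3.0))  # 100.0
```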

Check SynapseAI version at import

A check is performed when importing optimum.habana to let you know if you are running the version of SynapseAI for which Optimum Habana has been tested.

  • Check Synapse version when optimum.habana is used #225 @regisss
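The gist of such a check can be sketched as below; the helper name and the comparison logic are hypothetical, the real check lives inside optimum.habana:

```python
from typing import Optional

def check_synapse_version(detected: str, tested: str) -> Optional[str]:
    """Return a warning message if the installed SynapseAI version does not
    match the version Optimum Habana was tested against.

    Hypothetical helper illustrating the idea; compares only major.minor,
    since patch releases are usually compatible.
    """
    if detected.split(".")[:2] != tested.split(".")[:2]:
        return (
            f"optimum-habana was tested with SynapseAI {tested} "
            f"but {detected} was found; you may encounter issues."
        )
    return None

print(check_synapse_version("1.9.0", "1.10.0"))   # warning message
print(check_synapse_version("1.10.0", "1.10.0"))  # None
```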

Enhanced examples

Several examples have been added or improved. You can find them here.

  • The text-generation example now supports sampling and beam search decoding, and full bf16 generation #218 #229 #238 #251 #258 #271
  • The contrastive image-text example now supports HPU-accelerated data loading #256
  • New Seq2Seq QA example #221
  • New protein folding example with ESMFold #235 #276