v1.6: Fast DDP, Torch Autocast, SynapseAI v1.10 and various model optimizations
Fast DDP
A new distribution strategy is introduced. It is lighter, simpler and usually faster than Torch DDP. You can enable it in your runs with `--distribution_strategy fast_ddp`.
- Improve performance and scalability of BERT FT training #200 @mlapinski-habana
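As an illustration, a multi-card run could enable Fast DDP as follows. The `run_glue.py` script, the model and the dataset arguments below are placeholders, not part of this release:

```shell
# Hypothetical 8-card launch; script, model and dataset are illustrative.
python gaudi_spawn.py --world_size 8 --use_mpi run_glue.py \
    --model_name_or_path bert-base-uncased \
    --task_name mrpc \
    --do_train \
    --use_habana \
    --use_lazy_mode \
    --distribution_strategy fast_ddp \
    --output_dir /tmp/mrpc_fast_ddp
```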
Torch Autocast
It is now possible to use Torch Autocast as the mixed-precision backend. You can easily enable it in your runs with `--bf16` (i.e. exactly as in Transformers).
- Enable usage of PyTorch autocast on Gaudi during training #226 @jwieczorekhabana
- Add Torch autocast and full bf16 to GaudiStableDiffusionPipeline #278 @regisss
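For example, Torch Autocast mixed precision could be switched on by adding the flag to an otherwise unchanged command (the script and its other arguments below are hypothetical):

```shell
# Hypothetical single-card run; --bf16 is the only flag of interest here.
python run_glue.py \
    --model_name_or_path bert-base-uncased \
    --task_name mrpc \
    --do_train \
    --use_habana \
    --use_lazy_mode \
    --bf16 \
    --output_dir /tmp/mrpc_bf16
```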
SynapseAI v1.10
This release is fully compatible with SynapseAI v1.10.0.
HPU graphs for training
You can now use HPU graphs for training your models.
- Improve performance and scalability of BERT FT training #200 @mlapinski-habana
Check out the documentation for more information.
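As a sketch, HPU graphs could be enabled for training with a flag such as `--use_hpu_graphs_for_training`. The exact flag name is an assumption here, as are the script and its arguments; check the documentation for what your version exposes:

```shell
# Hypothetical run; verify the exact HPU-graphs flag against the documentation.
python run_glue.py \
    --model_name_or_path bert-base-uncased \
    --task_name mrpc \
    --do_train \
    --use_habana \
    --use_lazy_mode \
    --use_hpu_graphs_for_training \
    --output_dir /tmp/mrpc_hpu_graphs
```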
Various model optimizations
- Update BLOOM modeling for SynapseAI 1.10 #277
- Optimize conv1d forward #231 @ZhaiFeiyue
- Add static key-value cache for OPT, GPT-J, GPT-NeoX #246 #248 #249 @ZhaiFeiyue
- Optimizations for running FLAN T5 with DeepSpeed ZeRO-3 #257 @libinta
Asynchronous data copy
You can now enable asynchronous data copy between the host and devices during training with `--non_blocking_data_copy`.
- Enable asynchronous data copy to get a better performance #211 @jychen-habana
Check out the documentation for more information.
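Enabling it is just a matter of appending the flag to a training command (the script and other arguments below are illustrative):

```shell
# Hypothetical run; host-to-device copies become non-blocking.
python run_glue.py \
    --model_name_or_path bert-base-uncased \
    --task_name mrpc \
    --do_train \
    --use_habana \
    --use_lazy_mode \
    --non_blocking_data_copy \
    --output_dir /tmp/mrpc_async_copy
```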
Profiling
It is now possible to profile your training runs with the `GaudiTrainer`. You will need to pass `--profiling_steps N` and `--profiling_warmup_steps K`.
- Enable profiling #250 @ZhaiFeiyue
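For instance, to skip the first 5 steps and then profile 10 steps, the flags could be combined like this (the script and the concrete values for N and K are illustrative):

```shell
# Hypothetical run: warm up for 5 steps, then record 10 profiled steps.
python run_glue.py \
    --model_name_or_path bert-base-uncased \
    --task_name mrpc \
    --do_train \
    --use_habana \
    --use_lazy_mode \
    --profiling_warmup_steps 5 \
    --profiling_steps 10 \
    --output_dir /tmp/mrpc_profiling
```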
Adjusted throughput calculation
You can now let the `GaudiTrainer` compute the real throughput of your run (i.e. not counting the time spent logging, evaluating and saving the model) with `--adjust_throughput`.
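A run with the adjusted throughput calculation could look like this (script and other arguments are placeholders):

```shell
# Hypothetical run; reported throughput excludes logging/evaluation/saving time.
python run_glue.py \
    --model_name_or_path bert-base-uncased \
    --task_name mrpc \
    --do_train \
    --use_habana \
    --use_lazy_mode \
    --adjust_throughput \
    --output_dir /tmp/mrpc_throughput
```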
Check SynapseAI version at import
A check is performed when importing `optimum.habana` to let you know whether you are running a version of SynapseAI that Optimum Habana has been tested with.
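Since the check runs at import time, simply importing the package is enough to surface any version warning:

```shell
# Importing the package triggers the SynapseAI version check.
python -c "import optimum.habana"
```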
Enhanced examples
Several examples have been added or improved. You can find them here.