Releases: huggingface/optimum-habana

v1.10.2: Patch release

18 Feb 02:23

Upgrade to Transformers v4.37

  • Upgrade to Transformers 4.37 #651

Full Changelog: v1.10.0...v1.10.2

v1.10: SDXL, Textual-Inversion, TRL, SynapseAI v1.14

30 Jan 21:50

SynapseAI v1.14

The codebase is fully validated for the latest version of Habana SDK, SynapseAI v1.14.0.

Stable Diffusion XL

SDXL is now supported and optimized for Gaudi.

Textual inversion fine-tuning

An example of textual-inversion fine-tuning has been added.

TRL

The 🤗 TRL library is now supported on Gaudi for performing DPO and SFT.

  • Add DPO and SFT of TRL support in Gaudi and example #601
  • Restructure example/trl/stack_llama_2 for generic DPO #635 @libinta
  • Add DPO of TRL in README.md #652 @libinta
  • Add seed in DPO for reproduce the training result #646 @sywangyi
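The bullets above track the TRL integration work. For orientation, the DPO objective that TRL's DPOTrainer optimizes can be sketched in plain Python. This is a minimal scalar sketch: the real trainer works on batched log-probabilities from the policy and a frozen reference model, and the `beta` default here is illustrative.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    # Implicit reward margins of the policy relative to the frozen
    # reference model, for the preferred and rejected completions.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(logits)), written stably as log(1 + exp(-logits)).
    return math.log1p(math.exp(-logits))
```

When the policy matches the reference model exactly, the margins cancel and the loss sits at log 2; it shrinks as the policy learns to prefer the chosen completion.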

Full bf16 evaluation

Full bf16 evaluation inside the trainer can now be performed like in Transformers.
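As a sketch, enabling it looks like the following, assuming the standard Transformers `--bf16_full_eval` flag (which the Gaudi trainer arguments inherit); the script name and the other arguments are illustrative, not a complete invocation.

```shell
python run_glue.py \
  --model_name_or_path bert-base-uncased \
  --do_eval \
  --bf16_full_eval
```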

Text-generation pipeline

A text-generation pipeline fully optimized for Gaudi has been added.

Model optimizations

TGI

TGI on Gaudi has been moved to a dedicated repo: https://github.com/huggingface/tgi-gaudi

Various fixes

Others

v1.9: Llama2-70B, Falcon-180B, Mistral, fp8, SynapseAI v1.13

04 Dec 14:36

SynapseAI v1.13

The codebase is fully validated for the latest version of Habana SDK, SynapseAI v1.13.

Fine-tuning Llama2-70B, Falcon-180B and BLOOM-7B

Added examples for fine-tuning Llama2-70B and Falcon-180B on Gaudi2 and BLOOM-7B on first-gen Gaudi.

  • Enable llama2-70b LoRA finetuning #527 @mandy-li
  • Add Deepspeed zero3 configuration to run bloom-7b on Gaudi1 #487
  • Enable Falcon 180B #537 @hlahkar

Llama2 fp8 inference

Mistral

Optimizations

  • Remove GPTJ dma before mha #468 @BaihuiJin
  • Enable llama attention softmax in bf16 #521 @schoi-habana
  • Add load_meta_device option to reduce host RAM #529 @jiminha
  • Improve llama performance and reduce memory consumption by updating sin/cos cache when inferring more than max position embeddings (4096) #532 @puneeshkhanna
  • Add hash_with_views arg for Falcon inference perf #534 @schoi-habana
  • Automate skip_hash_with_views for text generation with Falcon #544 @regisss

Improved text generation

Support for Transformers v4.34 and Diffusers v0.23

This version has been validated for Transformers v4.34 and Diffusers v0.23.

TGI

Dynamic shape support

  • Add infra to enable/disable dynamic shapes feature through gaudi_config #513 @vivekgoe

Habana Mixed Precision was removed in favor of Torch Autocast

Various fixes

Others

The regression tests associated with this release are here: https://github.com/huggingface/optimum-habana/actions/runs/7085551714

v1.8.1: Patch release

02 Nov 10:31

Add a constraint on the Transformers dependency to make sure future versions are not installed.

Full Changelog: v1.8.0...v1.8.1

v1.8: BART, bucketing for text generation, SD upscaler, SynapseAI v1.12 and many model optimizations

20 Oct 09:39

BART for inference

  • Enable BartForConditionalGeneration inference with greedy search #274 @bhargaveede

Bucketing for text generation

  • growing bucket optimization for greedy text generation #417 @ssarkar2
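Bucketing bounds the number of distinct tensor shapes seen during generation: instead of growing the sequence one token at a time, lengths are padded up to the next bucket boundary, so a static-shape accelerator only recompiles when a bucket is outgrown. A minimal sketch of the idea (hypothetical helper; the function and parameter names are illustrative, not the PR's API):

```python
def bucket_length(seq_len, min_bucket=128, growth_factor=2):
    """Round a sequence length up to the next bucket boundary.

    Padding every generation step to a bucketed length keeps the set
    of tensor shapes small, so compilation happens only when a new,
    larger bucket is first reached.
    """
    bucket = min_bucket
    while bucket < seq_len:
        bucket *= growth_factor
    return bucket
```

With these defaults, all lengths from 1 to 128 share one shape, 129 to 256 share the next, and so on.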

Stable Diffusion x4 upscaler

SynapseAI v1.12

Various model optimizations

TGI

Check min version in examples

A utility method was added to ensure that at least the minimum required version of Optimum Habana is installed before running the examples.

Others

Various fixes

Regression tests for this release are available here: https://github.com/huggingface/optimum-habana/actions/runs/6580186897

v1.7.5: Patch release

14 Sep 14:26

Fix a bug due to a changed import in Diffusers.

Full Changelog: v1.7.4...v1.7.5

v1.7.4: Patch release

12 Sep 18:35

Fix a bug where DeepSpeed ZeRO-3 was not working.

Full Changelog: v1.7.3...v1.7.4

v1.7.2: Patch release

24 Aug 20:22

Upgrade to Accelerate v0.22.0 to fix a bug with distributed runs.

Full Changelog: v1.7.1...v1.7.2

v1.7.1: Patch release

23 Aug 10:41

Upgrade to Transformers v4.32.0 to fix a bug with Llama.

Full Changelog: v1.7.0...v1.7.1

v1.7: Llama 2, Falcon, LoRA, Transformers v4.31, SynapseAI v1.11

17 Aug 11:20

Transformers v4.31

Transformers v4.31 (latest stable release) is fully supported.

SynapseAI v1.11

SynapseAI v1.11 (latest stable release) is fully supported.

Optimizations for Llama 2, Falcon, StarCoder, OPT, GPT-NeoX, CodeGen

Torch Autocast

⚠️ Habana Mixed Precision is deprecated and will be removed in SynapseAI v1.12.
Torch Autocast is becoming the default for managing mixed-precision runs.

Improved text-generation example

LoRA examples

Two new LoRA examples have been added, covering fine-tuning and inference.

LDM3D

A new Stable Diffusion pipeline that enables generating images together with their depth maps.

Added support for Text Generation Inference (TGI)

TGI is now supported on Gaudi.

GaudiGenerationConfig

Transformers' GenerationConfig has been extended to be fully compatible with Gaudi. It adds two fields to better control generation with static shapes.
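The motivation for static shapes is that graphs on Gaudi are compiled per tensor shape, so keeping inputs at a fixed padded length avoids recompiling at every generation step. The padding idea can be sketched as follows (a hypothetical helper, not the actual GaudiGenerationConfig API):

```python
def pad_to_static(input_ids, max_length, pad_token_id):
    """Left-pad a list of token ids to a fixed length so every forward
    pass sees the same tensor shape (sketch of static-shape padding)."""
    if len(input_ids) > max_length:
        raise ValueError("input longer than max_length")
    return [pad_token_id] * (max_length - len(input_ids)) + list(input_ids)
```

Every prompt then enters the model with the same shape, at the cost of some wasted computation on the padding tokens.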

Various fixes and improvements