# v1.14.0: Transformers v4.45, SynapseAI v1.18, Qwen2-MoE, text-to-video generation

## Transformers v4.45

Optimum Habana now runs on top of Transformers v4.45.

## SynapseAI v1.18

This release has been validated with SynapseAI v1.18.

## Qwen2-MoE

Support for Qwen2-MoE models has been added.
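
As a hedged sketch of running a Qwen2-MoE checkpoint through the Gaudi-patched generation path (the checkpoint id and generation flags are illustrative; `adapt_transformers_to_gaudi` is the usual optimum-habana entry point):

```python
import torch
import habana_frameworks.torch.core as htcore  # registers the "hpu" device
from transformers import AutoModelForCausalLM, AutoTokenizer
from optimum.habana.transformers.modeling_utils import adapt_transformers_to_gaudi

# Swap in the Gaudi-optimized model implementations before loading the model.
adapt_transformers_to_gaudi()

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-MoE-A2.7B")
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen1.5-MoE-A2.7B", torch_dtype=torch.bfloat16
).to("hpu")

inputs = tokenizer("DeepSpeed is", return_tensors="pt").to("hpu")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```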

## Text-to-video generation
- Enabling Text to Video Diffusion Model Generation #1109 @pi314ever
- Porting Stable Video Diffusion ControlNet to HPU #1037 @wenbinc-Bin
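
A minimal sketch of the new text-to-video path, assuming the pipeline is exposed as `GaudiTextToVideoSDPipeline` and accepts the same Gaudi-specific `from_pretrained` arguments as the existing Gaudi diffusers pipelines:

```python
from optimum.habana.diffusers import GaudiTextToVideoSDPipeline

pipeline = GaudiTextToVideoSDPipeline.from_pretrained(
    "ali-vilab/text-to-video-ms-1.7b",
    use_habana=True,
    use_hpu_graphs=True,
    gaudi_config="Habana/stable-diffusion",
)

# Output mirrors diffusers' TextToVideoSDPipeline: a batch of frame sequences.
video_frames = pipeline("An astronaut riding a horse", num_frames=16).frames[0]
```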

## Depth-to-image generation
- Depth to Image Generation #1175 @pi314ever
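
A similar sketch for depth-to-image, assuming the class name mirrors diffusers' `StableDiffusionDepth2ImgPipeline`:

```python
import requests
from PIL import Image

from optimum.habana.diffusers import GaudiStableDiffusionDepth2ImgPipeline

pipeline = GaudiStableDiffusionDepth2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-depth",
    use_habana=True,
    use_hpu_graphs=True,
    gaudi_config="Habana/stable-diffusion",
)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
init_image = Image.open(requests.get(url, stream=True).raw)
# Generation is conditioned on a depth map predicted from the input image.
image = pipeline(prompt="two tigers", image=init_image, strength=0.7).images[0]
```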

## Model optimizations
- Enable FusedSDPA for MPT #1101 @Jianhong-Zhang (see the FusedSDPA sketch after this list)
- Mixtral fp8 #1269 @imangohari1
- Prevent Graph break in Llama when using flash attention #1301 @pramodkumar-habanalabs
- Boost SDXL speed with initialized schedule step reset #1284 @dsocek
- Improve MPT fp8 #1256 @atakaha
- Add Whisper static generation #1275 @Spycsh
- Gemma: enabled HPU Graphs and Flash Attention #1173 @dsmertin
- Recommend jemalloc for gpt-neox-20b 8x #1350 @hsubramony
- Optimized inference of GPT-NEO model on HPU #1319 @XinyuYe-Intel
- Fix graph breaks for BART in torch.compile mode #1379 @astachowiczhabana
- Gpt_bigcode: added internal_bucketing support #1218 @mgonchar
- Refine bucket_internal for MPT #1194 @Jing1Ling
- Qwen finetuning bucketing #1130 @ssarkar2
- Enable FusedSDPA fp8 in Llama FT #1388 @pbielak
- Added Gemma-specific fp8 quantization file #1445 @yeonsily
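
Several of the entries above route attention through Habana's fused scaled dot-product attention kernel. A minimal sketch of the substitution pattern (assuming `habana_frameworks` is installed; the exact positional argument list of `FusedSDPA.apply` has varied across SynapseAI releases):

```python
import torch
import torch.nn.functional as F

try:
    # Fused attention kernel from the Gaudi PyTorch bridge.
    from habana_frameworks.torch.hpex.kernels import FusedSDPA
except ImportError:
    FusedSDPA = None

def attention(q, k, v, attn_mask=None, dropout_p=0.0, is_causal=False):
    if FusedSDPA is not None:
        # One fused HPU kernel instead of separate matmul/softmax/matmul ops.
        return FusedSDPA.apply(q, k, v, attn_mask, dropout_p, is_causal)
    # Fallback to the stock PyTorch implementation on other devices.
    return F.scaled_dot_product_attention(
        q, k, v, attn_mask=attn_mask, dropout_p=dropout_p, is_causal=is_causal
    )
```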

## Intel Neural Compressor
- Enable INC for llava models and change softmax to torch.nn.functional.softmax, a module supported by INC #1325 @tthakkal (see the sketch after this list)
- Load INC GPTQ checkpoint & rename params #1364 @HolyFalafel
- Fix INC weights loading compile error due to the Transformers 4.45 upgrade #1421 @jiminha
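
The softmax change matters because INC can only measure and quantize ops it recognizes; an illustrative before/after (the function names are hypothetical):

```python
import torch
import torch.nn.functional as F

def attn_probs_manual(scores: torch.Tensor) -> torch.Tensor:
    # Hand-rolled softmax: opaque to INC, so it is skipped during fp8
    # measurement and quantization.
    exps = torch.exp(scores - scores.max(dim=-1, keepdim=True).values)
    return exps / exps.sum(dim=-1, keepdim=True)

def attn_probs_inc_friendly(scores: torch.Tensor) -> torch.Tensor:
    # torch.nn.functional.softmax is a module INC supports, so it can be
    # measured and quantized.
    return F.softmax(scores, dim=-1)
```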

## Vera/LN-tuning
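
This release adds support for the VeRA and LN-tuning PEFT methods. A minimal sketch with PEFT's `VeraConfig` (the base checkpoint and target modules are illustrative):

```python
from peft import VeraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Illustrative base checkpoint; any decoder-only model works the same way.
model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")

# VeRA shares a frozen pair of random low-rank matrices across all adapted
# layers and trains only small per-layer scaling vectors.
config = VeraConfig(r=16, target_modules=["q_proj", "v_proj"])
model = get_peft_model(model, config)
model.print_trainable_parameters()
# The wrapped model can then be fine-tuned with optimum-habana's GaudiTrainer.
```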

## Other
- Add callable workflow to post comments when code quality check failed #1263 @regisss
- Fix failed code quality check comment workflow #1264 @regisss
- Accelerate Diffusers CI #1265 @regisss
- Add profiler to SD3 #1267 @atakaha
- Fix profiling step with device finish execution for text-generation #1283 @libinta
- Update FusedSDPA calling method per Gaudi documentation #1285 @yeonsily
- Switch failed code quality check comment to workflow_run #1297 @regisss
- Potential fix for the failed code quality check comment workflow #1299 @regisss
- Fix text-generation example lm_eval evaluation #1308 @changwangss
- Add section to README about Transformers development branch #1307 @regisss
- Fix eager mode in run_generation by removing graph logs #1231 @Vasud-ha
- Fix bug when running google/paligemma-3b-mix-224 #1279 @kaixuanliu
- Use native checkpointing under compile mode #1313 @xinyu-intel
- Fixed fused_qkv object AttributeError due to 'LlamaConfig' #1203 @rkumar2patel
- Enable image-to-image generation #1196 @pi314ever
- Diffusers timing #1277 @imangohari1
- Fix eos issue in finetune/generation #1253 @sywangyi
- Update CI, tests and examples #1315 @regisss
- Fix Sentence Transformer HPU graphs for training with PEFT model #1320 @nngokhale
- Fix ZeroDivisionError in constrained beam search with static shapes #1317 @skavulya
- Update esmfold model not to use param_buffer_assignment #1324 @jiminha
- Falcon inference crash fix for falcon-40b model #1161 @yeonsily
- Add --use_kv_cache to image-to-text pipeline #1292 @KimBioInfoStudio
- Trl upgrade #1245 @sywangyi
- Fix uint4 URL typo #1340 @kding1
- Use eager attention for wav2vec2 #1333 @skaulintel
- Add _reorder_cache back to Llama for HPU #1233 @jiminha
- SDXL CI script throughput #1296 @imangohari1
- Add image so that transformers tests can run #1338 @skaulintel
- Fix the missing-attribute error in the Falcon multicard test #1344 @mounikamandava
- Add profiler to sdxl mlperf pipeline #1339 @Jianhong-Zhang
- Fix decoder only generation #948 @tjs-intel
- Upgrade gradient checkpointing #1347 @yafshar
- Run_generation example: fixed graph compilation statistics reporting #1352 @mgonchar
- Fix DeepSpeed crash with Sentence Transformer Trainer #1328 @nngokhale
- Reduce slow test_diffusers CI timing and apply minor fixes #1330 @imangohari1
- Flash attn args for GaudiGemmaForCausalLM #1356 @kkoryun
- Generation with Transformers models now supports user-provided input embeddings #1276 @zongwave (see the sketch after this list)
- Fixed the expected values for the img2img slice #1332 @imangohari1
- Gpt_big_code: make flash attention impl quantization friendly #1282 @mgonchar
- Fix OOM when inference with llama-3.1-70b #1302 @harborn
- Fix the conditional #1362 @yafshar
- Revert "use native checkpointing under compile mode" #1365 @xinyu-intel
- Remove repetitive pip install commands #1367 @MohitIntel
- Minor UX enhancement #1373 @MohitIntel
- Fix bug when running image-to-text example #1371 @kaixuanliu
- Gpt_bigcode: fixed wrong indentation #1376 @mgonchar
- Support for transformers without self.model to torch.compile #1380 @astachowiczhabana
- Only pass the use_kv_cache True to generator #1366 @yafshar
- Clean up the code and remove unnecessary class #1382 @yafshar
- Add diffusers examples of inference techniques #1244 @yuanwu2017
- Enhance transformers test suite in Optimum-habana-4.43.4 (auto PR 07654de) #1387 @rkumar2patel
- Enhance transformers test suite in Optimum-habana-4.43.4 (auto PR 8926a4b) #1386 @rkumar2patel
- Add README.md for Sentence transformer examples with HPU device #1355 @ZhengHongming888
- Change Falcon/GPT-Neox rotary embedding function to use seq_len #1368 @yeonsily
- Enhance Optimum-habana as per transformers-4.43.4 #1381 @rkumar2patel
- CI fix - Install stable-diffusion reqs #1389 @vidyasiv
- Fix error caused by uninitialized attn_weights #1391 @hsubramony
- Replace flash attention flag #1393 @skaulintel
- Fix DeepSpeed CI on Gaudi2 #1395 @regisss
- Truncate the cached max seq len #1394 @astachowiczhabana
- Fix GPT-NeoX training accuracy issue #1397 @yeonsily
- Simplify HQT config files #1219 @Tiefen-boop
- unify_measurements.py script support to unify PCQ 70B 8x #1322 @Yantom1
- Add misc. training args #1346 @SanityRemnants
- Add quantization config for low bs case #1377 @ulivne
- Remove HQT from OHF #1257 @Yantom1
- Valid sequence length for sdpa #1183 @ssarkar2
- Multiple fixes (dynamo graph break, qwen-moe, multicard) #1410 @ssarkar2
- Change the image path for transformers tests back to the correct location #1401 @skaulintel
- Fix Gaudi2 regression tests #1403 @regisss
- Revert some of the transformers pytest functions/values #1399 @imangohari1
- Fix StarCoder2 inference #1405 @regisss
- Change the order for test_diffusers #1406 @hsubramony
- Fix llama model text generation error #1402 @zongwave
- Downgrade Datasets version to 2.21.0 #1413 @hsubramony
- Update ci sentence_transformer.sh #1424 @ZhengHongming888
- Update language-modeling README.md, add trust_remote_code for flan-t5-xl #1422 @hsubramony
- Update unify_measurements.py support info #1425 @shepark
- Fix GPT-NeoX incorrect output with batch query #1358 @Jianhong-Zhang
- Fix text-to-image example #1429 @regisss
- Add flag to run inference with partial dataset #1420 @pramodkumar-habanalabs
- Add peft generation example #1427 @sywangyi
- Added missing allocate_kv_cache() call in CausalLM class #1431 @yeonsily
- Fix merge error and update text-to-speech readme #1436 @hsubramony
- Fix OOM error for code llama #1437 @jiminha
- Fix error on 4-bit checkpoint load with run_lm_eval on Transformers 4.45.2 #1439 @jiminha
- GPT2 torch.compile fix #1434 @dsmertin
- Update text-gen README.md to add auto-gptq fork install steps #1442 @hsubramony
- Fix scoped linear all-reduce for starcoder model #1432 @skavulya
- Fixed recursion error in SentenceTransformer #1428 @yafshar
- Fix Llama 3.1 generation #1444 @regisss
- Remove cache folder from image data folder #1446 @shepark
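
For the user-provided input embeddings change (#1276 above), a minimal sketch of the interface, which follows the standard `transformers` `inputs_embeds` generation path (the checkpoint is illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("Hello, my name is", return_tensors="pt")
# Compute the embeddings yourself (e.g. to inject soft prompts), then
# generate from them instead of from token ids.
embeds = model.get_input_embeddings()(inputs.input_ids)
outputs = model.generate(
    inputs_embeds=embeds,
    attention_mask=inputs.attention_mask,
    max_new_tokens=20,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```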