Intel® Neural Compressor v2.6 Release
- Highlights
- Features
- Improvements
- Examples
- Bug Fixes
- External Contributions
- Validated Configurations
Highlights
- Integrated a recent AutoRound release with lm-head quantization support and calibration-process optimizations
- Migrated ONNX model quantization capability into the ONNX project Neural Compressor
Features
- [Quantization] Integrate a recent AutoRound release with lm-head quantization support and calibration-process optimizations (4728fd)
- [Quantization] Support the true-sequential option in GPTQ (92c942)
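The true-sequential option changes which activations each layer is calibrated on: layers in a block are quantized in order, and each later layer is calibrated on the outputs of the already-quantized earlier layers rather than on the original full-precision activations, so accumulated quantization error is taken into account. A minimal NumPy sketch of that data flow (the round-to-nearest step is a hypothetical stand-in for the actual GPTQ solve; function names are illustrative, not Neural Compressor's API):

```python
import numpy as np

def rtn_quantize(w: np.ndarray, n_bits: int = 4) -> np.ndarray:
    """Toy symmetric round-to-nearest quantizer standing in for the GPTQ solve."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.abs(w).max() / qmax
    return np.round(w / scale).clip(-qmax - 1, qmax) * scale

def quantize_block(weights, x, true_sequential=True):
    """Quantize a chain of linear layers (y = x @ W) one after another.

    With true_sequential=True, each layer's calibration input is the
    activation produced by the already-quantized preceding layers; with
    False, every layer sees the original full-precision activations.
    Returns the quantized weights and, per layer, the reconstruction
    error ||X W - X Q|| that GPTQ minimizes on its calibration inputs.
    """
    quantized, errors = [], []
    act = x       # path through quantized layers
    fp_act = x    # full-precision reference path
    for w in weights:
        calib_in = act if true_sequential else fp_act
        qw = rtn_quantize(w)  # a real GPTQ solve would use calib_in here
        errors.append(float(np.linalg.norm(calib_in @ w - calib_in @ qw)))
        quantized.append(qw)
        act = act @ qw
        fp_act = fp_act @ w
    return quantized, errors
```

The option trades a little extra calibration compute for error estimates that match how activations actually flow through the quantized model.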
Improvements
- [Quantization] Improve WOQ Linear pack/unpack speed with a NumPy implementation (daa143)
- [Quantization] Auto-detect the available device when exporting (7be355)
- [Quantization] Refine AutoRound export to support Intel GPU (409231)
- [Benchmarking] Detect the number of sockets when needed (e54b93)
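The WOQ pack/unpack speedup comes from replacing per-element Python loops with vectorized NumPy bitwise operations. A hedged sketch of the idea, packing two signed int4 weights into each uint8 byte (the function names and byte layout here are illustrative, not Neural Compressor's actual implementation):

```python
import numpy as np

def pack_int4(weight: np.ndarray) -> np.ndarray:
    """Pack signed int4 values (range [-8, 7]) pairwise into uint8.

    Each output byte holds two 4-bit two's-complement values: the
    even-indexed element in the low nibble, the odd-indexed element
    in the high nibble. Fully vectorized, no Python-level loop.
    """
    assert weight.size % 2 == 0
    nibbles = (weight.astype(np.int8) & 0x0F).astype(np.uint8)
    pairs = nibbles.reshape(-1, 2)
    return (pairs[:, 0] | (pairs[:, 1] << 4)).astype(np.uint8)

def unpack_int4(packed: np.ndarray, shape) -> np.ndarray:
    """Inverse of pack_int4: recover signed int4 values as int8."""
    low = (packed & 0x0F).astype(np.int8)
    high = ((packed >> 4) & 0x0F).astype(np.int8)
    # sign-extend 4-bit two's complement: nibbles >= 8 are negative
    low = np.where(low >= 8, low - 16, low).astype(np.int8)
    high = np.where(high >= 8, high - 16, high).astype(np.int8)
    out = np.empty(packed.size * 2, dtype=np.int8)
    out[0::2] = low
    out[1::2] = high
    return out.reshape(shape)
```

Because every step is a whole-array bitwise op, the cost is a handful of vectorized passes over the buffer instead of one Python iteration per weight.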
Examples
- Upgrade lm_eval to 0.4.2 in PT and ORT LLM examples (fdb509) (54f039)
- Add diffusers/dreambooth example with IPEX (ba4798)
Bug Fixes
- Fix incorrect dtype of unpacked tensor in PT (29fdec)
- Fix TF LLM SQ legacy Keras environment variable issue (276449)
- Fix TF Estimator issue by adding a version check for TF 2.16 (855b98)
- Fix missing tokenizer issue in run_clm_no_trainer.py after upgrading to lm-eval 0.4.2 (d64029)
- Fix AWQ padding issue in ORT (903da4)
- Fix recover function issue in ORT (ee24db)
- Update model ckpt download URL in prepare_model.py (0ba573)
- Fix case where pad_max_length is set to None (960bd2)
- Fix a failure in the GPU backend (71a9f3)
- Fix NumPy versions for rnnt and 3d-unet examples (12b8f4)
- Fix CVEs (5b5579) (25c71a) (47d73b) (41da74)
External Contributions
- Update model ckpt download URL in prepare_model.py (0ba573)
- Fix case where pad_max_length is set to None (960bd2)
- Add diffusers/dreambooth example with IPEX (ba4798)
Validated Configurations
- CentOS 8.4 & Ubuntu 22.04 & Windows 11 & macOS Ventura 13.5
- Python 3.8, 3.9, 3.10, 3.11
- PyTorch/IPEX 2.1, 2.2, 2.3
- TensorFlow 2.14, 2.15, 2.16
- ITEX 2.13.0, 2.14.0, 2.15.0
- ONNX Runtime 1.16, 1.17, 1.18