Intel® Neural Compressor v2.4.1 Release
- Improvement
- Bug Fixes
- Examples
- Validated Configurations
Improvement
- Narrow down the tuning space of SmoothQuant auto-tune (9600e1)
- Support ONNXRT Weight-Only Quantization with different dtypes (5119fc)
- Add progress bar for ONNXRT Weight-Only Quantization and SmoothQuant (4d26e3)
Bug Fixes
- Fix SmoothQuant alpha-space generation (33ece9)
- Fix inputs error for SmoothQuant example_inputs (39f63a)
- Fix LLM accuracy regression with IPEX 2.1.100 (3cb6d3)
- Fix quantizable add ops detection on IPEX backend (4c004d)
- Fix range step bug in ORTSmoothQuant (40275c)
- Fix unit test bugs and update CI versions (6c78df, 835805)
- Fix notebook issues (08221e)
Examples
- Add verified LLMs list and recipes for SmoothQuant and Weight-Only Quantization (f19cc9)
- Add code-generation evaluation for Weight-Only Quantization GPTQ (763440)
Validated Configurations
- CentOS 8.4 & Ubuntu 22.04
- Python 3.10
- TensorFlow 2.14
- ITEX 2.14.0.1
- PyTorch/IPEX 2.1.0
- ONNX Runtime 1.16.3
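
To compare a local environment against the validated versions listed above, a minimal sketch using only the Python standard library might look like the following. The PyPI distribution names (`tensorflow`, `intel-extension-for-tensorflow`, `torch`, `intel-extension-for-pytorch`, `onnxruntime`) and the helper names are assumptions made for illustration; they are not part of this release, and a version mismatch does not by itself mean the release will fail.

```python
from importlib import metadata

# Versions copied from the Validated Configurations list.
# The distribution names used as keys are assumptions for illustration.
VALIDATED = {
    "tensorflow": "2.14",
    "intel-extension-for-tensorflow": "2.14.0.1",
    "torch": "2.1.0",
    "intel-extension-for-pytorch": "2.1.0",
    "onnxruntime": "1.16.3",
}


def installed_version(dist_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def compare_to_validated(validated=VALIDATED, lookup=installed_version):
    """Map each distribution name to a short status string."""
    report = {}
    for name, wanted in validated.items():
        found = lookup(name)
        if found is None:
            report[name] = "not installed"
            continue
        # Drop a local version label such as "+cpu" before comparing.
        public = found.split("+")[0]
        # Treat "2.14.0" as matching a validated "2.14" (prefix on a dot boundary).
        if public == wanted or public.startswith(wanted + "."):
            report[name] = "matches"
        else:
            report[name] = f"differs (found {found}, validated {wanted})"
    return report
```

Calling `compare_to_validated()` with no arguments inspects the current environment; the `lookup` parameter exists only so the comparison can be exercised without the packages installed.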