
Releases: ModelCloud/GPTQModel

GPTQModel v1.5.1

01 Jan 08:39
4f18747

What's Changed

🎉 2025!

⚡ Added QuantizeConfig.device to clearly define which device is used for quantization: default = auto. Non-quantized models are always loaded on cpu by default, and each layer is moved to QuantizeConfig.device during quantization to minimize vram usage.
💫 Improved QuantLinear selection from optimum.
🐛 Fixed attn_implementation_autoset compat in latest transformers.
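The layer-by-layer device flow described above can be sketched in plain Python. All names here are illustrative stand-ins, not the GPTQModel internals: the model stays on cpu and each layer visits QuantizeConfig.device only while it is being quantized, so peak vram holds roughly one layer at a time.

```python
# Pure-Python sketch (illustrative names, not the GPTQModel implementation) of
# quantizing layer-by-layer: move a layer to the quant device, process it,
# then return it to cpu so vram is released before the next layer.

def quantize_layerwise(layers, device="cuda:0"):
    """Move each layer to `device` for quantization, then back to cpu."""
    visited = []
    for layer in layers:
        layer["device"] = device                  # move onto the quant device
        visited.append((layer["name"], layer["device"]))
        layer["device"] = "cpu"                   # release vram before next layer
    return visited

layers = [{"name": f"layer{i}", "device": "cpu"} for i in range(3)]
order = quantize_layerwise(layers)
```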

Full Changelog: v1.5.0...v1.5.1

GPTQModel v1.5.0

24 Dec 02:01
4197cd8

What's Changed

⚡ Multi-modal (image-to-text) optimized quantization support has been added for Qwen 2-VL and Ovis 1.6-VL. Previous image-to-text model quantizations did not use image calibration data, resulting in less than optimal post-quantization results. Version 1.5.0 is the first release to provide a stable path for multi-modal quantization: only text layers are quantized.
🐛 Fixed Qwen 2-VL model quantization vram usage and post-quant file copy of relevant config files.
🐛 Fixed install/compilation failures in environments with an incorrect TORCH_CUDA_ARCH_LIST set (Nvidia docker images).
🐛 Warn about a bad torch[cuda] install on Windows.
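The text-only quantization path above can be pictured as a module filter over the model's named modules. The `visual.` prefix below is a hypothetical example, not the actual Qwen 2-VL or Ovis module layout:

```python
# Sketch of restricting quantization to text layers in a multi-modal model:
# the vision tower is left in full precision and only text decoder modules
# are selected. The "visual." prefix is a made-up example, not real layout.

def quantizable_modules(module_names, vision_prefixes=("visual.",)):
    """Return the module names eligible for quantization (text layers only)."""
    return [name for name in module_names
            if not name.startswith(tuple(vision_prefixes))]

names = ["visual.patch_embed", "model.layers.0.mlp", "model.layers.0.attn"]
```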

Full Changelog: v1.4.5...v1.5.0

GPTQModel v1.4.5

19 Dec 12:16
9012892

What's Changed

⚡ Windows 11 support added/validated with DynamicCuda and Torch kernels.
⚡ Ovis 1.6 VL model support with image data calibration.
⚡ Reduced quantization vram usage.
🐛 Fixed dynamic controlled layer loading logic.

Full Changelog: v1.4.4...v1.4.5

GPTQModel v1.4.4 Patch

17 Dec 14:48
92266fa

What's Changed

⚡ Reduced memory usage during quantization.
🐛 Fixed device_map={"":"auto"} compat.
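For context on the `{"": "auto"}` form: transformers accepts `device_map` either as the string `"auto"` or as a dict mapping module-name prefixes to devices, and the empty prefix `""` covers the whole model. A plausible normalization (a sketch, not the actual fix) treats the two spellings alike:

```python
# Sketch: treat device_map={"": "auto"} the same as device_map="auto".
# In transformers, a dict maps module-name prefixes to devices, and the
# empty prefix "" matches the entire model.

def normalize_device_map(device_map):
    """Collapse a single-entry {"" : x} dict to its bare value."""
    if isinstance(device_map, dict) and set(device_map) == {""}:
        return device_map[""]        # {"": "auto"} -> "auto"
    return device_map
```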

Full Changelog: v1.4.2...v1.4.4

GPTQModel v1.4.2

16 Dec 15:44
7b4ae93

What's Changed

⚡ macOS GPU (MPS) + CPU inference and quantization support.
⚡ Added Cohere 2 model support

Full Changelog: v1.4.1...v1.4.2

GPTQModel v1.4.1

13 Dec 16:52
11ca9a1

What's Changed

⚡ Added Qwen2-VL model support.
⚡ Exposed mse quantization control in QuantizeConfig.
⚡ New GPTQModel.patch_hf() monkey patch api to allow Transformers/Optimum/Peft to use GPTQModel while upstream PRs are pending.
⚡ New GPTQModel.patch_vllm() monkey patch api to allow vLLM to correctly load dynamic/mixed gptq quantized models.
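Monkey patching here means replacing a third-party function at runtime so unsupported cases route through new code. A generic pure-Python illustration of the idea follows; `ThirdPartyLoader` and its method are stand-ins, not real vLLM or Transformers internals:

```python
# Generic monkey-patch illustration: swap a function on a third-party class at
# runtime so special cases route through your code, deferring to the original
# otherwise. ThirdPartyLoader is a stand-in, not a real vLLM object.

class ThirdPartyLoader:
    @staticmethod
    def load(fmt):
        return f"native:{fmt}"

def apply_patch():
    original = ThirdPartyLoader.load
    def patched(fmt):
        if fmt == "gptq-dynamic":          # case the host library can't handle
            return "gptqmodel:gptq-dynamic"
        return original(fmt)               # everything else stays native
    ThirdPartyLoader.load = patched

apply_patch()
```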

Full Changelog: v1.4.0...v1.4.1

GPTQModel v1.4.0

10 Dec 15:35
360a8e6

What's Changed

EvalPlus harness integration merged upstream. We now support both lm-eval and EvalPlus.
⚡ Added pure torch Torch kernel.
⚡ Refactored Cuda kernel to be DynamicCuda kernel.
Triton kernel now auto-pads inputs for maximum model support.
Dynamic quantization now supports both positive (+:, the default) and negative (-:) matching; negative-matched modules are skipped entirely during quantization.
⚡ Added auto-kernel fallback for unsupported kernel/module pairs.
🐛 Fixed auto-Marlin kernel selection.
🗑 Deprecated saving in the Marlin weight format. The Marlin kernel auto-converts the gptq format to Marlin at runtime, and the gptq format retains maximum kernel flexibility, including Marlin kernel support.
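The +:/-: matching rule can be sketched as regex resolution over module names. The rule format below paraphrases the release note and is not guaranteed to match the library's exact syntax:

```python
import re

# Sketch of dynamic-rule resolution: a "+:" rule (the default) applies a
# per-module config override, while a "-:" rule excludes the module from
# quantization entirely. Rule syntax paraphrases the note above, not the
# exact library format.

def resolve(module_name, dynamic_rules):
    """Return an override dict, {} for base config, or None to skip."""
    for pattern, override in dynamic_rules.items():
        negative = pattern.startswith("-:")
        regex = pattern[2:] if pattern[:2] in ("+:", "-:") else pattern
        if re.search(regex, module_name):
            return None if negative else override  # None -> skip this module
    return {}  # no rule matched: quantize with the base QuantizeConfig

rules = {"-:lm_head": {}, r"+:mlp\.": {"bits": 8}}
```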

Lots of internal refactoring and cleanup in preparation for the transformers/optimum/peft upstream PR merge.

Full Changelog: v1.3.1...v1.4.0

GPTQModel v1.3.1

29 Nov 04:10
e7f1437

What's Changed

⚡ Olmo2 model support.
⚡ Intel XPU acceleration via IPEX.
Sharding compat fix for an api deprecation in HF Transformers.
Removed the hard triton dependency; the Triton kernel now depends on the triton package only when used.
Fixed Hymba test (Hymba requires desc_act=False).

Full Changelog: v1.3.0...v1.3.1

GPTQModel v1.3.0

26 Nov 19:13
8819f51

What's Changed

Zero-day Hymba model support added. Removed tqdm and rogue dependencies.

Full Changelog: v1.2.3...v1.3.0

GPTQModel v1.2.3

25 Nov 09:05
2b498d8

Stable release with all feature and model unit tests passing. Fixed many model unit tests that did not pass, or passed incorrectly, in previous releases.

HF GLM support added. GLM/ChatGLM has two different code forks: one is not HF-integrated, and the latest one is integrated into transformers. HF GLM and non-HF GLM are not weight compatible; we support both variants.

Full Changelog: v1.2.1...v1.2.3