Releases: ModelCloud/GPTQModel
GPTQModel v1.5.1
What's Changed
🎉 2025!
⚡ Added `QuantizeConfig.device` to clearly define which device is used for quantization: default = `auto`. Non-quantized models are always loaded on cpu by default, and each layer is moved to `QuantizeConfig.device` during quantization to minimize vram usage (see the sketch below).
💫 Improve `QuantLinear` selection from `optimum`.
🐛 Fix `attn_implementation_autoset` compat in latest transformers.
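A minimal quantization sketch using the new `QuantizeConfig.device` field, assuming the `GPTQModel.load()` / `quantize()` / `save()` flow; the model id and raw-string calibration list are placeholders.

```python
from gptqmodel import GPTQModel, QuantizeConfig

# Placeholder model id and calibration text for illustration only.
model_id = "meta-llama/Llama-3.2-1B-Instruct"
calibration = ["GPTQModel quantizes transformer layers one at a time."] * 256

# device="auto" (the default) picks the best available accelerator; the
# unquantized model stays on cpu and each layer is moved to this device
# only while it is being quantized, keeping vram usage low.
quant_config = QuantizeConfig(bits=4, group_size=128, device="auto")

model = GPTQModel.load(model_id, quant_config)
model.quantize(calibration)
model.save("Llama-3.2-1B-Instruct-gptq-4bit")
```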
- Add QuantizeConfig.device and use. by @Qubitium in #950
- fix hf_select_quant_linear by @LRL-ModelCloud in #966
- update vllm gptq_marlin code by @ZX-ModelCloud in #967
- fix cuda:0 not a enum device by @CSY-ModelCloud in #968
- fix marlin info for non-cuda device by @Qubitium in #972
- fix backend str bug by @CL-ModelCloud in #973
- hf select quant_linear with pack by @LRL-ModelCloud in #969
- remove auto select BACKEND.IPEX by @CSY-ModelCloud in #975
- fix autoround received a device_map by @CSY-ModelCloud in #976
- use enum instead of magic number by @CSY-ModelCloud in #979
- use new ci docker images by @CSY-ModelCloud in #980
- fix flash attention was auto loaded on cpu for pretrained model by @CSY-ModelCloud in #981
- fix old transformer doesn't have _attn_implementation_autoset by @CSY-ModelCloud in #982
- fix gptbigcode test temporarily by @CSY-ModelCloud in #983
- fix version parsing by @CSY-ModelCloud in #985
Full Changelog: v1.5.0...v1.5.1
GPTQModel v1.5.0
What's Changed
⚡ Multi-modal (image-to-text) optimized quantization support has been added for Qwen 2-VL and Ovis 1.6-VL. Previous image-to-text model quantizations did not use image calibration data, resulting in suboptimal post-quantization results. Version 1.5.0 is the first release to provide a stable path for multi-modal quantization: only text layers are quantized (see the sketch below).
🐛 Fixed Qwen 2-VL model quantization vram usage and post-quant file copy of relevant config files.
🐛 Fixed install/compilation in envs with a wrong TORCH_CUDA_ARCH_LIST set (Nvidia docker images).
🐛 Warn about bad torch[cuda] install on Windows
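A hedged sketch of image-to-text calibration for Qwen 2-VL. The chat-style message format, image URL, and `GPTQModel.load()` call shape are assumptions, not the documented API; only the fact that just the text layers are quantized comes from the note above.

```python
from gptqmodel import GPTQModel, QuantizeConfig

# Placeholder multi-modal calibration sample: a chat-style message that
# pairs an image with text, which is what image calibration implies above.
calibration = [
    [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": "https://example.com/sample.png"},
                {"type": "text", "text": "Describe this image."},
            ],
        }
    ]
]

quant_config = QuantizeConfig(bits=4, group_size=128)

# Only the text (language-model) layers are quantized; vision layers are untouched.
model = GPTQModel.load("Qwen/Qwen2-VL-2B-Instruct", quant_config)
model.quantize(calibration)
model.save("Qwen2-VL-2B-Instruct-gptq-4bit")
```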
- Fix backend not ipex by @CSY-ModelCloud in #930
- Fix broken ipex check by @Qubitium in #933
- Fix dynamic_cuda validation by @CSY-ModelCloud in #936
- Fix bdist_wheel does not exist on old setuptools by @CSY-ModelCloud in #939
- Add cuda warning on windows by @CSY-ModelCloud in #942
- Add torch inference benchmark by @CL-ModelCloud in #940
- Add modality to BaseModel by @ZX-ModelCloud in #937
- [FIX] qwen_vl_utils should be locally import by @ZX-ModelCloud in #946
- Filter torch cuda arch < 6.0 by @CSY-ModelCloud in #955
- [FIX] wrong filepath was used when model_id_or_path was hugging model id by @ZX-ModelCloud in #956
- Fix import error was not caught by @CSY-ModelCloud in #961
Full Changelog: v1.4.5...v1.5.0
GPTQModel v1.4.5
What's Changed
⚡ Windows 11 support added/validated with `DynamicCuda` and `Torch` kernels (kernel selection sketch below).
⚡ Ovis 1.6 VL model support with image data calibration.
⚡ Reduced quantization vram usage.
🐛 Fixed `dynamic` controlled layer loading logic.
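A small sketch of pinning one of the Windows-validated kernels at load time. The `backend=` argument and `BACKEND.TORCH` member are assumptions drawn from the backend-related PRs in these notes; the model id is a placeholder.

```python
from gptqmodel import GPTQModel, BACKEND

# Marlin and Exllama kernels are excluded on Windows (PR #898), so a
# Windows-validated kernel can be requested explicitly instead of AUTO.
model = GPTQModel.load(
    "my-org/my-model-gptq-4bit",  # placeholder quantized model id
    backend=BACKEND.TORCH,        # pure-torch kernel validated on Windows 11
)
```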
- Refactor by @Qubitium in #895
- Add platform check by @LRL-ModelCloud in #899
- Exclude marlin & exllama on windows by @CSY-ModelCloud in #898
- Remove unnecessary backslash in the expression & typehint by @CSY-ModelCloud in #903
- Add DEVICE.ALL by @LRL-ModelCloud in #901
- [FIX] the error of loading quantized model with dynamic by @ZX-ModelCloud in #907
- [FIX] gpt2 quantize error by @ZX-ModelCloud in #912
- Simplify checking generated str for vllm test & fix transformers version for cohere2 by @CSY-ModelCloud in #914
- [MODEL] add OVIS support by @ZX-ModelCloud in #685
- Fix IDE warning marlin not in all by @CSY-ModelCloud in #920
Full Changelog: v1.4.4...v1.4.5
GPTQModel v1.4.4 Patch
What's Changed
⚡ Reduced memory usage during quantization
⚡ Fix `device_map={"":"auto"}` compat.
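A minimal sketch of the `device_map` form this patch addresses; whether `GPTQModel.load()` is the right entry point for your setup is an assumption, and the model id is a placeholder.

```python
from gptqmodel import GPTQModel

# The "" key applies the placement policy to the whole model and "auto"
# defers device placement to the loader; this is the mapping whose compat
# is fixed in v1.4.4.
model = GPTQModel.load(
    "my-org/my-model-gptq-4bit",  # placeholder quantized model id
    device_map={"": "auto"},
)
```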
- Speed up unit tests by @Qubitium in #885
- [FIX] hf select quant linear parse device map by @ZX-ModelCloud in #887
- Avoid cloning on gpu by @Qubitium in #886
- Expose hf_quantize() by @ZX-ModelCloud in #888
- Update integration hf code by @ZX-ModelCloud in #891
- Add back fasterquant() for compat by @Qubitium in #892
Full Changelog: v1.4.2...v1.4.4
GPTQModel v1.4.2
What's Changed
⚡ MacOS gpu (MPS) + cpu inference and quantization support (see the sketch below).
⚡ Added Cohere 2 model support
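A hedged sketch of macOS inference on Apple silicon. The `device="mps"` argument is an assumption based on the MPS PRs below, and the fallback env var simply mirrors what PR #873 sets automatically; the model id is a placeholder.

```python
import os

from gptqmodel import GPTQModel

# PR #873 sets this on macOS so ops without MPS kernels fall back to cpu;
# setting it here is only a belt-and-braces illustration.
os.environ.setdefault("PYTORCH_ENABLE_MPS_FALLBACK", "1")

# Load a quantized model onto the Apple-silicon gpu (use "cpu" to stay on cpu).
model = GPTQModel.load(
    "my-org/my-model-gptq-4bit",  # placeholder quantized model id
    device="mps",
)
```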
- Build Changes by @Qubitium in #855
- Fix MacOS support by @Qubitium in #861
- check device_map on from_quantized() by @ZX-ModelCloud in #865
- call patch for TestTransformersIntegration by @CSY-ModelCloud in #867
- Add MacOS gpu acceleration via MPS by @Qubitium in #864
- [MODEL] add cohere2 support by @CL-ModelCloud in #869
- check device_map by @ZX-ModelCloud in #872
- set PYTORCH_ENABLE_MPS_FALLBACK for macos by @CSY-ModelCloud in #873
- check device_map int value by @ZX-ModelCloud in #876
- Simplify by @Qubitium in #877
- [FIX] device_map={"":None} by @ZX-ModelCloud in #878
- set torch_dtype to float16 for XPU by @CSY-ModelCloud in #875
- remove IPEX device check by @ZX-ModelCloud in #879
- [FIX] call normalize_device() by @ZX-ModelCloud in #881
- [FIX] get_best_device() wrong usage by @ZX-ModelCloud in #882
Full Changelog: v1.4.1...v1.4.2
GPTQModel v1.4.1
What's Changed
⚡ Added Qwen2-VL model support.
⚡ `mse` quantization control exposed in `QuantizeConfig`.
⚡ New `GPTQModel.patch_hf()` and `GPTQModel.patch_vllm()` monkey patch api to allow Transformers/Optimum/Peft to use GPTQModel while upstream PRs are pending.
⚡ New `GPTQModel.patch_vllm()` monkey patch api to allow `vLLM` to correctly load `dynamic`/mixed gptq quantized models.
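A minimal sketch of the vLLM monkey patch described above. Calling `GPTQModel.patch_vllm()` before constructing the vLLM engine is an assumption about ordering, and the model id is a placeholder.

```python
from gptqmodel import GPTQModel

# Patch vLLM so it can correctly load dynamic/mixed gptq quantized checkpoints.
GPTQModel.patch_vllm()

from vllm import LLM, SamplingParams

llm = LLM(model="my-org/my-dynamic-gptq-model")  # placeholder quantized model id
outputs = llm.generate(["Hello"], SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)
```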
- Add warning for vllm/sglang when using dynamic feature by @CSY-ModelCloud in #810
- Update Eval() usage sample by @CL-ModelCloud in #819
- auto select best device by @CSY-ModelCloud in #822
- Fix error msg by @CSY-ModelCloud in #823
- allow pass meta_quantizer from save() by @CSY-ModelCloud in #824
- Quantconfig add mse field by @CL-ModelCloud in #825
- [MODEL] add qwen2_vl support by @LRL-ModelCloud in #826
- check cuda when there's only cuda device by @CSY-ModelCloud in #830
- Update lm-eval test by @CL-ModelCloud in #831
- add patch_vllm() by @ZX-ModelCloud in #829
- Monkey patch HF transformer/optimum/peft support by @CSY-ModelCloud in #818
- auto patch vllm by @CSY-ModelCloud in #837
- Fix lm-eval API BUG by @CL-ModelCloud in #838
- [FIX] dynamic get "desc_act" error by @ZX-ModelCloud in #841
- BaseModel add supports_desc_act by @ZX-ModelCloud in #842
- [FIX] should local import patch_vllm() by @ZX-ModelCloud in #844
- Mod vllm generate by @LRL-ModelCloud in #833
- fix patch_vllm by @LRL-ModelCloud in #850
Full Changelog: v1.4.0...v1.4.1
GPTQModel v1.4.0
What's Changed
⚡ `EvalPlus` harness integration merged upstream. We now support both `lm-eval` and `EvalPlus`.
⚡ Added pure torch `Torch` kernel.
⚡ Refactored the `Cuda` kernel into the `DynamicCuda` kernel.
⚡ `Triton` kernel is now auto-padded for max model support.
⚡ `Dynamic` quantization now supports both positive (`+:`, the default) and negative (`-:`) matching, which allows matched modules to be skipped entirely for quantization (see the sketch below).
⚡ Added auto-kernel fallback for unsupported kernel/module pairs.
🐛 Fixed auto-`Marlin` kernel selection.
🗑 Deprecated the saving of the `Marlin` weight format. `Marlin` allows auto conversion of the `gptq` format to `Marlin` during runtime. The `gptq` format allows max kernel flexibility, including `Marlin` kernel support.
Lots of internal refactoring and cleanup in preparation for the transformers/optimum/peft upstream PR merge.
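A hedged sketch of the `dynamic` matching rules above. Only the `+:`/`-:` prefix convention comes from the note; the regex keys and the per-module override fields are illustrative assumptions.

```python
from gptqmodel import GPTQModel, QuantizeConfig

# dynamic maps module-name regexes to per-module overrides:
#   "+:" (the default) positively matches and applies the overrides,
#   "-:" negatively matches and skips the module from quantization entirely.
quant_config = QuantizeConfig(
    bits=4,
    group_size=128,
    dynamic={
        "+:.*\\.attn.*": {"bits": 8, "group_size": 64},  # illustrative override
        "-:.*\\.mlp\\..*gate.*": {},                     # illustrative skip rule
    },
)

model = GPTQModel.load("meta-llama/Llama-3.2-1B-Instruct", quant_config)  # placeholder
```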
- Remove Marlin old kernel and Marlin format saving. Marlin[new] is still supported via inference. by @CSY-ModelCloud in #714
- Remove marlin(old) kernel codes & do ruff by @CSY-ModelCloud in #719
- [FIX] gptq v2 load by @ZX-ModelCloud in #724
- Add hf_convert_gptq_v1_to_v2_format, hf_convert_gptq_v2_to_v1_format,… by @LRL-ModelCloud in #727
- if use the ipex quant linear, no need to convert by @LRL-ModelCloud in #730
- hf_select_quant_linear add device_map by @LRL-ModelCloud in #732
- Add TorchQuantLinear by @ZX-ModelCloud in #735
- Add QUANT_TYPE in qlinear by @jiqing-feng in #736
- Replace error with warning for Intel CPU check by @CSY-ModelCloud in #737
- Add BACKEND.AUTO_CPU by @LRL-ModelCloud in #739
- Fix ipex linear check by @jiqing-feng in #741
- Fix select quant linear by @jiqing-feng in #742
- Now meta.quantizer value can be an array by @ZX-ModelCloud in #744
- Receive checkpoint_format argument by @ZX-ModelCloud in #747
- Modify hf convert gptq v2 to v1 format by @ZX-ModelCloud in #749
- update score max negative delta by @CSY-ModelCloud in #748
- [CI] max parallel jobs 10 by @CSY-ModelCloud in #751
- hymba got high score by @CSY-ModelCloud in #752
- hf_select_quant_linear() always set pack=True by @ZX-ModelCloud in #754
- Refactor CudaQuantLinear to DynamicCudaQuantLinear by @ZX-ModelCloud in #759
- Remove filename prefix on qlinear dir by @ZX-ModelCloud in #760
- Replace Nvidia-smi with devicesmi by @CSY-ModelCloud in #761
- Fix XPU training by @jiqing-feng in #763
- Fix auto marlin kernel selection by @CSY-ModelCloud in #765
- Add BaseQuantLinear SUPPORTS_TRAINING declaration by @LRL-ModelCloud in #766
- Add Eval() api to support LM-Eval or EvalPlus benchmark harnesses by @CL-ModelCloud in #750
- Fix validate_device by @LRL-ModelCloud in #769
- Force BaseQuantLinear properties to be explicitly declared by all QuantLinears by @ZX-ModelCloud in #767
- Convert str backend to enum backend by @LRL-ModelCloud in #772
- Remove nested list in dict by @CSY-ModelCloud in #774
- Fix training qlinear by @LRL-ModelCloud in #777
- Check kernel by @CSY-ModelCloud in #764
- BACKEND.AUTO if backend is None by @LRL-ModelCloud in #781
- Fix lm_head quantize test by @CSY-ModelCloud in #784
- Fix exllama doesn't support 8 bit by @CSY-ModelCloud in #790
- Use set() to avoid calling torch twice by @CSY-ModelCloud in #791
- Fix ipex cpu backend import error and fix too much logs by @jiqing-feng in #793
- Eval API opt by @CL-ModelCloud in #794
- Fixed ipex linear param check and logging once by @jiqing-feng in #795
- Check device before sync by @LRL-ModelCloud in #796
- Only AUTO will try other quant linears by @CSY-ModelCloud in #797
- Add SUPPORTS_AUTO_PADDING property to QuantLinear by @LRL-ModelCloud in #799
- Dynamic now support skipping modules/layers by @CSY-ModelCloud in #804
- Fix module was skipped but still be looped by @CSY-ModelCloud in #806
- Make Triton kernel auto-pad on features/group_size by @LRL-ModelCloud in #808
Full Changelog: v1.3.1...v1.4.0
GPTQModel v1.3.1
What's Changed
⚡ Olmo2 model support.
⚡ Intel XPU acceleration via IPEX.
Sharding compat fix due to api deprecation in HF Transformers.
Removed the hard triton dependency. The Triton kernel is now only optionally dependent on the triton pkg.
Fixed Hymba test (Hymba requires desc_act=False).
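A tiny sketch of the Hymba constraint noted above; everything except `desc_act=False` is illustrative.

```python
from gptqmodel import QuantizeConfig

# Hymba quantization requires activation-order (desc_act) disabled.
quant_config = QuantizeConfig(bits=4, group_size=128, desc_act=False)
```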
- [FIX] use split_torch_state_dict_into_shards to replace shard_checkpoint by @LRL-ModelCloud in #682
- [Model] add olmo2 support by @LRL-ModelCloud in #678
- [FIX] Hymba currently only supports a batch size of 1 by @ZX-ModelCloud in #683
- [CI] fix extensions is not defined by @CSY-ModelCloud in #684
- Ipex XPU support by @jiqing-feng in #608
- [FIX] add require_pkgs_version and checks by @ZX-ModelCloud in #693
- fix ipex test by @Qubitium in #691
- [FIX] remove require_transformers_version and require_tokenizers_version by @ZX-ModelCloud in #695
- Remove use_safetensors argument by @ZX-ModelCloud in #696
- Revert exllamav1 by @CSY-ModelCloud in #692
- Make Triton optional by @CSY-ModelCloud in #697
- Unify backend use by @LRL-ModelCloud in #700
- [FIX] fix test_hymba by @ZX-ModelCloud in #704
- FIX IPEX XPU selection by @Qubitium in #705
- fix cpu/xpu backend selection by @jiqing-feng in #706
- Upgrade device-smi depend by @Qubitium in #708
- [FIX] hymba quant needs desc_act=False by @ZX-ModelCloud in #710
Full Changelog: v1.3.0...v1.3.1
GPTQModel v1.3.0
What's Changed
Zero-day Hymba model support added. Removed `tqdm` and `rogue` dependencies.
- Move lm-eval to utils to make it optional, fixed #664 by @CSY-ModelCloud in #666
- Add ipex bench code by @LRL-ModelCloud in #660
- [MODEL] add hymba support by @LRL-ModelCloud in #651
- [FIX] HymbaConfig.conv_dim keys is converted from str to int by @ZX-ModelCloud in #674
- [FIX] progress first index starts from 1 instead of 0 by @ZX-ModelCloud in #673
Full Changelog: v1.2.3...v1.3.0
GPTQModel v1.2.3
Stable release with all feature and model unit tests passing. Fixed lots of model unit tests that did not pass or passed incorrectly in previous releases.
HF GLM support added. GLM/ChatGLM has two different code forks: one is non-HF integrated, and the latest one is integrated into Transformers. HF GLM and non-HF GLM are not weight compatible; we support both variants.
What's Changed
- Add GLM (HF-ied) support by @Qubitium in #581
- unit tests add args USE_VLLM by @ZYC-ModelCloud in #582
- Quantize record info by @ZYC-ModelCloud in #583
- [MISC] add gptqmodel[eval] and remove sentencepiece by @PZS-ModelCloud in #602
- [MISC] requirements remove gekko, ninja, huggingface-hub, protobuf by @PZS-ModelCloud in #603
- release gpu vram after layer.fwd by @LRL-ModelCloud in #616
- Delete unsupported model & skip gptnoex by @CSY-ModelCloud in #617
- [FIX] Some models put hidden_states in kwargs instead of args. by @ZX-ModelCloud in #621
- lm_eval vllm task add max_model_len=4096 args by @LRL-ModelCloud in #625
- try catch should only work with lmeval by @CSY-ModelCloud in #628
- set USE_VLLM = False by @LRL-ModelCloud in #629
- [FIX] if load quantized model. we will not monkey_path forward by @LRL-ModelCloud in #638
- simplified ModelLoader ModelWriter func by @ZYC-ModelCloud in #637
- disable chat for test_mpt by @CSY-ModelCloud in #641
- Update unit_tests.yml by @Qubitium in #642
- fix tokenized[0] wrong when getting value from BatchEncoding type by @CSY-ModelCloud in #643
New Contributors
- @jiqing-feng made their first contribution in #527
Full Changelog: v1.2.1...v1.2.3