GPTQModel v1.4.4 Patch
What's Changed
⚡ Reduced memory usage during quantization
⚡ Fix device_map={"":"auto"} compat
- Speed up unit tests by @Qubitium in #885
- [FIX] hf select quant linear parse device map by @ZX-ModelCloud in #887
- Avoid cloning on gpu by @Qubitium in #886
- Expose hf_quantize() by @ZX-ModelCloud in #888
- Update integration hf code by @ZX-ModelCloud in #891
- Add back fasterquant() for compat by @Qubitium in #892
Full Changelog: v1.4.2...v1.4.4