Release GPTQModel v1.7.2 · ModelCloud/GPTQModel

What's Changed

⚡ Effective BPW (bits per weight) is now logged during load().
⚡ Reduce loading time on Intel Arc A770/B580 XPU by 3.3x.
⚡ Reduce memory usage in MLX conversion.
🐛 Fix Marlin kernel auto-select not checking CUDA compute version.
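
For readers unfamiliar with the effective BPW figure, here is a rough sketch of how such a number can be estimated for group-wise quantization. This is not GPTQModel's actual formula: the function name `effective_bpw` and its parameters are hypothetical, and it assumes one fp16 scale plus one packed zero point per group while ignoring `g_idx` and any layers left unquantized.

```python
from typing import Optional


def effective_bpw(bits: int, group_size: int,
                  scale_bits: int = 16,
                  zero_bits: Optional[int] = None) -> float:
    """Estimate bits per weight including per-group metadata.

    Hypothetical helper, not part of GPTQModel. Assumes each group of
    `group_size` weights stores one scale and one zero point.
    """
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    if zero_bits is None:
        # Assumption: the zero point is packed at the same width as the weights.
        zero_bits = bits
    overhead = (scale_bits + zero_bits) / group_size
    return bits + overhead


if __name__ == "__main__":
    # 4-bit weights, group size 128: 4 + (16 + 4) / 128
    print(effective_bpw(4, 128))
```

Under these assumptions a nominal 4-bit model with group size 128 costs slightly more than 4 bits per weight, which is why a logged effective value can differ from the configured `bits`.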

remove catching module error by @CSY-ModelCloud in #1088
[FIX] monkey patch GPTQShuffle.convert_idx to use fixed convert_idx by @LRL-ModelCloud in #1090
[FIX] monkey patch only once by @LRL-ModelCloud in #1091
check CC >= 8 for marlin, fixed #1092 by @CSY-ModelCloud in #1093
check compute capability for marlin in validate_device() by @CSY-ModelCloud in #1095
torch get device with index of CUDA_VISIBLE_DEVICES, not value of it by @CSY-ModelCloud in #1096
fix local model path & marlin test by @CSY-ModelCloud in #1097
mod bits info by @CL-ModelCloud in #1100
Reduce memory usage in mlx conversion by @Qubitium in #1099
cleanup mlx code by @Qubitium in #1101
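
The Marlin and device fixes in #1093, #1095 and #1096 rest on two ideas: torch addresses GPUs by their position in CUDA_VISIBLE_DEVICES rather than by the physical id listed there, and Marlin is only selected when the compute capability major version is at least 8. The sketch below illustrates both in plain Python. The helper names are hypothetical and this is not the code from those PRs.

```python
import os
from typing import List, Mapping, Optional, Tuple


def visible_devices(env: Optional[Mapping[str, str]] = None) -> Optional[List[str]]:
    """Return the ids listed in CUDA_VISIBLE_DEVICES, or None if it is unset."""
    source = os.environ if env is None else env
    raw = source.get("CUDA_VISIBLE_DEVICES")
    if raw is None:
        return None
    return [d.strip() for d in raw.split(",") if d.strip()]


def torch_index_for(physical_id: str,
                    env: Optional[Mapping[str, str]] = None) -> int:
    """Map a physical GPU id to the index torch would use for it.

    With CUDA_VISIBLE_DEVICES="2,3", physical GPU 3 is torch device 1.
    Raises ValueError if the GPU is not visible.
    """
    devices = visible_devices(env)
    if devices is None:
        # Variable unset: all GPUs visible, index equals the physical id.
        return int(physical_id)
    return devices.index(physical_id)


def marlin_capable(capability: Tuple[int, int]) -> bool:
    """Check the CC >= 8 rule on a (major, minor) capability tuple.

    In a real setup the tuple would come from
    torch.cuda.get_device_capability(index).
    """
    major, _minor = capability
    return major >= 8


if __name__ == "__main__":
    env = {"CUDA_VISIBLE_DEVICES": "2,3"}
    print(torch_index_for("3", env))
    print(marlin_capable((8, 6)), marlin_capable((7, 5)))
```

Passing the capability tuple in as an argument keeps the check testable on machines without a GPU or without torch installed.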

Full Changelog: v1.7.0...v1.7.2
