What's Changed
⚡ Effective BPW (bits per weight) will now be logged during load().
⚡ Reduce loading time on Intel Arc A770/B580 XPU by 3.3x.
⚡ Reduce memory usage in MLX conversion.
🐛 Fix Marlin kernel auto-select not checking CUDA compute capability.
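
For readers unfamiliar with the BPW figure, here is a rough sketch of how an effective bits-per-weight value can be estimated for group-wise quantization. It assumes each group of `group_size` weights stores one scale (`scale_bits`, 16 by default) and one zero point (`zero_bits`, defaulting to `bits`). The function name, its parameters, and the formula are illustrative assumptions, not necessarily the calculation that `load()` logs.

```python
from typing import Optional


def effective_bpw(bits: int, group_size: int,
                  scale_bits: int = 16,
                  zero_bits: Optional[int] = None) -> float:
    """Estimate effective bits per weight (hypothetical helper).

    Assumption: every group of `group_size` weights carries one scale and
    one zero point, so the per-weight overhead is their size / group_size.
    """
    if group_size <= 0:
        raise ValueError("group_size must be a positive integer")
    if zero_bits is None:
        zero_bits = bits  # assume the zero point uses the weight bit width
    return bits + (scale_bits + zero_bits) / group_size


print(effective_bpw(4, 128))  # 4.15625
print(effective_bpw(4, 32))   # 4.625
```

Under these assumptions, smaller groups raise the effective BPW because the per-group metadata is shared by fewer weights.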
- remove catching module error by @CSY-ModelCloud in #1088
- [FIX] monkey patch GPTQShuffle.convert_idx to use fixed convert_idx by @LRL-ModelCloud in #1090
- [FIX] monkey patch only once by @LRL-ModelCloud in #1091
- check CC >= 8 for marlin, fixed #1092 by @CSY-ModelCloud in #1093
- check compute capability for marlin in validate_device() by @CSY-ModelCloud in #1095
- torch get device with index of CUDA_VISIBLE_DEVICES, not value of it by @CSY-ModelCloud in #1096
- fix local model path & marlin test by @CSY-ModelCloud in #1097
- mod bits info by @CL-ModelCloud in #1100
- Reduce memory usage in mlx conversion by @Qubitium in #1099
- cleanup mlx code by @Qubitium in #1101
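
The Marlin and device-index fixes above can be illustrated with a minimal sketch. It assumes the capability tuple comes from a call such as `torch.cuda.get_device_capability()`, and that `CUDA_VISIBLE_DEVICES` holds a comma-separated list of physical GPU ids. The helper names `marlin_supported` and `torch_device_index` are hypothetical and are not GPTQModel's actual functions.

```python
from typing import Tuple

# Threshold taken from the "check CC >= 8 for marlin" entry above.
MARLIN_MIN_MAJOR = 8


def marlin_supported(capability: Tuple[int, int]) -> bool:
    """Return True if a (major, minor) compute capability is at least 8.x."""
    major, _minor = capability
    return major >= MARLIN_MIN_MAJOR


def torch_device_index(physical_id: str, cuda_visible_devices: str) -> int:
    """Map a physical GPU id to the index the process sees.

    With CUDA_VISIBLE_DEVICES="2,3", the process sees physical GPU 2 as
    cuda:0 and physical GPU 3 as cuda:1, so the position in the list is
    the usable index, not the listed value.
    """
    visible = [d.strip() for d in cuda_visible_devices.split(",") if d.strip()]
    return visible.index(physical_id)  # raises ValueError if not visible


print(marlin_supported((8, 0)))        # True
print(marlin_supported((7, 5)))        # False
print(torch_device_index("3", "2,3"))  # 1
```

Passing the capability in as a plain tuple keeps the check testable on machines without a GPU.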
Full Changelog: v1.7.0...v1.7.2