int4 quant broken right now? #217

Open
jerryzh168 opened this issue Dec 20, 2024 · 2 comments
Comments

@jerryzh168
Contributor

I tried the following and it seems to break right now:

> python quantize.py --checkpoint_path checkpoints/$MODEL_REPO/model.pth --mode int4 --groupsize 64
Loading model ...
Quantizing model weights for int4 weight-only affine per-channel groupwise quantization
linear: layers.0.attention.wqkv, in=4096, out=6144
Traceback (most recent call last):
  File "/data/users/jerryzh/gpt-fast/quantize.py", line 622, in <module>
    quantize(args.checkpoint_path, args.mode, args.groupsize, args.calibration_tasks, args.calibration_limit, args.calibration_seq_length, args.pad_calibration_inputs, args.percdamp, args.blocksize, args.label)
  File "/data/users/jerryzh/gpt-fast/quantize.py", line 569, in quantize
    quantized_state_dict = quant_handler.create_quantized_state_dict()
  File "/home/jerryzh/.conda/envs/sglang/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/data/users/jerryzh/gpt-fast/quantize.py", line 433, in create_quantized_state_dict
    weight_int4pack, scales_and_zeros = prepare_int4_weight_and_scales_and_zeros(
  File "/data/users/jerryzh/gpt-fast/quantize.py", line 363, in prepare_int4_weight_and_scales_and_zeros
    weight_int4pack = torch.ops.aten._convert_weight_to_int4pack(weight_int32, inner_k_tiles)
  File "/home/jerryzh/.conda/envs/sglang/lib/python3.10/site-packages/torch/_ops.py", line 1123, in __call__
    return self._op(*args, **(kwargs or {}))
RuntimeError: Expected in.dtype() == at::kByte to be true, but got false.  (Could this error message be improved?  If so, please report an enhancement request to PyTorch.)

It's probably because of @yanbing-j's recent refactors, but since we may want to migrate to torchao's quant at some point, I'm not sure it's worth fixing now.
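
If the failure is the dtype change in torch.ops.aten._convert_weight_to_int4pack (newer PyTorch builds appear to expect the weight pre-packed as uint8, two int4 values per byte, rather than int32), a rough, untested sketch of a workaround inside prepare_int4_weight_and_scales_and_zeros could look like the following; the pack_int4_weight helper name is hypothetical:

import torch

def pack_int4_weight(weight_int32: torch.Tensor, inner_k_tiles: int) -> torch.Tensor:
    # Assumption: weight_int32 holds one int4 value (0..15) per int32 element, shape [out, in].
    # Pack adjacent pairs along the `in` dimension into one uint8 (even index in the
    # high nibble, odd index in the low nibble), then call the packing op on the uint8 tensor.
    weight_uint8 = (weight_int32[:, ::2] << 4 | weight_int32[:, 1::2]).to(torch.uint8)
    return torch.ops.aten._convert_weight_to_int4pack(weight_uint8, inner_k_tiles)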

@jerryzh168
Contributor Author

I can quantize with #208 though, so landing that is probably sufficient.

@yanbing-j

@jerryzh168 I have a PR from before, #187, which should fix the int4 API mismatch on the CUDA device.
