[BUG] Quantized and model is generating repeating response #991

BodhiHu · 2025-01-02T08:33:42Z

Describe the bug

Hi,

After quantizing the model, it generates a repeating response.

Below is the conversion and test script:

from datasets import load_dataset
from transformers import AutoTokenizer
from gptqmodel import GPTQModel, QuantizeConfig

model_id = "/path/to/LLaMA-MoE_8B-2_8-sft"
quant_path = "/path/to/LLaMA-MoE_8B-2_8-sft-GPTQ-w4g128"

tokenizer = AutoTokenizer.from_pretrained(model_id)

calibration_dataset = [
  tokenizer(example["text"])
  for example in load_dataset(
    "allenai/c4",
    data_files="en/c4-train.00001-of-01024.json.gz",
    split="train"
  ).select(range(1024))
]

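# 4-bit weights, group size 128, activation reordering disabled
quant_config = QuantizeConfig(bits=4, group_size=128, desc_act=False)

# Load the unquantized model together with the quantization config
model = GPTQModel.load(model_id, quant_config)

# Run GPTQ calibration on the tokenized C4 samples
model.quantize(calibration_dataset)

model.save(quant_path)

# Reload the quantized checkpoint for the generation test
model = GPTQModel.load(quant_path)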

result = model.generate(
  **tokenizer(
      "Good Morning! Once upon a time, there's a company called", return_tensors="pt"
  ).to(model.device)
)[0]

print(f"\n{tokenizer.decode(result[0], skip_special_tokens=True)}\n")

Quantized model output (identical to the input prompt):

Good Morning! Once upon a time, there's a company called
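Because the decoded text is identical to the prompt, one quick check is whether generation returned any token ids beyond the prompt at all. The following is a minimal sketch, assuming the generated sequence begins with the prompt ids (the usual behaviour for decoder-only models in transformers). `new_token_ids` is a hypothetical helper, not part of GPTQModel, and the ids below are toy values for illustration only.

```python
def new_token_ids(prompt_ids, output_ids):
    """Return the ids generated after the prompt (empty list if none).

    Hypothetical helper: assumes output_ids starts with prompt_ids.
    """
    prompt_ids = list(prompt_ids)
    output_ids = list(output_ids)
    if output_ids[:len(prompt_ids)] != prompt_ids:
        raise ValueError("output does not start with the prompt ids")
    return output_ids[len(prompt_ids):]


# Toy ids; real ones would come from the tokenizer and from generate().
prompt = [1, 7205, 28729]
print(new_token_ids(prompt, [1, 7205, 28729]))     # [] -> nothing was generated
print(new_token_ids(prompt, [1, 7205, 28729, 2]))  # [2] -> one new token
```

An empty continuation, or one that holds only special tokens such as EOS, would decode to the prompt alone when `skip_special_tokens=True` is set, which may be what is happening here.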

GPU Info

Using CPU: x86_64

Software Info

Ubuntu 22.04.4 LTS + Python 3.12.8

Show output of:

pip show gptqmodel torch transformers accelerate triton

Name: gptqmodel
Version: 1.4.6.dev0
Summary: A LLM quantization package with user-friendly apis. Based on GPTQ algorithm.
Home-page: https://github.com/ModelCloud/GPTQModel
Author: ModelCloud
Author-email: [email protected]
License:
Location: /home/huaishun/miniconda3/envs/GPTQModel/lib/python3.12/site-packages
Requires: accelerate, datasets, device-smi, numpy, packaging, pillow, protobuf, safetensors, sentencepiece, threadpoolctl, torch, transformers
Required-by:
---
Name: torch
Version: 2.5.1
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: [email protected]
License: BSD-3-Clause
Location: /home/huaishun/miniconda3/envs/GPTQModel/lib/python3.12/site-packages
Requires: filelock, fsspec, jinja2, networkx, nvidia-cublas-cu12, nvidia-cuda-cupti-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-runtime-cu12, nvidia-cudnn-cu12, nvidia-cufft-cu12, nvidia-curand-cu12, nvidia-cusolver-cu12, nvidia-cusparse-cu12, nvidia-nccl-cu12, nvidia-nvjitlink-cu12, nvidia-nvtx-cu12, setuptools, sympy, triton, typing-extensions
Required-by: accelerate, gptqmodel
---
Name: transformers
Version: 4.47.1
Summary: State-of-the-art Machine Learning for JAX, PyTorch and TensorFlow
Home-page: https://github.com/huggingface/transformers
Author: The Hugging Face team (past and future) with the help of all our contributors (https://github.com/huggingface/transformers/graphs/contributors)
Author-email: [email protected]
License: Apache 2.0 License
Location: /home/huaishun/miniconda3/envs/GPTQModel/lib/python3.12/site-packages
Requires: filelock, huggingface-hub, numpy, packaging, pyyaml, regex, requests, safetensors, tokenizers, tqdm
Required-by: gptqmodel
---
Name: accelerate
Version: 1.2.1
Summary: Accelerate
Home-page: https://github.com/huggingface/accelerate
Author: The HuggingFace team
Author-email: [email protected]
License: Apache
Location: /home/huaishun/miniconda3/envs/GPTQModel/lib/python3.12/site-packages
Requires: huggingface-hub, numpy, packaging, psutil, pyyaml, safetensors, torch
Required-by: gptqmodel
---
Name: triton
Version: 3.1.0
Summary: A language and compiler for custom Deep Learning operations
Home-page: https://github.com/triton-lang/triton/
Author: Philippe Tillet
Author-email: [email protected]
License:
Location: /home/huaishun/miniconda3/envs/GPTQModel/lib/python3.12/site-packages
Requires: filelock
Required-by: torch

Model/Datasets

Model: https://huggingface.co/llama-moe/LLaMA-MoE-v2-3_8B-2_8-sft
Dataset: allenai/c4

Qubitium · 2025-01-02T11:42:23Z

> After quantizing the model, it generates a repeating response.

Can you post samples of the repeating response?
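When posting samples, a rough repetition score next to each one can make them easier to compare. The following is a minimal sketch, assuming whitespace splitting is good enough for a coarse signal. `repeated_ngram_ratio` is a hypothetical helper, not part of GPTQModel or transformers.

```python
from collections import Counter


def repeated_ngram_ratio(text, n=3):
    """Fraction of word n-grams that repeat an n-gram seen earlier in the text."""
    words = text.split()
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    # Every occurrence after the first one counts as a repeat.
    repeats = sum(c - 1 for c in counts.values())
    return repeats / len(ngrams)


print(repeated_ngram_ratio("the quick brown fox jumps"))  # 0.0
print(repeated_ngram_ratio("a b c a b c a b c") > 0.5)    # True
```

A score near 0 indicates little repetition, while a score approaching 1 indicates that most of the text is looping.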

BodhiHu added the bug label (Something isn't working) on Jan 2, 2025
