Describe the bug
Hi,
After quantizing the model, it generates a repeating response.
Below is the conversion and test script:
```python
from datasets import load_dataset
from transformers import AutoTokenizer
from gptqmodel import GPTQModel, QuantizeConfig

model_id = "/path/to/LLaMA-MoE_8B-2_8-sft"
quant_path = "/path/to/LLaMA-MoE_8B-2_8-sft-GPTQ-w4g128"

tokenizer = AutoTokenizer.from_pretrained(model_id)

# Calibration data: 1024 samples from the C4 English training split
calibration_dataset = [
    tokenizer(example["text"])
    for example in load_dataset(
        "allenai/c4",
        data_files="en/c4-train.00001-of-01024.json.gz",
        split="train",
    ).select(range(1024))
]

quant_config = QuantizeConfig(bits=4, group_size=128, desc_act=False)

# Quantize and save
model = GPTQModel.load(model_id, quant_config)
model.quantize(calibration_dataset)
model.save(quant_path)

# Reload the quantized model and run a test generation
model = GPTQModel.load(quant_path)
result = model.generate(
    **tokenizer(
        "Good Morning! Once upon a time, there's a company called",
        return_tensors="pt",
    ).to(model.device)
)[0]
# result is already the first (and only) generated sequence, so decode it directly
print(f"\n{tokenizer.decode(result, skip_special_tokens=True)}\n")
```
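To narrow down whether the repetition comes from the quantization step itself rather than from generation settings, one quick check is to run the same prompt through the unquantized checkpoint and compare. This is a minimal sketch, not part of the original report: it assumes the original model loads with transformers' `AutoModelForCausalLM` (possibly needing `trust_remote_code=True` for the MoE architecture), and it reuses the placeholder `model_id` and prompt from the script above.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "/path/to/LLaMA-MoE_8B-2_8-sft"  # placeholder path from the script above
prompt = "Good Morning! Once upon a time, there's a company called"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# Assumption: the checkpoint loads via AutoModelForCausalLM; trust_remote_code
# may or may not be required depending on how the MoE model is packaged.
fp16_model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)

inputs = tokenizer(prompt, return_tensors="pt").to(fp16_model.device)
with torch.no_grad():
    out = fp16_model.generate(**inputs, max_new_tokens=128, do_sample=False)

# If this output is coherent while the quantized output loops, the problem is
# more likely in quantization (e.g. calibration data) than in generation.
print(tokenizer.decode(out[0], skip_special_tokens=True))
```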
Quantized model output (which is the same as the input):

```
Good Morning! Once upon a time, there's a company called
```
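One possible reason the decoded output ends at the prompt is that, with default generation settings, only a handful of new tokens may be produced. Assuming GPTQModel's `generate` forwards keyword arguments to the underlying Hugging Face `generate` call, a sketch like the following (not from the original report) would request a longer continuation so the repeating portion actually shows up in the decoded text:

```python
# Assumes `model` and `tokenizer` are the quantized model and tokenizer
# loaded in the script above.
inputs = tokenizer(
    "Good Morning! Once upon a time, there's a company called",
    return_tensors="pt",
).to(model.device)

result = model.generate(
    **inputs,
    max_new_tokens=128,  # explicitly ask for a longer continuation
    do_sample=False,     # greedy decoding makes any repetition deterministic
)[0]

print(tokenizer.decode(result, skip_special_tokens=True))
```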
GPU Info
Using CPU: x86_64
Software Info
Ubuntu 22.04.4 LTS + Python 3.12.8
Show output of:
`pip show gptqmodel torch transformers accelerate triton`

```
Name: gptqmodel
Version: 1.4.6.dev0
Summary: A LLM quantization package with user-friendly apis. Based on GPTQ algorithm.
Home-page: https://github.com/ModelCloud/GPTQModel
Author: ModelCloud
Author-email: [email protected]
License:
Location: /home/huaishun/miniconda3/envs/GPTQModel/lib/python3.12/site-packages
Requires: accelerate, datasets, device-smi, numpy, packaging, pillow, protobuf, safetensors, sentencepiece, threadpoolctl, torch, transformers
Required-by:
---
Name: torch
Version: 2.5.1
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: [email protected]
License: BSD-3-Clause
Location: /home/huaishun/miniconda3/envs/GPTQModel/lib/python3.12/site-packages
Requires: filelock, fsspec, jinja2, networkx, nvidia-cublas-cu12, nvidia-cuda-cupti-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-runtime-cu12, nvidia-cudnn-cu12, nvidia-cufft-cu12, nvidia-curand-cu12, nvidia-cusolver-cu12, nvidia-cusparse-cu12, nvidia-nccl-cu12, nvidia-nvjitlink-cu12, nvidia-nvtx-cu12, setuptools, sympy, triton, typing-extensions
Required-by: accelerate, gptqmodel
---
Name: transformers
Version: 4.47.1
Summary: State-of-the-art Machine Learning for JAX, PyTorch and TensorFlow
Home-page: https://github.com/huggingface/transformers
Author: The Hugging Face team (past and future) with the help of all our contributors (https://github.com/huggingface/transformers/graphs/contributors)
Author-email: [email protected]
License: Apache 2.0 License
Location: /home/huaishun/miniconda3/envs/GPTQModel/lib/python3.12/site-packages
Requires: filelock, huggingface-hub, numpy, packaging, pyyaml, regex, requests, safetensors, tokenizers, tqdm
Required-by: gptqmodel
---
Name: accelerate
Version: 1.2.1
Summary: Accelerate
Home-page: https://github.com/huggingface/accelerate
Author: The HuggingFace team
Author-email: [email protected]
License: Apache
Location: /home/huaishun/miniconda3/envs/GPTQModel/lib/python3.12/site-packages
Requires: huggingface-hub, numpy, packaging, psutil, pyyaml, safetensors, torch
Required-by: gptqmodel
---
Name: triton
Version: 3.1.0
Summary: A language and compiler for custom Deep Learning operations
Home-page: https://github.com/triton-lang/triton/
Author: Philippe Tillet
Author-email: [email protected]
License:
Location: /home/huaishun/miniconda3/envs/GPTQModel/lib/python3.12/site-packages
Requires: filelock
Required-by: torch
```
Model/Datasets
Model: https://huggingface.co/llama-moe/LLaMA-MoE-v2-3_8B-2_8-sft
Dataset: allenai/c4
Can you post samples of the repeating response?
repeating