[Bug] gemma-2-27b-it-q4f16_1-MLC outputs incorrect content #3054

Open
rankaiyx opened this issue Dec 1, 2024 · 1 comment
Labels
bug Confirmed bugs

Comments

rankaiyx commented Dec 1, 2024

🐛 Bug

When I use the following model, it repeatedly outputs "<pad><pad><pad><pad>...":
HF://mlc-ai/gemma-2-27b-it-q4f16_1-MLC

To Reproduce

Steps to reproduce the behavior:

  1. In a conda environment, install mlc-llm:
    python -m pip install --pre -U -f https://mlc.ai/wheels mlc-llm-nightly-cu123 mlc-ai-nightly-cu123

  2. Run chat or serve with the model (a Python-API sketch follows these steps):
    mlc_llm chat --overrides "tensor_parallel_shards=4" HF://mlc-ai/gemma-2-27b-it-q4f16_1-MLC
    or
    mlc_llm serve --overrides "tensor_parallel_shards=4" HF://mlc-ai/gemma-2-27b-it-q4f16_1-MLC
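
For reference, the same misbehavior should also be reproducible through the Python API. The sketch below assumes the documented `MLCEngine` chat-completions interface; the tensor-parallel override used on the CLI is omitted here and would need to be configured separately.

```python
from mlc_llm import MLCEngine

model = "HF://mlc-ai/gemma-2-27b-it-q4f16_1-MLC"
engine = MLCEngine(model)

# Ask a simple question and stream the reply; the bug shows up as repeated "<pad>" tokens.
for response in engine.chat.completions.create(
    messages=[{"role": "user", "content": "Hello, who are you?"}],
    model=model,
    stream=True,
):
    for choice in response.choices:
        print(choice.delta.content, end="", flush=True)

engine.terminate()
```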

Expected behavior

The model should output coherent text instead of repeated "<pad>" tokens.

Environment

  • Platform: CUDA
  • Operating system: Ubuntu 24.04
  • Device: 4x Tesla P100
  • How you installed MLC-LLM (conda, source): conda
  • How you installed TVM-Unity (pip, source): pip
  • Python version (e.g. 3.10): 3.11
  • GPU driver version (if applicable): 565.57.01
  • CUDA/cuDNN version (if applicable): CUDA 12.4 (the version mlc-llm depends on, installed automatically)
  • TVM Unity Hash Tag (python -c "import tvm; print('\n'.join(f'{k}: {v}' for k, v in tvm.support.libinfo().items()))", applicable if you compile models): e6b2a55d1e1668d889ce69efa3921bc73dcb8b8a
MasterJH5574 (Member) commented

Thank you @rankaiyx for bringing up this issue! We tried to dig in a bit. The potential reason is that gemma-2-27b produces inf during computation when float16 is used as the activation dtype, while it works well with bfloat16. We have not yet enabled bfloat16 support in MLC, and we will think about how to address this problem.

A similar issue we found: oobabooga/text-generation-webui#6213
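
To illustrate the dtype point, here is a minimal sketch (using PyTorch purely for demonstration; it is not part of MLC's stack here): float16 tops out around 65504 and saturates to inf, while bfloat16 keeps float32's exponent range, so the same large activation values stay finite.

```python
import torch

# Two moderately large activation-like values, stored in float32.
x = torch.tensor([60000.0, 70000.0], dtype=torch.float32)

# In float16 the second value exceeds the max finite value (~65504) and becomes inf.
print(x.to(torch.float16))

# In bfloat16 both values remain finite (with coarser precision), since bfloat16
# shares float32's 8-bit exponent range.
print(x.to(torch.bfloat16))
```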
