[Bug] gemma-2-27b-it-q4f16_1-MLC outputs incorrect content #3054

Open
rankaiyx opened this issue Dec 1, 2024 · 1 comment
Labels
bug Confirmed bugs

Comments

rankaiyx commented Dec 1, 2024

🐛 Bug

When I use the following model, it repeatedly outputs "<pad><pad><pad><pad>...":
HF://mlc-ai/gemma-2-27b-it-q4f16_1-MLC

To Reproduce

Steps to reproduce the behavior:

  1. In a conda environment, install mlc-llm:
    python -m pip install --pre -U -f https://mlc.ai/wheels mlc-llm-nightly-cu123 mlc-ai-nightly-cu123

  2. Run chat or serve with the model (a Python-API sketch follows these steps):
    mlc_llm chat --overrides "tensor_parallel_shards=4" HF://mlc-ai/gemma-2-27b-it-q4f16_1-MLC
    or
    mlc_llm serve --overrides "tensor_parallel_shards=4" HF://mlc-ai/gemma-2-27b-it-q4f16_1-MLC
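
For reference, the same misbehavior should also be reproducible through the Python API. The sketch below assumes the documented `MLCEngine` chat-completions interface; the tensor-parallel override used on the CLI is omitted here and would need to be configured separately.

```python
from mlc_llm import MLCEngine

model = "HF://mlc-ai/gemma-2-27b-it-q4f16_1-MLC"
engine = MLCEngine(model)

# Ask a simple question and stream the reply; the bug shows up as repeated "<pad>" tokens.
for response in engine.chat.completions.create(
    messages=[{"role": "user", "content": "Hello, who are you?"}],
    model=model,
    stream=True,
):
    for choice in response.choices:
        print(choice.delta.content, end="", flush=True)

engine.terminate()
```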

Expected behavior

The model should output coherent text instead of repeated "<pad>" tokens.

Environment

  • Platform: CUDA
  • Operating system: Ubuntu 24.04
  • Device: 4x Tesla P100
  • How you installed MLC-LLM (conda, source): conda
  • How you installed TVM-Unity (pip, source): pip
  • Python version (e.g. 3.10): 3.11
  • GPU driver version (if applicable): 565.57.01
  • CUDA/cuDNN version (if applicable): CUDA 12.4 (the version mlc-llm depends on, installed automatically)
  • TVM Unity Hash Tag (python -c "import tvm; print('\n'.join(f'{k}: {v}' for k, v in tvm.support.libinfo().items()))", applicable if you compile models): e6b2a55d1e1668d889ce69efa3921bc73dcb8b8a
MasterJH5574 (Member) commented

Thank you @rankaiyx for bringing up this issue! We tried to dig in a bit. The potential reason is that gemma-2-27b produces inf during computation when float16 is used as the activation dtype, while it works well with bfloat16. We have not yet enabled bfloat16 support in MLC, and we will think about how to address this problem.

A similar issue we found: oobabooga/text-generation-webui#6213
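
To illustrate the dtype point, here is a minimal sketch (using PyTorch purely for demonstration; it is not part of MLC's stack here): float16 tops out around 65504 and saturates to inf, while bfloat16 keeps float32's exponent range, so the same large activation values stay finite.

```python
import torch

# Two moderately large activation-like values, stored in float32.
x = torch.tensor([60000.0, 70000.0], dtype=torch.float32)

# In float16 the second value exceeds the max finite value (~65504) and becomes inf.
print(x.to(torch.float16))

# In bfloat16 both values remain finite (with coarser precision), since bfloat16
# shares float32's 8-bit exponent range.
print(x.to(torch.bfloat16))
```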
