CUDA/cuDNN version (if applicable): CUDA 12.4 (the version mlc-llm depends on, so it is installed automatically)
TVM Unity Hash Tag (python -c "import tvm; print('\n'.join(f'{k}: {v}' for k, v in tvm.support.libinfo().items()))", applicable if you compile models): e6b2a55d1e1668d889ce69efa3921bc73dcb8b8a
rankaiyx changed the title to "[Bug] gemma-2-27b-it-q4f16_1-MLC output the incorrect content." on Dec 1, 2024.
Thank you @rankaiyx for bringing up this issue! We dug into it a bit. The likely cause is that gemma-2-27b produces inf values during computation when float16 is used as the activation dtype, while it works well with bfloat16. We have not yet enabled bfloat16 support in MLC, and we will think about how to address this problem.
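The overflow is easy to see in isolation. A minimal numpy sketch (illustrative only, not MLC code) showing how a product of modest float16 activations overflows to inf, while the same math is unremarkable at float32 precision — bfloat16 shares float32's 8-bit exponent, so its range tops out near 3.4e38 instead of float16's 65504:

```python
import numpy as np

# float16 has a 5-bit exponent: the largest finite value is 65504.
print(np.finfo(np.float16).max)

# Multiplying two in-range float16 values overflows to inf,
# because 300 * 300 = 90000 exceeds float16's finite range.
a = np.float16(300.0)
b = np.float16(300.0)
print(a * b)  # inf

# The same product is fine with a float32-sized exponent
# (bfloat16 has the same exponent width as float32).
print(np.float32(300.0) * np.float32(300.0))  # 90000.0
```

This matches the symptom: once an activation hits inf, downstream logits degenerate and the sampler can get stuck emitting a single token such as "<pad>".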
🐛 Bug
When I use the following model, it will repeatedly output "<pad><pad><pad><pad>..."
HF://mlc-ai/gemma-2-27b-it-q4f16_1-MLC
To Reproduce
Steps to reproduce the behavior:
In a conda environment, install mlc-llm:

```shell
python -m pip install --pre -U -f https://mlc.ai/wheels mlc-llm-nightly-cu123 mlc-ai-nightly-cu123
```

Then run either:

```shell
mlc_llm chat --overrides "tensor_parallel_shards=4" HF://mlc-ai/gemma-2-27b-it-q4f16_1-MLC
```

or

```shell
mlc_llm serve --overrides "tensor_parallel_shards=4" HF://mlc-ai/gemma-2-27b-it-q4f16_1-MLC
```
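For the serve path, the bug can also be observed over HTTP: mlc_llm serve exposes an OpenAI-compatible endpoint. A minimal sketch of such a request, assuming the default host and port (127.0.0.1:8000) — the prompt text here is an arbitrary example:

```python
import json
import urllib.request

# Request body for the OpenAI-compatible /v1/chat/completions route.
# The model field mirrors the HF:// identifier used above.
payload = {
    "model": "HF://mlc-ai/gemma-2-27b-it-q4f16_1-MLC",
    "messages": [{"role": "user", "content": "Hello, who are you?"}],
    "max_tokens": 64,
}

def query(url="http://127.0.0.1:8000/v1/chat/completions"):
    # Sends the request; requires `mlc_llm serve` to be running locally.
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

if __name__ == "__main__":
    # Inspect the request body without needing a live server.
    print(json.dumps(payload, indent=2))
```

With the affected model, the returned completion consists of repeated "<pad>" tokens rather than a coherent answer.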
Expected behavior
The model outputs correct content instead of repeatedly emitting "<pad>".
Environment
How you installed MLC-LLM (conda, source): conda
How you installed TVM-Unity (pip, source): pip