
[Bug] Inference with llava throws an error #3039

Open
HoiM opened this issue Nov 21, 2024 · 2 comments
Labels
bug Confirmed bugs

Comments


HoiM commented Nov 21, 2024

🐛 Bug

I am trying to run llava with mlc-llm. On both a Linux server and a local macOS machine, I encountered the error below.

(Run export RUST_BACKTRACE=full before the inference program to obtain the full backtrace.)

[2024-11-21 14:48:31] INFO auto_device.py:88: Not found device: cuda:0
[2024-11-21 14:48:31] INFO auto_device.py:88: Not found device: rocm:0
[2024-11-21 14:48:32] INFO auto_device.py:79: Found device: metal:0
[2024-11-21 14:48:32] INFO auto_device.py:88: Not found device: vulkan:0
[2024-11-21 14:48:33] INFO auto_device.py:88: Not found device: opencl:0
[2024-11-21 14:48:33] INFO auto_device.py:35: Using device: metal:0
[2024-11-21 14:48:33] INFO jit.py:43: MLC_JIT_POLICY = ON. Can be one of: ON, OFF, REDO, READONLY
[2024-11-21 14:48:33] INFO jit.py:158: Using cached model lib: /Users/yuhaiming/.cache/mlc_llm/model_lib/844b459aad26bf51753183241229d8bb.dylib
[2024-11-21 14:48:33] INFO engine_base.py:180: The selected engine mode is local. We choose small max batch size and KV cache capacity to use less GPU memory.
[2024-11-21 14:48:33] INFO engine_base.py:205: If you don't have concurrent requests and only use the engine interactively, please select mode "interactive".
[2024-11-21 14:48:33] INFO engine_base.py:210: If you have high concurrent requests and want to maximize the GPU memory utilization, please select mode "server".
thread '<unnamed>' panicked at src/lib.rs:26:50:
called `Result::unwrap()` on an `Err` value: Error("data did not match any variant of untagged enum ModelWrapper", line: 277157, column: 1)
stack backtrace:
   0:        0x104d41e78 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::hade97c44b56fc870
   1:        0x104d8f6fc - core::fmt::write::h81cbefbffc581dab
   2:        0x104d51d64 - std::io::Write::write_fmt::h125c60058ebfe43c
   3:        0x104d41cb4 - std::sys_common::backtrace::print::hfa54be0dd0cf5860
   4:        0x104d61794 - std::panicking::default_hook::{{closure}}::h4235e0929057f079
   5:        0x104d61514 - std::panicking::default_hook::hcf67171e7c25be94
   6:        0x104d61c54 - std::panicking::rust_panic_with_hook::h1767d40d669aa9fe
   7:        0x104d42668 - std::panicking::begin_panic_handler::{{closure}}::h83ff281d56dc913c
   8:        0x104d42090 - std::sys_common::backtrace::__rust_end_short_backtrace::h2f399e8aa761a4f1
   9:        0x104d619e8 - _rust_begin_unwind
  10:        0x104dc460c - core::panicking::panic_fmt::hc32404f2b732859f
  11:        0x104dc44b4 - core::result::unwrap_failed::h2ea3b6e22f1f6a7c
  12:        0x104b0448c - _tokenizers_new_from_str
  13:        0x104af69d4 - __ZN10tokenizers9Tokenizer12FromBlobJSONERKNSt3__112basic_stringIcNS1_11char_traitsIcEENS1_9allocatorIcEEEE
  14:        0x104aea804 - __ZN3mlc3llm9Tokenizer8FromPathERKN3tvm7runtime6StringENSt3__18optionalINS0_13TokenizerInfoEEE
  15:        0x104af1b98 - __ZN3tvm7runtime13PackedFuncObj9ExtractorINS0_16PackedFuncSubObjIZNS0_15TypedPackedFuncIFN3mlc3llm9TokenizerERKNS0_6StringEEE17AssignTypedLambdaINS6_3$_3EEEvT_NSt3__112basic_stringIcNSG_11char_traitsIcEENSG_9allocatorIcEEEEEUlRKNS0_7TVMArgsEPNS0_11TVMRetValueEE_EEE4CallEPKS1_SN_SR_
  16:        0x10e01e710 - _TVMFuncCall
  17:        0x101f37910 - __ZL39__pyx_f_3tvm_4_ffi_4_cy3_4core_FuncCallPvP7_objectP8TVMValuePi
  18:        0x101f370dc - __ZL76__pyx_pw_3tvm_4_ffi_4_cy3_4core_10ObjectBase_3__init_handle_by_constructor__P7_objectS0_S0_
  19:        0x1018ce11c - _PyCFunction_GetFlags
  20:        0x10188ac34 - __PyObject_MakeTpCall
  21:        0x10195edec - __PyEval_EvalFrameDefault
  22:        0x10195be48 - __PyEval_EvalFrameDefault
  23:        0x10188b474 - __PyFunction_Vectorcall
  24:        0x10188aa5c - __PyObject_FastCallDictTstate
  25:        0x10188b7c0 - __PyObject_Call_Prepend
  26:        0x1018efa9c - __PyType_Lookup
  27:        0x1018e70f8 - __PyType_Lookup
  28:        0x10188ac34 - __PyObject_MakeTpCall
  29:        0x10195edec - __PyEval_EvalFrameDefault
  30:        0x10195bec4 - __PyEval_EvalFrameDefault
  31:        0x10195fb40 - __PyEval_EvalFrameDefault
  32:        0x10188b3bc - __PyFunction_Vectorcall
  33:        0x10188d58c - _PyMethod_New
  34:        0x10195ed8c - __PyEval_EvalFrameDefault
  35:        0x10195bf40 - __PyEval_EvalFrameDefault
  36:        0x10195fb40 - __PyEval_EvalFrameDefault
  37:        0x10188b3bc - __PyFunction_Vectorcall
  38:        0x10188a9d8 - __PyObject_FastCallDictTstate
  39:        0x10188b7c0 - __PyObject_Call_Prepend
  40:        0x1018efa9c - __PyType_Lookup
  41:        0x1018e70f8 - __PyType_Lookup
  42:        0x10188ac34 - __PyObject_MakeTpCall
  43:        0x10195edec - __PyEval_EvalFrameDefault
  44:        0x10195bf40 - __PyEval_EvalFrameDefault
  45:        0x10195fb40 - __PyEval_EvalFrameDefault
  46:        0x101956504 - _PyEval_EvalCode
  47:        0x10199a7d8 - _PyParser_ASTFromStringObject
  48:        0x10199a9ac - _PyRun_FileExFlags
  49:        0x101998ad4 - _PyRun_SimpleFileExFlags
  50:        0x1019b5d08 - _Py_RunMain
  51:        0x1019b6178 - _Py_Main
  52:        0x1019b6218 - _Py_BytesMain
fatal runtime error: failed to initiate panic, error 5
zsh: abort      /usr/bin/python3 run-llava.py
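
The panic originates in the tokenizers FFI layer (tokenizers_new_from_str / Tokenizer::FromBlobJSON in the backtrace) while parsing the model's tokenizer.json. To check in isolation whether a given tokenizers version can parse that file, here is a minimal sketch (assuming the HuggingFace tokenizers Python package is installed; the path is a placeholder):

from tokenizers import Tokenizer

# Hypothetical isolation check: parse the converted model's tokenizer.json
# with the Python `tokenizers` package. A similar "untagged enum
# ModelWrapper" error here would point to a tokenizers version mismatch;
# success only shows that this package's parser accepts the file, not the
# (possibly older) parser bundled into mlc-llm.
tok = Tokenizer.from_file("/path/to/llava-1.5-7b-hf-mlc/tokenizer.json")
print(tok.get_vocab_size())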

To Reproduce

Steps to reproduce the behavior:

  1. Install the packages:

On MacOS:

pip install mlc_ai_cpu-0.17.1-cp39-cp39-macosx_13_0_arm64.whl
pip install mlc_llm_cpu-0.17.1-cp39-cp39-macosx_13_0_arm64.whl 

On Linux:

pip install mlc_ai_cu123-0.17.2-cp310-cp310-manylinux_2_28_x86_64.whl 
pip install mlc_llm_cu123-0.17.2-cp310-cp310-manylinux_2_28_x86_64.whl
  2. Convert and compile the model:
mlc_llm convert_weight --model-type llava ../hub/llava-hf/llava-1.5-7b-hf --quantization q4f16_1 -o llava-1.5-7b-hf-mlc
mlc_llm gen_config ../hub/llava-hf/llava-1.5-7b-hf --quantization q4f16_1  --conv-template llava -o llava-1.5-7b-hf-mlc
mlc_llm compile llava-1.5-7b-hf-mlc/mlc-chat-config.json --device cuda -o llava-1.5-7b-hf-mlc/llava-1.5-7b-q4f16_1-cuda.so # (on macOS, pass --device metal and a .dylib output path instead)
  3. Run the model:
from mlc_llm import MLCEngine
import PIL.Image
from io import BytesIO
import base64

model = "/path/to/llava-1.5-7b-hf-mlc" 
model_lib = "/path/to/llava-1.5-7b-hf-mlc/llava-1.5-7b-q4f16_1-cuda.so"
image_path = "/path/to/image.jpg"
engine = MLCEngine(model=model, model_lib=model_lib)

# Resize to 336x336, the input resolution of llava-1.5's vision encoder.
img = PIL.Image.open(image_path)
img_resized = img.resize((336, 336))

# Re-encode the image as JPEG and embed the raw bytes in a base64 data URL.
img_byte_arr = BytesIO()
img_resized.save(img_byte_arr, format="JPEG")
img_byte_arr = img_byte_arr.getvalue()

new_url = (
    f"data:image/jpeg;base64,{base64.b64encode(img_byte_arr).decode('utf-8')}"
)


# Stream a chat completion carrying both the image and the text prompt.
for response in engine.chat.completions.create(
    messages=[{
                "role": "user",
                "content": [
                    {
                        "type": "image_url",
                        "image_url": new_url,
                    },
                    {
                        "type": "text",
                        "text": "<image>What is shown in this image?",
                    },
                ],
            }],
    model=model,
    stream=True,
):
    for choice in response.choices:
        print(choice.delta.content, end="", flush=True)

engine.terminate()
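
The macOS log above shows jit.py compiling and caching the model lib automatically, so on macOS the explicit model_lib argument can also be left to the JIT (a sketch based on the MLC_JIT_POLICY=ON lines in the log):

# Sketch: with MLC_JIT_POLICY=ON (the default, per the log above),
# omitting model_lib lets MLC JIT-compile and cache the Metal library.
engine = MLCEngine(model="/path/to/llava-1.5-7b-hf-mlc")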

HoiM added the bug (Confirmed bugs) label Nov 21, 2024
MasterJH5574 (Member) commented:

Hi @HoiM, could you try installing the latest nightly packages of mlc-llm and mlc-ai? We fixed this issue last week in 3474073, which is not yet included in a stable release. You can find the nightly package installation instructions at https://llm.mlc.ai/docs/install/mlc_llm.html.
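
For reference, the nightly CPU wheels can typically be pulled from the MLC wheels index with a command of this shape (a sketch; the package names mirror the wheel filenames in the next comment, and the linked docs are authoritative):

python -m pip install --pre -U -f https://mlc.ai/wheels mlc-ai-nightly-cpu mlc-llm-nightly-cpu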


HoiM commented Nov 22, 2024

On my macOS machine, installing the following two nightly wheels solved the problem.

pip install mlc_ai_nightly_cpu-0.18.dev249-cp39-cp39-macosx_13_0_arm64.whl
pip install mlc_llm_nightly_cpu-0.18.dev71-cp39-cp39-macosx_13_0_arm64.whl

THX!
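
For anyone verifying the fix, a quick check that the nightly wheels are the ones actually installed (a sketch; the distribution names are inferred from the wheel filenames above):

from importlib.metadata import version

# Sketch: distribution names inferred from the wheel filenames; adjust
# if the installed packages are named differently.
print(version("mlc-ai-nightly-cpu"))   # expected: 0.18.dev249
print(version("mlc-llm-nightly-cpu"))  # expected: 0.18.dev71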
