🐛 Bug
When using the mlc-llm Swift package to chat with vision language models, specifically the Phi-3-vision-instruct model, errors occur when attempting to input an image for a second time or to input multiple images simultaneously. The first image input processes correctly, but subsequent attempts result in one of the following errors:
NDArray size mismatch:
libc++abi: terminating due to uncaught exception of type tvm::runtime::InternalError: [23:07:43] /Users/neet/code/mlc-llm/3rdparty/tvm/src/runtime/ndarray.cc:213: Check failed: relative_byte_offset + view_size <= curr_size (11046600 vs. 1017846) : ValueError: View with shape [1, 1700, 2166, 3] and datatype uint8 would have a size of 11046600 bytes. This would occupy bytes 0 <= i_byte < 11046600 within the backing array. However, the NDArray being viewed only contains 1017846 bytes (shape = [1, 618, 549, 3], dtype= uint8).
Embedding shape mismatch:
libc++abi: terminating due to uncaught exception of type tvm::runtime::InternalError: [23:22:38] /Users/neet/code/mlc-llm/cpp/serve/model.cc:1023: InternalError: Check failed: embedding->shape[0] + offset <= dst->shape[0] (2535 vs. 2048) :
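The numbers in both messages are consistent with buffers sized for one input being reused for a later, larger one; this is an interpretation of the error text, not something confirmed against the runtime code. A quick check of the arithmetic (in Swift, matching the SDK's language):

```swift
// Byte counts from the first error are width * height * channels (uint8):
let backingBytes = 1 * 618 * 549 * 3     // 1,017,846 bytes — the existing NDArray
let viewBytes    = 1 * 1700 * 2166 * 3   // 11,046,600 bytes — the requested view
assert(viewBytes > backingBytes)         // the view cannot fit in the backing array

// The second error compares embedding rows against the destination buffer:
let requiredRows = 2535                  // embedding->shape[0] + offset
let capacityRows = 2048                  // dst->shape[0]
assert(requiredRows > capacityRows)      // embeddings overflow the destination
```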
To Reproduce
Steps to reproduce the behavior:
Send an image from the iOS app through the Swift mlc-llm SDK's MLCEngine.chatCompletion() as a base64-encoded URL. This image enters the pipeline and tokens decode correctly.
Send a second image; this time one of the two errors above is thrown.
Or:
Send multiple images at once from the iOS app through the Swift mlc-llm SDK's MLCEngine.chatCompletion() as base64-encoded URLs.
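For reference, the call sequence looks roughly like the sketch below. It follows the OpenAI-style MLCSwift API (`engine.chat.completions.create`); the exact shape of the image message part is an assumption and may not match the SDK verbatim.

```swift
import Foundation
import MLCSwift

// Sketch of the repro, assuming the OpenAI-style MLCSwift API.
// The image part layout ("type" / "image_url" keys) is an assumption.
func describeImage(engine: MLCEngine, jpegData: Data) async {
    let dataURL = "data:image/jpeg;base64," + jpegData.base64EncodedString()
    let messages = [
        ChatCompletionMessage(
            role: .user,
            content: [
                ["type": "image_url", "image_url": dataURL],
                ["type": "text", "text": "Describe this image."],
            ]
        )
    ]
    // The first call succeeds; a second call (or a single call carrying
    // several images) crashes with one of the TVM errors above.
    for await response in await engine.chat.completions.create(messages: messages) {
        if let delta = response.choices.first?.delta.content {
            print(delta.asText())
        }
    }
}
```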
Expected behavior
The model should process multiple image inputs without throwing exceptions.
Environment
Platform: iOS
Operating system: macOS
Device: macOS
How you installed MLC-LLM: conda & pip
How you installed TVM-Unity: Source
Neet-Nestor changed the title three times on Nov 22, 2024:
[Bug] Sending images with large dimensions into vision models will break TVM →
[Bug] Image input to vision models will throw error from TVM →
[Bug][iOS/Swift SDK] Image input to vision models will throw error from TVM →
[Bug][iOS/Swift SDK] Multiple image input to vision models will throw error from TVM