[Bug]: Prefix caching doesn't work for LlavaOneVision #11371

Open
sleepwalker2017 opened this issue Dec 20, 2024 · 11 comments
Labels
bug Something isn't working

Comments

@sleepwalker2017

Your current environment

The generated dummy input is a video, but the preprocessor tries to get an image from the dict, and then it crashes.
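A minimal sketch of the kind of mismatch described here (the dict keys and the preprocess function are hypothetical, not vLLM's actual code):

```python
# Hypothetical illustration of the reported mismatch, not vLLM's actual code:
# the dummy multimodal input carries a "video" entry, but the preprocessor
# looks up "image", so it crashes.
dummy_mm_data = {"video": "<dummy video frames>"}

def preprocess(mm_data: dict):
    return mm_data["image"]  # fails for video-only dummy input

preprocess(dummy_mm_data)  # KeyError: 'image'
```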

After I work around this, the code still fails to run.

It fails with this traceback:

  File "/data/test/prefix-caching-vlm/vllm/vllm/v1/engine/core.py", line 264, in run_engine_core
    engine_core.run_busy_loop()
  File "/data/test/prefix-caching-vlm/vllm/vllm/v1/engine/core.py", line 302, in run_busy_loop
    outputs = self.step()
  File "/data/test/prefix-caching-vlm/vllm/vllm/v1/engine/core.py", line 125, in step
    output = self.model_executor.execute_model(scheduler_output)
  File "/data/test/prefix-caching-vlm/vllm/vllm/v1/executor/uniproc_executor.py", line 72, in execute_model
    output = self.worker.execute_model(scheduler_output)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/data/test/prefix-caching-vlm/vllm/vllm/v1/worker/gpu_worker.py", line 203, in execute_model                                                                                                              output = self.model_runner.execute_model(scheduler_output)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/data/test/prefix-caching-vlm/vllm/vllm/v1/worker/gpu_model_runner.py", line 472, in execute_model
    encoder_outputs = self._gather_encoder_outputs(scheduler_output)
  File "/data/test/prefix-caching-vlm/vllm/vllm/v1/worker/gpu_model_runner.py", line 456, in _gather_encoder_outputs
    assert req_id in self.encoder_cache

Model Input Dumps

model=/data/models/llava-onevision-qwen2-7b-ov-hf/
VLLM_USE_V1=1 VLLM_ENABLE_V1_MULTIPROCESSING=1 python3 mmmu_bench.py --model $model --num-prompts 500  --image-hit-rate 0.3

The mmmu_bench.py comes from here:
#11187

🐛 Describe the bug

  File "/data/test/prefix-caching-vlm/vllm/vllm/v1/engine/core.py", line 264, in run_engine_core
    engine_core.run_busy_loop()
  File "/data/test/prefix-caching-vlm/vllm/vllm/v1/engine/core.py", line 302, in run_busy_loop
    outputs = self.step()
  File "/data/test/prefix-caching-vlm/vllm/vllm/v1/engine/core.py", line 125, in step
    output = self.model_executor.execute_model(scheduler_output)
  File "/data/test/prefix-caching-vlm/vllm/vllm/v1/executor/uniproc_executor.py", line 72, in execute_model
    output = self.worker.execute_model(scheduler_output)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/data/test/prefix-caching-vlm/vllm/vllm/v1/worker/gpu_worker.py", line 203, in execute_model                                                                                                              output = self.model_runner.execute_model(scheduler_output)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/data/test/prefix-caching-vlm/vllm/vllm/v1/worker/gpu_model_runner.py", line 472, in execute_model
    encoder_outputs = self._gather_encoder_outputs(scheduler_output)
  File "/data/test/prefix-caching-vlm/vllm/vllm/v1/worker/gpu_model_runner.py", line 456, in _gather_encoder_outputs
    assert req_id in self.encoder_cache

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
sleepwalker2017 added the bug label on Dec 20, 2024
@DarkLight1337
Member

DarkLight1337 commented Dec 20, 2024

Most multi-modal models don't support V1 yet. You can check the Supported Models page for more details (there is a V1 column for multi-modal models).

@sleepwalker2017
Author

> Most multi-modal models don't support V1 yet. You can check the Supported Models page for more details (there is a V1 column for multi-modal models).

Thank you! Is there a link for that? I didn't find documents about the V1 Engine.

@DarkLight1337
Member

> Thank you! Is there a link for that? I didn't find documents about the V1 Engine.

It is pinned in the list of issues.

@DarkLight1337
Member

cc @ywang96 perhaps we should add a link to the V1 column header?

@sleepwalker2017
Author

> It is pinned in the list of issues.

I see this one #8779, but I didn't find any examples about its usage.

It seems the V1 engine is not used in quite the same way as the old one.

@DarkLight1337
Member

DarkLight1337 commented Dec 23, 2024

> I see this one #8779, but I didn't find any examples about its usage.
>
> It seems the V1 engine is not used in quite the same way as the old one.

It is still in development, which is why we don't have user-facing docs about it yet. For now, you can enable it by setting the environment variable VLLM_USE_V1=1.
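For example, a minimal sketch of enabling it for offline inference (the model path and prompt below are placeholders, not from this issue):

```python
# Minimal sketch, assuming the experimental V1 engine is enabled via the
# VLLM_USE_V1 environment variable; set it before importing vLLM.
# The model path and prompt are placeholders.
import os
os.environ["VLLM_USE_V1"] = "1"

from vllm import LLM, SamplingParams

llm = LLM(model="llava-hf/llava-onevision-qwen2-7b-ov-hf")
outputs = llm.generate(["Describe the weather today."],
                       SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```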

@ywang96
Member

ywang96 commented Dec 23, 2024

@sleepwalker2017 V1 is only available for experimental use, and not all multimodal models are supported on V1 yet. You can check our latest documentation here https://docs.vllm.ai/en/latest/models/supported_models.html#id3 (the V1 column) to see which models are supported.

@sleepwalker2017
Author

> https://docs.vllm.ai/en/latest/models/supported_models.html#id3

Thank you for the clear explanation!

@sleepwalker2017
Author

> Most multi-modal models don't support V1 yet. You can check the Supported Models page for more details (there is a V1 column for multi-modal models).

Hi, I have another question: what is needed to add support for a multi-modal model in the V1 engine, assuming it's already supported by the old engine?

@ywang96
Member

ywang96 commented Dec 26, 2024

> Hi, I have another question: what is needed to add support for a multi-modal model in the V1 engine, assuming it's already supported by the old engine?

@sleepwalker2017 There are a few key changes you'll need to make:

  1. The model's input processor needs to return a PlaceholderRange that tracks exactly where the placeholder tokens for each image start and how many there are.
  2. The output of get_multimodal_embeddings needs to be
    • either a tuple of flattened image embeddings (2D tensors of shape [feature_size, hidden_size]), with each corresponding to an image, or
    • a batched 3D tensor when feature_size is constant across images.

Feel free to take a look at #10699 to see the changes needed. Also, for now we only support the image modality on V1.
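A rough, illustrative sketch of those two points (the stand-in PlaceholderRange class and the vision_encoder argument here are hypothetical; the real types and interfaces live inside vLLM and PR #10699):

```python
# Illustrative sketch only -- not the actual vLLM interfaces.
from dataclasses import dataclass
from typing import Callable, Tuple

import torch

@dataclass
class PlaceholderRange:          # stand-in for vLLM's own PlaceholderRange type
    offset: int                  # index of the first placeholder token for this image
    length: int                  # number of placeholder tokens for this image

def get_multimodal_embeddings(
    images: list[torch.Tensor],
    vision_encoder: Callable[[torch.Tensor], torch.Tensor],
) -> Tuple[torch.Tensor, ...]:
    """Return one flattened 2-D tensor [feature_size, hidden_size] per image."""
    outputs = []
    for image in images:
        feats = vision_encoder(image)                        # e.g. [num_patches, hidden_size]
        outputs.append(feats.reshape(-1, feats.shape[-1]))   # flatten to 2-D per image
    return tuple(outputs)
```

If feature_size happens to be the same for every image, the per-image tensors could instead be stacked into a single [num_images, feature_size, hidden_size] tensor, matching the second option above.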

@sleepwalker2017
Author


Thank you! I'll check that!
