[Bug]: Prefix caching doesn't work for LlavaOneVision #11371

Open
sleepwalker2017 opened this issue Dec 20, 2024 · 11 comments
Labels
bug Something isn't working

Comments

@sleepwalker2017

Your current environment

The generated dummy input is a video, but the preprocessor tries to get an image from the dict, and then it crashes.
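A minimal sketch of the kind of mismatch described here (the dict keys and the preprocess function are hypothetical, not vLLM's actual code):

```python
# Hypothetical illustration of the reported mismatch, not vLLM's actual code:
# the dummy multimodal input carries a "video" entry, but the preprocessor
# looks up "image", so it crashes.
dummy_mm_data = {"video": "<dummy video frames>"}

def preprocess(mm_data: dict):
    return mm_data["image"]  # fails for video-only dummy input

preprocess(dummy_mm_data)  # KeyError: 'image'
```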

After I work around this, the code still fails to run.

It fails with this traceback:

  File "/data/test/prefix-caching-vlm/vllm/vllm/v1/engine/core.py", line 264, in run_engine_core
    engine_core.run_busy_loop()
  File "/data/test/prefix-caching-vlm/vllm/vllm/v1/engine/core.py", line 302, in run_busy_loop
    outputs = self.step()
  File "/data/test/prefix-caching-vlm/vllm/vllm/v1/engine/core.py", line 125, in step
    output = self.model_executor.execute_model(scheduler_output)
  File "/data/test/prefix-caching-vlm/vllm/vllm/v1/executor/uniproc_executor.py", line 72, in execute_model
    output = self.worker.execute_model(scheduler_output)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/data/test/prefix-caching-vlm/vllm/vllm/v1/worker/gpu_worker.py", line 203, in execute_model                                                                                                              output = self.model_runner.execute_model(scheduler_output)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/data/test/prefix-caching-vlm/vllm/vllm/v1/worker/gpu_model_runner.py", line 472, in execute_model
    encoder_outputs = self._gather_encoder_outputs(scheduler_output)
  File "/data/test/prefix-caching-vlm/vllm/vllm/v1/worker/gpu_model_runner.py", line 456, in _gather_encoder_outputs
    assert req_id in self.encoder_cache

Model Input Dumps

model=/data/models/llava-onevision-qwen2-7b-ov-hf/
VLLM_USE_V1=1 VLLM_ENABLE_V1_MULTIPROCESSING=1 python3 mmmu_bench.py --model $model --num-prompts 500  --image-hit-rate 0.3

The mmmu_bench.py comes from here:
#11187

🐛 Describe the bug

  File "/data/test/prefix-caching-vlm/vllm/vllm/v1/engine/core.py", line 264, in run_engine_core
    engine_core.run_busy_loop()
  File "/data/test/prefix-caching-vlm/vllm/vllm/v1/engine/core.py", line 302, in run_busy_loop
    outputs = self.step()
  File "/data/test/prefix-caching-vlm/vllm/vllm/v1/engine/core.py", line 125, in step
    output = self.model_executor.execute_model(scheduler_output)
  File "/data/test/prefix-caching-vlm/vllm/vllm/v1/executor/uniproc_executor.py", line 72, in execute_model
    output = self.worker.execute_model(scheduler_output)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/data/test/prefix-caching-vlm/vllm/vllm/v1/worker/gpu_worker.py", line 203, in execute_model                                                                                                              output = self.model_runner.execute_model(scheduler_output)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/data/test/prefix-caching-vlm/vllm/vllm/v1/worker/gpu_model_runner.py", line 472, in execute_model
    encoder_outputs = self._gather_encoder_outputs(scheduler_output)
  File "/data/test/prefix-caching-vlm/vllm/vllm/v1/worker/gpu_model_runner.py", line 456, in _gather_encoder_outputs
    assert req_id in self.encoder_cache

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
sleepwalker2017 added the bug label on Dec 20, 2024
@DarkLight1337
Member

DarkLight1337 commented Dec 20, 2024

Most multi-modal models don't support V1 yet. You can check the Supported Models page for more details (there is a V1 column for multi-modal models).

@sleepwalker2017
Author

> Most multi-modal models don't support V1 yet. You can check the Supported Models page for more details (there is a V1 column for multi-modal models).

Thank you! Is there a link for that? I didn't find documents about the V1 Engine.

@DarkLight1337
Member

> Thank you! Is there a link for that? I didn't find documents about the V1 Engine.

It is pinned in the list of issues.

@DarkLight1337
Member

cc @ywang96 perhaps we should add a link to the V1 column header?

@sleepwalker2017
Author

> It is pinned in the list of issues.

I see this one #8779, but I didn't find any examples about its usage.

It seems the V1 engine is not used in quite the same way as the old one.

@DarkLight1337
Member

DarkLight1337 commented Dec 23, 2024

> I see this one #8779, but I didn't find any examples about its usage.
>
> It seems the V1 engine is not used in quite the same way as the old one.

It is still in development, which is why we don't have user-facing docs about it yet. For now, you can enable it by setting the environment variable VLLM_USE_V1=1.
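For example, a minimal sketch of enabling it for offline inference (the model path and prompt below are placeholders, not from this issue):

```python
# Minimal sketch, assuming the experimental V1 engine is enabled via the
# VLLM_USE_V1 environment variable; set it before importing vLLM.
# The model path and prompt are placeholders.
import os
os.environ["VLLM_USE_V1"] = "1"

from vllm import LLM, SamplingParams

llm = LLM(model="llava-hf/llava-onevision-qwen2-7b-ov-hf")
outputs = llm.generate(["Describe the weather today."],
                       SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```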

@ywang96
Member

ywang96 commented Dec 23, 2024

@sleepwalker2017 V1 is only available for experimental use, and not all multimodal models are supported on V1 yet. You can check our latest documentation here https://docs.vllm.ai/en/latest/models/supported_models.html#id3 (the V1 column) to see which models are supported.

@sleepwalker2017
Author

> https://docs.vllm.ai/en/latest/models/supported_models.html#id3

Thank you for the clear explanation!

@sleepwalker2017
Author

> Most multi-modal models don't support V1 yet. You can check the Supported Models page for more details (there is a V1 column for multi-modal models).

Hi, I have another question: what is needed to add support for a multi-modal model in the V1 engine, assuming it's already supported by the old engine?

@ywang96
Member

ywang96 commented Dec 26, 2024

> Hi, I have another question: what is needed to add support for a multi-modal model in the V1 engine, assuming it's already supported by the old engine?

@sleepwalker2017 There are a few key changes you'll need to make:

  1. The model's input processor needs to return a PlaceholderRange that tracks exactly where the placeholder tokens for each image start and how many there are.
  2. The output of get_multimodal_embeddings needs to be
    • either a tuple of flattened image embeddings (2D tensors of shape [feature_size, hidden_size]), with each corresponding to an image, or
    • a batched 3D tensor when feature_size is constant across images.

Feel free to take a look at #10699 to see the changes needed. Also, for now we only support the image modality on V1.
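A rough, illustrative sketch of those two points (the stand-in PlaceholderRange class and the vision_encoder argument here are hypothetical; the real types and interfaces live inside vLLM and PR #10699):

```python
# Illustrative sketch only -- not the actual vLLM interfaces.
from dataclasses import dataclass
from typing import Callable, Tuple

import torch

@dataclass
class PlaceholderRange:          # stand-in for vLLM's own PlaceholderRange type
    offset: int                  # index of the first placeholder token for this image
    length: int                  # number of placeholder tokens for this image

def get_multimodal_embeddings(
    images: list[torch.Tensor],
    vision_encoder: Callable[[torch.Tensor], torch.Tensor],
) -> Tuple[torch.Tensor, ...]:
    """Return one flattened 2-D tensor [feature_size, hidden_size] per image."""
    outputs = []
    for image in images:
        feats = vision_encoder(image)                        # e.g. [num_patches, hidden_size]
        outputs.append(feats.reshape(-1, feats.shape[-1]))   # flatten to 2-D per image
    return tuple(outputs)
```

If feature_size happens to be the same for every image, the per-image tensors could instead be stacked into a single [num_images, feature_size, hidden_size] tensor, matching the second option above.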

@sleepwalker2017
Author


Thank you! I'll check that!
