[V1][VLM] V1 support for selected single-image models. #11632
Conversation
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
```diff
@@ -633,7 +633,7 @@ See [this page](#generative-models) for more information on how to use generativ
   - `llava-hf/llava-v1.6-mistral-7b-hf`, `llava-hf/llava-v1.6-vicuna-7b-hf`, etc.
   -
   - ✅︎
-  -
+  - ✅︎
```
Llava-next was already supported on V1 so this is just a doc update.
```diff
-            repeat_count=repeat_count[placeholder_token_idx],
+            repeat_count=curr_repeat_count,
             pad_token_left=pad_token_left,
             pad_token_right=pad_token_right,
         )
+        offset = len(new_token_ids)
+        if pad_token_left is not None:
+            offset += 1
         placeholder_ranges.append({
-            "offset": len(new_token_ids),
-            "length": len(replacement_ids)
+            "offset": offset,
+            "length": curr_repeat_count,
```
This was previously counting padding tokens as part of the placeholder tokens, which is not accurate.
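For illustration, here is a minimal, self-contained sketch of the intent of this fix (not the actual vLLM helper; the function name `build_placeholder_range` and its signature are hypothetical): the recorded range starts after any optional left pad token and its length counts only the repeated placeholder tokens.

```python
from typing import Optional

def build_placeholder_range(new_token_ids: list[int],
                            placeholder_token: int,
                            repeat_count: int,
                            pad_token_left: Optional[int] = None,
                            pad_token_right: Optional[int] = None) -> dict:
    """Hypothetical sketch: append [pad_left?] + placeholders + [pad_right?]
    to new_token_ids and return the range covering only the placeholders."""
    # The placeholder region starts after the tokens already emitted...
    offset = len(new_token_ids)
    # ...and after the optional left pad token, which is not a placeholder.
    if pad_token_left is not None:
        new_token_ids.append(pad_token_left)
        offset += 1

    new_token_ids.extend([placeholder_token] * repeat_count)

    if pad_token_right is not None:
        new_token_ids.append(pad_token_right)

    # Length counts only the repeated placeholder tokens, not the padding.
    return {"offset": offset, "length": repeat_count}

# Example: one left pad, three placeholders, one right pad.
tokens: list[int] = [101, 102]
rng = build_placeholder_range(tokens, placeholder_token=32000, repeat_count=3,
                              pad_token_left=1, pad_token_right=2)
assert rng == {"offset": 3, "length": 3}
assert tokens == [101, 102, 1, 32000, 32000, 32000, 2]
```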
vllm/model_executor/models/aria.py
```python
@MULTIMODAL_REGISTRY.register_image_input_mapper(input_mapper_for_aria)
@INPUT_REGISTRY.register_input_processor(input_processor)
@INPUT_REGISTRY.register_dummy_data(dummy_data_for_aria)
```
The code for dummy data generation was entirely missing and I'm not sure why, so I added it in this PR since it's required for V1. cc @xffxff who originally added this model.
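As a rough illustration of what the missing piece does (a minimal sketch only, not vLLM's actual registry API; the signature, token ids, and size constants below are assumptions), dummy data generation produces a worst-case token sequence plus placeholder images so the engine can profile peak memory before serving real requests:

```python
from PIL import Image

# Hypothetical constants for illustration only; the real values come from
# the Aria HF config and vLLM's input registry.
IMAGE_TOKEN_ID = 9
MAX_IMAGE_TOKENS = 256   # tokens the projector emits per image (assumed)
MAX_IMAGE_SIZE = 980     # worst-case input resolution (assumed)

def dummy_data_for_aria(seq_len: int, num_images: int):
    """Sketch: build a worst-case token sequence and matching dummy images."""
    # Fill the front of the sequence with image placeholder tokens,
    # then pad the rest with a neutral token id (0 here).
    image_tokens = [IMAGE_TOKEN_ID] * MAX_IMAGE_TOKENS * num_images
    token_ids = image_tokens + [0] * max(0, seq_len - len(image_tokens))

    # One maximally sized dummy image per requested image input.
    images = [Image.new("RGB", (MAX_IMAGE_SIZE, MAX_IMAGE_SIZE), color=0)
              for _ in range(num_images)]
    return token_ids, {"image": images if num_images > 1 else images[0]}
```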
vllm/model_executor/models/aria.py
```python
image_size2tokens = {
    int(math.sqrt(k) * hf_config.vision_config.patch_size): v
    for k, v in hf_config.projector_patch_to_query_dict.items()
}
```
Seems that this is a fixed value; perhaps we can move it to the `AriaMoELMConfig` initialization in vllm/transformers_utils/configs/aria.py to avoid repeated calculation?
Yep I can do that
I realized we actually don't need the `int(math.sqrt(k) * hf_config.vision_config.patch_size)` calculation at all, since we only care about the values here, so I will just simplify this.
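A sketch of the simplification being described (the stand-in config values and variable names are mine, for illustration only): since only the token counts matter, the keys derived from `math.sqrt(k) * patch_size` can be dropped entirely.

```python
import math
from types import SimpleNamespace

# Stand-in for the HF Aria config used in the snippet above (values assumed).
hf_config = SimpleNamespace(
    vision_config=SimpleNamespace(patch_size=14),
    projector_patch_to_query_dict={1225: 128, 4900: 256},
)

# Before: build a size -> token-count mapping, even though the keys are unused.
image_size2tokens = {
    int(math.sqrt(k) * hf_config.vision_config.patch_size): v
    for k, v in hf_config.projector_patch_to_query_dict.items()
}
max_image_tokens_before = max(image_size2tokens.values())

# After: take the maximum token count directly from the config values.
max_image_tokens_after = max(hf_config.projector_patch_to_query_dict.values())

assert max_image_tokens_before == max_image_tokens_after == 256
```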
I have verified that the models work on both V0 and V1. Let's see if the tests pass.
This PR mainly adds V1 support for a number of single-image models, since the code changes are sizable enough for a review.

To summarize, this PR:
- Adds V1 support for `aria`, `blip2`, `chameleon` and `fuyu`.
- Fixes issues with `aria` (missing dummy data, incomplete input mapper, etc).
- Refactors `llava-next` to run batched projection versus projections for individual images.
- Updates `max_num_seqs` and `limit_mm_per_prompt`.

All models have been tested with `offline_inference_vision_language.py` on both V0 and V1. An example invocation is sketched below.
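For reference, a hedged sketch of how one of these models could be exercised on V1 with the offline API (the model choice, prompt format, image path, and the `VLLM_USE_V1` toggle reflect vLLM around the time of this PR and are assumptions, not part of the PR itself):

```python
import os

from PIL import Image
from vllm import LLM, SamplingParams

# Opt in to the V1 engine (environment flag as of this PR; assumed to be set
# before the engine is constructed).
os.environ["VLLM_USE_V1"] = "1"

llm = LLM(
    model="adept/fuyu-8b",             # one of the models enabled here
    max_num_seqs=4,                    # conservative batch size
    limit_mm_per_prompt={"image": 1},  # single-image models: one image per prompt
)

image = Image.open("example.jpg")      # hypothetical local image
outputs = llm.generate(
    {
        "prompt": "What is shown in this image?\n",  # prompt format is model-specific
        "multi_modal_data": {"image": image},
    },
    SamplingParams(max_tokens=64, temperature=0.0),
)
print(outputs[0].outputs[0].text)
```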