[V1] Initial support of multimodal models for V1 re-arch #10699
Conversation
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can do one of these: …
Model changes look good to me overall. We can consider passing the placeholder indices to `get_input_embeddings` so that the relationship between `input_ids` and `multimodal_embeddings` becomes more explicit.

We should add tests to ensure that these models actually work in V1.
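Purely as an illustration of the interface change suggested above, here is a minimal PyTorch sketch; all names and signatures below are hypothetical and are not the actual vLLM API.

```python
# Hypothetical sketch -- not the actual vLLM API. It only illustrates how
# passing placeholder indices into get_input_embeddings would make the
# mapping between input_ids and multimodal_embeddings explicit.
import torch
from torch import nn


class VLMSketch(nn.Module):

    def __init__(self, vocab_size: int = 1000, hidden_size: int = 16):
        super().__init__()
        self.embed_tokens = nn.Embedding(vocab_size, hidden_size)

    def get_input_embeddings(
        self,
        input_ids: torch.Tensor,              # [num_tokens]
        multimodal_embeddings: torch.Tensor,  # [num_mm_tokens, hidden_size]
        placeholder_indices: torch.Tensor,    # [num_mm_tokens] positions in input_ids
    ) -> torch.Tensor:
        # Ordinary text embeddings first.
        inputs_embeds = self.embed_tokens(input_ids)
        # Overwrite placeholder positions with the vision encoder outputs;
        # the caller decides which positions are placeholders.
        inputs_embeds[placeholder_indices] = multimodal_embeddings
        return inputs_embeds


model = VLMSketch()
input_ids = torch.tensor([1, 2, 7, 7, 7, 3])   # 7 = image placeholder token
mm_embeds = torch.randn(3, 16)                 # 3 image-patch embeddings
out = model.get_input_embeddings(input_ids, mm_embeds, torch.tensor([2, 3, 4]))
print(out.shape)  # torch.Size([6, 16])
```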
Hmm, I don't think that alone will help, actually, because V1 assumes chunked prefill by default. Unless we're passing additional information such as the computed tokens per request, the model runner doesn't know the original indices of …
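To make the chunked-prefill concern concrete, here is a hypothetical sketch (the helper and argument names are invented for illustration, not taken from the vLLM codebase) of how chunk-local placeholder positions could be recovered from per-request bookkeeping such as the number of already-computed tokens.

```python
# Illustrative sketch only. With chunked prefill, each step processes a slice
# of the prompt, so per-request bookkeeping (e.g. num_computed_tokens) is
# needed to recover which placeholder positions fall into the current chunk.
import torch


def placeholder_indices_in_chunk(
    placeholder_positions: torch.Tensor,  # positions of image tokens in the full prompt
    num_computed_tokens: int,             # tokens already processed in earlier chunks
    num_scheduled_tokens: int,            # tokens scheduled for this chunk
) -> torch.Tensor:
    """Return placeholder positions relative to the current chunk."""
    chunk_start = num_computed_tokens
    chunk_end = num_computed_tokens + num_scheduled_tokens
    mask = (placeholder_positions >= chunk_start) & (placeholder_positions < chunk_end)
    # Shift from prompt-wide indices to chunk-local indices.
    return placeholder_positions[mask] - chunk_start


# Example: image placeholders occupy prompt positions 4..9, and the current
# chunk covers prompt positions 6..13.
positions = torch.arange(4, 10)
print(placeholder_indices_in_chunk(positions,
                                   num_computed_tokens=6,
                                   num_scheduled_tokens=8))
# tensor([0, 1, 2, 3])  -> prompt positions 6..9 land at chunk offsets 0..3
```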
LGTM! It'd be nice if @DarkLight1337 can also take another look.
This pull request has merge conflicts that must be resolved before it can be merged.
It seems that we have tests related to using image embeddings as input, indicating usage of this feature by the community (which will be a breaking change in V1, since we cannot take batched image embeddings directly as input), so I'm separating Qwen2-VL out to unblock this PR.
Sorry for the late reply, it looks good to me.
This PR adds V1 support for selected image language models:
- ~~Qwen2-VL (missing MRope implementation, will address in a separate PR)~~ separated out to [V1][VLM] Add V1-rearch image inference support for Qwen2-VL #10988