[VLM] Abstract out multi-modal data parsing in merged processor #11620

DarkLight1337 · 2024-12-30T08:38:43Z

This PR abstracts out the logic that parses input multi-modal data and converts it into HF processor inputs.

This corresponds to the plugin-based abstraction in the original code. Once all models are migrated to the merged multi-modal processor, we will deprecate multi-modal plugins. Developers who need additional modalities for a model will instead modify the data parser by overriding BaseMultiModalProcessor._get_data_parser.
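As a rough illustration of the pattern described above, here is a minimal, self-contained sketch. It does not use vLLM's actual classes: every name in it (ToyDataParser, ToyProcessor, _modality_parsers, to_processor_inputs, the plural keyword convention) is a hypothetical stand-in. Only the idea of overriding a _get_data_parser hook to support an extra modality comes from the description.

```python
from typing import Any, Callable, Dict, List

# Hypothetical stand-ins for illustration; none of these are vLLM classes.
ModalityParser = Callable[[Any], List[Any]]


def _as_list(data: Any) -> List[Any]:
    """Normalize a single item or a list/tuple of items into a list."""
    return list(data) if isinstance(data, (list, tuple)) else [data]


class ToyDataParser:
    """Maps each modality name to a function that normalizes its raw data."""

    def _modality_parsers(self) -> Dict[str, ModalityParser]:
        return {"image": _as_list}

    def parse(self, mm_data: Dict[str, Any]) -> Dict[str, List[Any]]:
        parsers = self._modality_parsers()
        items: Dict[str, List[Any]] = {}
        for modality, data in mm_data.items():
            if modality not in parsers:
                raise ValueError(f"Unsupported modality: {modality}")
            items[modality] = parsers[modality](data)
        return items


class ToyAudioDataParser(ToyDataParser):
    """Adds an extra modality by extending the parser table."""

    def _modality_parsers(self) -> Dict[str, ModalityParser]:
        return {**super()._modality_parsers(), "audio": _as_list}


class ToyProcessor:
    """Base processor: parses raw data, then builds processor keyword inputs."""

    def _get_data_parser(self) -> ToyDataParser:
        return ToyDataParser()

    def to_processor_inputs(self, mm_data: Dict[str, Any]) -> Dict[str, List[Any]]:
        items = self._get_data_parser().parse(mm_data)
        # Assumed convention: the downstream processor takes plural keyword names.
        return {f"{modality}s": values for modality, values in items.items()}


class ToyAudioProcessor(ToyProcessor):
    """A model-specific processor only needs to swap in a different parser."""

    def _get_data_parser(self) -> ToyDataParser:
        return ToyAudioDataParser()


inputs = ToyAudioProcessor().to_processor_inputs(
    {"image": "img0", "audio": ["a0", "a1"]}
)
```

In this sketch the base ToyProcessor rejects audio data with a ValueError, while ToyAudioProcessor accepts it purely by returning a different parser from the hook, which is the kind of extension point the description refers to.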

Signed-off-by: DarkLight1337 <[email protected]>

github-actions · 2024-12-30T08:38:53Z

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

Add ready label to the PR
Enable auto-merge.

🚀


Isotr0py

LGTM!


Abstract out parsing of multi-modal data

b110c58

Signed-off-by: DarkLight1337 <[email protected]>

DarkLight1337 requested a review from Isotr0py December 30, 2024 08:38

This was referenced Dec 30, 2024

[RFC]: Multi-modality Support on vLLM #4194

Open

[RFC]: Merge input processor and input mapper for multi-modal models #10114

Open

DarkLight1337 added 3 commits December 30, 2024 08:47

Clean up

ed88a2d

Signed-off-by: DarkLight1337 <[email protected]>

Make internal

e7aa0f1

Signed-off-by: DarkLight1337 <[email protected]>

Rename

642dcc2

Signed-off-by: DarkLight1337 <[email protected]>

DarkLight1337 changed the title ~~[VLM] Abstract out parsing of multi-modal data~~ [VLM] Abstract out multi-modal data parsing for processor Dec 30, 2024

DarkLight1337 changed the title ~~[VLM] Abstract out multi-modal data parsing for processor~~ [VLM] Abstract out multi-modal data parsing in merged processor Dec 30, 2024

DarkLight1337 requested a review from ywang96 December 30, 2024 09:38

Isotr0py reviewed Dec 30, 2024

vllm/model_executor/models/qwen2_vl.py (outdated, resolved)

vllm/model_executor/models/ultravox.py (outdated, resolved)

DarkLight1337 added 2 commits December 30, 2024 11:07

Update timings

3e59061

Signed-off-by: DarkLight1337 <[email protected]>

Initialize data parser in a separate method

064506e

Signed-off-by: DarkLight1337 <[email protected]>

mergify bot added the ci/build label Dec 30, 2024

DarkLight1337 added 3 commits December 30, 2024 11:14

Simplify

5384037

Signed-off-by: DarkLight1337 <[email protected]>

Iterate

7752760

Signed-off-by: DarkLight1337 <[email protected]>

Rename Inputs to Items to avoid confusion

0197724

Signed-off-by: DarkLight1337 <[email protected]>

Isotr0py approved these changes Dec 30, 2024


DarkLight1337 enabled auto-merge (squash) December 30, 2024 13:32

github-actions bot added the ready label (ONLY add when PR is ready to merge/full CI is needed) Dec 30, 2024

DarkLight1337 merged commit 8d9b672 into vllm-project:main Dec 30, 2024
88 checks passed

DarkLight1337 deleted the mm-parse branch December 30, 2024 15:07

BKitor pushed a commit to BKitor/vllm that referenced this pull request Dec 30, 2024

[VLM] Abstract out multi-modal data parsing in merged processor (vllm…

ce4576c

…-project#11620) Signed-off-by: DarkLight1337 <[email protected]>

xcnick pushed a commit to xcnick/vllm that referenced this pull request Dec 31, 2024

[VLM] Abstract out multi-modal data parsing in merged processor (vllm…

0ca60bd

…-project#11620) Signed-off-by: DarkLight1337 <[email protected]> Signed-off-by: xcnick <[email protected]>
