[V1][VLM] V1 support for selected single-image models. #11632
Conversation
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
```diff
@@ -633,7 +633,7 @@ See [this page](#generative-models) for more information on how to use generativ
   - `llava-hf/llava-v1.6-mistral-7b-hf`, `llava-hf/llava-v1.6-vicuna-7b-hf`, etc.
   -
   - ✅︎
-  -
+  - ✅︎
```
Llava-next was already supported on V1 so this is just a doc update.
```diff
-            repeat_count=repeat_count[placeholder_token_idx],
+            repeat_count=curr_repeat_count,
             pad_token_left=pad_token_left,
             pad_token_right=pad_token_right,
         )
+        offset = len(new_token_ids)
+        if pad_token_left is not None:
+            offset += 1
         placeholder_ranges.append({
-            "offset": len(new_token_ids),
-            "length": len(replacement_ids)
+            "offset": offset,
+            "length": curr_repeat_count,
```
This was previously counting padding tokens as part of the placeholder tokens, which is not accurate.
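For illustration, here is a minimal, self-contained sketch of the intent of this fix (not the actual vLLM helper; the function name `build_placeholder_range` and its signature are hypothetical): the recorded range starts after any optional left pad token and its length counts only the repeated placeholder tokens.

```python
from typing import Optional

def build_placeholder_range(new_token_ids: list[int],
                            placeholder_token: int,
                            repeat_count: int,
                            pad_token_left: Optional[int] = None,
                            pad_token_right: Optional[int] = None) -> dict:
    """Hypothetical sketch: append [pad_left?] + placeholders + [pad_right?]
    to new_token_ids and return the range covering only the placeholders."""
    # The placeholder region starts after the tokens already emitted...
    offset = len(new_token_ids)
    # ...and after the optional left pad token, which is not a placeholder.
    if pad_token_left is not None:
        new_token_ids.append(pad_token_left)
        offset += 1

    new_token_ids.extend([placeholder_token] * repeat_count)

    if pad_token_right is not None:
        new_token_ids.append(pad_token_right)

    # Length counts only the repeated placeholder tokens, not the padding.
    return {"offset": offset, "length": repeat_count}

# Example: one left pad, three placeholders, one right pad.
tokens: list[int] = [101, 102]
rng = build_placeholder_range(tokens, placeholder_token=32000, repeat_count=3,
                              pad_token_left=1, pad_token_right=2)
assert rng == {"offset": 3, "length": 3}
assert tokens == [101, 102, 1, 32000, 32000, 32000, 2]
```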
vllm/model_executor/models/aria.py
```python
@MULTIMODAL_REGISTRY.register_image_input_mapper(input_mapper_for_aria)
@INPUT_REGISTRY.register_input_processor(input_processor)
@INPUT_REGISTRY.register_dummy_data(dummy_data_for_aria)
```
The code for dummy data generation was entirely missing and I'm not sure why, so I added it in this PR since it's required for V1. cc @xffxff who originally added this model.
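As a rough illustration of what the missing piece does (a minimal sketch only, not vLLM's actual registry API; the signature, token ids, and size constants below are assumptions), dummy data generation produces a worst-case token sequence plus placeholder images so the engine can profile peak memory before serving real requests:

```python
from PIL import Image

# Hypothetical constants for illustration only; the real values come from
# the Aria HF config and vLLM's input registry.
IMAGE_TOKEN_ID = 9
MAX_IMAGE_TOKENS = 256   # tokens the projector emits per image (assumed)
MAX_IMAGE_SIZE = 980     # worst-case input resolution (assumed)

def dummy_data_for_aria(seq_len: int, num_images: int):
    """Sketch: build a worst-case token sequence and matching dummy images."""
    # Fill the front of the sequence with image placeholder tokens,
    # then pad the rest with a neutral token id (0 here).
    image_tokens = [IMAGE_TOKEN_ID] * MAX_IMAGE_TOKENS * num_images
    token_ids = image_tokens + [0] * max(0, seq_len - len(image_tokens))

    # One maximally sized dummy image per requested image input.
    images = [Image.new("RGB", (MAX_IMAGE_SIZE, MAX_IMAGE_SIZE), color=0)
              for _ in range(num_images)]
    return token_ids, {"image": images if num_images > 1 else images[0]}
```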
vllm/model_executor/models/aria.py
```python
image_size2tokens = {
    int(math.sqrt(k) * hf_config.vision_config.patch_size): v
    for k, v in hf_config.projector_patch_to_query_dict.items()
}
```
Seems that this is a fixed value; perhaps we can move it to the `AriaMoELMConfig` initialization in vllm/transformers_utils/configs/aria.py to avoid repeated calculation?
Yep I can do that
I realized we actually don't need the `int(math.sqrt(k) * hf_config.vision_config.patch_size)` calculation at all, since we only care about the values here, so I will just simplify this.
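A sketch of the simplification being described (the stand-in config values and variable names are mine, for illustration only): since only the token counts matter, the keys derived from `math.sqrt(k) * patch_size` can be dropped entirely.

```python
import math
from types import SimpleNamespace

# Stand-in for the HF Aria config used in the snippet above (values assumed).
hf_config = SimpleNamespace(
    vision_config=SimpleNamespace(patch_size=14),
    projector_patch_to_query_dict={1225: 128, 4900: 256},
)

# Before: build a size -> token-count mapping, even though the keys are unused.
image_size2tokens = {
    int(math.sqrt(k) * hf_config.vision_config.patch_size): v
    for k, v in hf_config.projector_patch_to_query_dict.items()
}
max_image_tokens_before = max(image_size2tokens.values())

# After: take the maximum token count directly from the config values.
max_image_tokens_after = max(hf_config.projector_patch_to_query_dict.values())

assert max_image_tokens_before == max_image_tokens_after == 256
```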
I have verified that the models work on both V0 and V1. Let's see if the tests pass.
This PR mainly adds V1 support for a number of single-image models, since the code changes are sizable enough for a review.

To summarize, this PR:
- Adds V1 support for `aria`, `blip2`, `chameleon` and `fuyu`.
- Fixes issues with `aria` (missing dummy data, incomplete input mapper, etc).
- Refactors `llava-next` to run batched projection versus projections for individual images.
- Updates `max_num_seqs` and `limit_mm_per_prompt`.

All models have been tested with `offline_inference_vision_language.py` on both V0 and V1. An example invocation is sketched below.
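For reference, a hedged sketch of how one of these models could be exercised on V1 with the offline API (the model choice, prompt format, image path, and the `VLLM_USE_V1` toggle reflect vLLM around the time of this PR and are assumptions, not part of the PR itself):

```python
import os

from PIL import Image
from vllm import LLM, SamplingParams

# Opt in to the V1 engine (environment flag as of this PR; assumed to be set
# before the engine is constructed).
os.environ["VLLM_USE_V1"] = "1"

llm = LLM(
    model="adept/fuyu-8b",             # one of the models enabled here
    max_num_seqs=4,                    # conservative batch size
    limit_mm_per_prompt={"image": 1},  # single-image models: one image per prompt
)

image = Image.open("example.jpg")      # hypothetical local image
outputs = llm.generate(
    {
        "prompt": "What is shown in this image?\n",  # prompt format is model-specific
        "multi_modal_data": {"image": image},
    },
    SamplingParams(max_tokens=64, temperature=0.0),
)
print(outputs[0].outputs[0].text)
```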