[Model]: Add support for Aria model #10514

Merged: 14 commits into vllm-project:main on Nov 25, 2024

Conversation

@xffxff (Contributor) commented Nov 21, 2024

Add support for rhymes-ai/Aria, a multimodal MoE model.

Feel free to request changes!

You can try it with the following code:

from PIL import Image
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams
import requests


model_id = "rhymes-ai/Aria"


def main():
    llm = LLM(
        model=model_id,
        tokenizer=model_id,
        tokenizer_mode="slow",
        dtype="bfloat16",
        # limit_mm_per_prompt={"image": 256},
        enforce_eager=True,
        trust_remote_code=True,
    )

    tokenizer = AutoTokenizer.from_pretrained(
        model_id, trust_remote_code=True, use_fast=False
    )


    messages = [
        {
            "role": "user",
            "content": [
                {"type": "image"},
                {
                    "type": "text",
                    "text": "What is the image?",
                },
            ],
        }
    ]

    # Build the prompt token ids from the model's chat template.
    message = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

    image_path = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/cat.png"

    # Fetch the example image.
    image = Image.open(requests.get(image_path, stream=True).raw)


    outputs = llm.generate(
        {
            "prompt_token_ids": message,
            "multi_modal_data": {
                "image": [image],
                # Aria-specific knob for the image resolution used by the
                # image processor.
                "max_image_size": 980,
            },
        },
        sampling_params=SamplingParams(max_tokens=200, top_k=1, stop=["<|im_end|>"]),
    )

    for o in outputs:
        generated_tokens = o.outputs[0].token_ids
        print(tokenizer.decode(generated_tokens))


if __name__ == "__main__":
    main()


👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run fastcheck CI, which runs a small, essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can do one of the following:

  • Add the ready label to the PR
  • Enable auto-merge.

🚀

@mergify bot added the documentation label (Improvements or additions to documentation) on Nov 21, 2024
@Isotr0py (Collaborator) left a comment

Some initial comments.

(10 inline review comments on vllm/model_executor/models/aria.py, outdated and resolved)
@xffxff force-pushed the support_aria branch 2 times, most recently from 56090fb to a0ebc3a on November 22, 2024 04:06
@xffxff requested a review from Isotr0py on November 22, 2024 05:58
@Isotr0py (Collaborator) left a comment

Can you add the example code to offline_inference_vision_language.py? If Aria supports multi-image inputs, we need to add a multi-image example to offline_inference_vision_language_multi_image.py as well.

Otherwise it looks good overall; just needs a minor modification.

(1 inline review comment on vllm/model_executor/models/aria.py, outdated and resolved)
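For reference, a rough sketch of what such an entry in offline_inference_vision_language.py could look like, following the run_<model>(question, modality) pattern used by the other models in that file; the run_aria name and all settings below are illustrative, not the code that was merged:

# Rough sketch only: run_aria and its settings follow the pattern of the other
# models in offline_inference_vision_language.py and are not the merged code.
from transformers import AutoTokenizer
from vllm import LLM


def run_aria(question: str, modality: str):
    assert modality == "image"
    model_name = "rhymes-ai/Aria"

    llm = LLM(
        model=model_name,
        tokenizer_mode="slow",
        dtype="bfloat16",
        trust_remote_code=True,
    )

    # Build the prompt through the chat template so the image placeholder
    # tokens are inserted the way the model expects.
    tokenizer = AutoTokenizer.from_pretrained(
        model_name, trust_remote_code=True, use_fast=False
    )
    messages = [{
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": question},
        ],
    }]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, tokenize=False
    )

    stop_token_ids = None  # the snippet in the PR description stops on "<|im_end|>"
    return llm, prompt, stop_token_ids

A multi-image example for offline_inference_vision_language_multi_image.py would follow the same idea, with one {"type": "image"} entry per image in the message content.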
@ywang96 (Member) commented Nov 22, 2024

Thanks for this PR @xffxff! Just so you're aware, I'm doing some refactoring work in #10570, so it would be great if you could already follow the new interface from that PR here. It's not a big deal if it's too much work.

Signed-off-by: xffxff <[email protected]>
@xffxff (Contributor, Author) commented Nov 22, 2024

> Thanks for this PR @xffxff! Just so you're aware, I'm doing some refactoring work in #10570, so it would be great if you could already follow the new interface from that PR here. It's not a big deal if it's too much work.

Yes @ywang96, I’d be happy to work on the refactoring. I noticed a warning about using the legacy input pipeline and had tried switching to the new multi-modal processor, but I couldn’t find any examples at the time and decided to hold off. Now that I can refer to your work in #10570, I’ll update my PR to align with the new interface.

@xffxff (Contributor, Author) commented Nov 22, 2024

> Can you add the example code to offline_inference_vision_language.py? If Aria supports multi-image inputs, we need to add a multi-image example to offline_inference_vision_language_multi_image.py as well.

Sure! I'll add some examples.

@mergify mergify bot added the frontend label Nov 25, 2024
@xffxff (Contributor, Author) commented Nov 25, 2024

> Can you add the example code to offline_inference_vision_language.py? If Aria supports multi-image inputs, we need to add a multi-image example to offline_inference_vision_language_multi_image.py as well.

Done! Please take a look @Isotr0py

@xffxff (Contributor, Author) commented Nov 25, 2024

> Thanks for this PR @xffxff! Just so you're aware, I'm doing some refactoring work in #10570, so it would be great if you could already follow the new interface from that PR here. It's not a big deal if it's too much work.

I’ve updated my PR to follow the new interface; I only needed to make a few changes.

> Yes @ywang96, I’d be happy to work on the refactoring. I noticed a warning about using the legacy input pipeline and had tried switching to the new multi-modal processor, but I couldn’t find any examples at the time and decided to hold off. Now that I can refer to your work in #10570, I’ll update my PR to align with the new interface.

@ywang96 Apologies for the confusion earlier; I initially thought your refactoring was related to #10114 and didn’t look closely at your PR.

@Isotr0py (Collaborator) left a comment

LGTM now! Thanks for supporting this!

@xffxff (Contributor, Author) commented Nov 25, 2024

> LGTM now! Thanks for supporting this!

Thank you so much for your patience, @Isotr0py!

@DarkLight1337 enabled auto-merge (squash) on November 25, 2024 11:06
@github-actions bot added the ready label (ONLY add when PR is ready to merge/full CI is needed) on Nov 25, 2024
@DarkLight1337 (Member) commented

Please fix the error in the models tests.

Signed-off-by: Isotr0py <[email protected]>
@DarkLight1337 merged commit b1d9205 into vllm-project:main on Nov 25, 2024
52 checks passed
@DarkLight1337 (Member) commented

Oh... just realized that we don't have models tests for this yet. @Isotr0py do you have time to add one?

@Isotr0py (Collaborator) commented Nov 27, 2024

Hmmm, this is a large model that might need an 80G A100 to run. Not sure if I can get such a card sitting idle for testing these days. 😅

Update: I just found an FP8-dynamic quantization of Aria: thwin27/Aria-sequential_mlp-FP8-dynamic. Perhaps I can test with that model.
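As a minimal sketch, loading that quantized checkpoint should look much like loading the base model (assuming the repo works with the same options shown in the PR description; this is not code from the merged tests):

from vllm import LLM

# Hypothetical smoke test against the FP8-dynamic checkpoint mentioned above;
# the extra options mirror the base-model example and may not all be needed.
llm = LLM(
    model="thwin27/Aria-sequential_mlp-FP8-dynamic",
    tokenizer_mode="slow",
    trust_remote_code=True,
)
print(llm.generate("Say hello.")[0].outputs[0].text)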

@xffxff (Contributor, Author) commented Nov 27, 2024

> Hmmm, this is a large model that might need an 80G A100 to run. Not sure if I can get an idle card for testing these days. 😅

@Isotr0py I’m happy to help with adding tests! It would be great if there are some examples I could refer to.

@Isotr0py (Collaborator) commented

@xffxff Thank you very much!

You can just add a test setting for this model in tests/models/decoder_only/vision_language/test_models.py. Then run the added test with a command like:

pytest -s -v tests/models/decoder_only/vision_language/test_models.py -k aria

You can refer to other models' settings there:

"qwen2_vl": VLMTestInfo(
models=["Qwen/Qwen2-VL-2B-Instruct"],
test_type=(
VLMTestType.IMAGE,
VLMTestType.MULTI_IMAGE,
VLMTestType.VIDEO
),
prompt_formatter=lambda img_prompt: f"<|im_start|>User\n{img_prompt}<|im_end|>\n<|im_start|>assistant\n", # noqa: E501
img_idx_to_prompt=lambda idx: "<|vision_start|><|image_pad|><|vision_end|>", # noqa: E501
video_idx_to_prompt=lambda idx: "<|vision_start|><|video_pad|><|vision_end|>", # noqa: E501
max_model_len=4096,
max_num_seqs=2,
auto_cls=AutoModelForVision2Seq,
vllm_output_post_proc=model_utils.qwen2_vllm_to_hf_output,
image_size_factors=[(), (0.25,), (0.25, 0.25, 0.25), (0.25, 0.2, 0.15)],
marks=[pytest.mark.core_model, pytest.mark.cpu_model],
),
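A hypothetical Aria entry modeled on the settings above might look as follows; the prompt formatter and image placeholder are assumptions that would need to be checked against the model's chat template, and this is not the test that was eventually added:

"aria": VLMTestInfo(
    models=["rhymes-ai/Aria"],
    test_type=(VLMTestType.IMAGE, VLMTestType.MULTI_IMAGE),
    # Assumed chat format; verify against the model's chat template.
    prompt_formatter=lambda img_prompt: f"<|im_start|>user\n{img_prompt}<|im_end|>\n<|im_start|>assistant\n",  # noqa: E501
    img_idx_to_prompt=lambda idx: "<fim_prefix><|img|><fim_suffix>\n",
    max_model_len=4096,
    max_num_seqs=2,
),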

afeldman-nm pushed a commit to neuralmagic/vllm that referenced this pull request Dec 2, 2024
Signed-off-by: xffxff <[email protected]>
Co-authored-by: Isotr0py <[email protected]>
Signed-off-by: Andrew Feldman <[email protected]>
sleepwalker2017 pushed a commit to sleepwalker2017/vllm that referenced this pull request Dec 13, 2024