[Model]: Add support for Aria model #10514

Merged: 14 commits into vllm-project:main on Nov 25, 2024

Conversation

@xffxff (Contributor) commented Nov 21, 2024

Add support for rhymes-ai/Aria, a multimodal MoE model.

Feel free to request changes!

You can try it with the following code:

from PIL import Image
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams
import requests


model_id = "rhymes-ai/Aria"


def main():
    llm = LLM(
        model=model_id,
        tokenizer=model_id,
        tokenizer_mode="slow",
        dtype="bfloat16",
        # limit_mm_per_prompt={"image": 256},
        enforce_eager=True,
        trust_remote_code=True,
    )

    tokenizer = AutoTokenizer.from_pretrained(
        model_id, trust_remote_code=True, use_fast=False
    )


    messages = [
        {
            "role": "user",
            "content": [
                {"type": "image"},
                {
                    "type": "text",
                    "text": "What is the image?",
                },
            ],
        }
    ]

    # Build the prompt token ids from the model's chat template.
    message = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

    image_path = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/cat.png"

    # Fetch the example image.
    image = Image.open(requests.get(image_path, stream=True).raw)


    outputs = llm.generate(
        {
            "prompt_token_ids": message,
            "multi_modal_data": {
                "image": [image],
                # Aria-specific knob for the image resolution used by the
                # image processor.
                "max_image_size": 980,
            },
        },
        sampling_params=SamplingParams(max_tokens=200, top_k=1, stop=["<|im_end|>"]),
    )

    for o in outputs:
        generated_tokens = o.outputs[0].token_ids
        print(tokenizer.decode(generated_tokens))


if __name__ == "__main__":
    main()


👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run fastcheck CI, which runs a small, essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can do one of the following:

  • Add the ready label to the PR
  • Enable auto-merge.

🚀

@mergify bot added the documentation label (Improvements or additions to documentation) on Nov 21, 2024
@Isotr0py (Collaborator) left a comment

Some initial comments.

(10 inline review comments on vllm/model_executor/models/aria.py, outdated and resolved)
@xffxff force-pushed the support_aria branch 2 times, most recently from 56090fb to a0ebc3a on November 22, 2024 04:06
@xffxff requested a review from Isotr0py on November 22, 2024 05:58
@Isotr0py (Collaborator) left a comment

Can you add the example code to offline_inference_vision_language.py? If Aria supports multi-image inputs, we need to add a multi-image example to offline_inference_vision_language_multi_image.py as well.

Otherwise it looks good overall; just needs a minor modification.

(1 inline review comment on vllm/model_executor/models/aria.py, outdated and resolved)
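For reference, a rough sketch of what such an entry in offline_inference_vision_language.py could look like, following the run_<model>(question, modality) pattern used by the other models in that file; the run_aria name and all settings below are illustrative, not the code that was merged:

# Rough sketch only: run_aria and its settings follow the pattern of the other
# models in offline_inference_vision_language.py and are not the merged code.
from transformers import AutoTokenizer
from vllm import LLM


def run_aria(question: str, modality: str):
    assert modality == "image"
    model_name = "rhymes-ai/Aria"

    llm = LLM(
        model=model_name,
        tokenizer_mode="slow",
        dtype="bfloat16",
        trust_remote_code=True,
    )

    # Build the prompt through the chat template so the image placeholder
    # tokens are inserted the way the model expects.
    tokenizer = AutoTokenizer.from_pretrained(
        model_name, trust_remote_code=True, use_fast=False
    )
    messages = [{
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": question},
        ],
    }]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, tokenize=False
    )

    stop_token_ids = None  # the snippet in the PR description stops on "<|im_end|>"
    return llm, prompt, stop_token_ids

A multi-image example for offline_inference_vision_language_multi_image.py would follow the same idea, with one {"type": "image"} entry per image in the message content.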
@ywang96 (Member) commented Nov 22, 2024

Thanks for this PR @xffxff! Just so you're aware, I'm doing some refactoring work in #10570, so it would be great if you could already follow the new interface from that PR here. It's not a big deal if it's too much work.

Signed-off-by: xffxff <[email protected]>
@xffxff (Contributor, Author) commented Nov 22, 2024

> Thanks for this PR @xffxff! Just so you're aware, I'm doing some refactoring work in #10570, so it would be great if you could already follow the new interface from that PR here. It's not a big deal if it's too much work.

Yes @ywang96, I’d be happy to work on the refactoring. I noticed a warning about using the legacy input pipeline and had tried switching to the new multi-modal processor, but I couldn’t find any examples at the time and decided to hold off. Now that I can refer to your work in #10570, I’ll update my PR to align with the new interface.

@xffxff (Contributor, Author) commented Nov 22, 2024

> Can you add the example code to offline_inference_vision_language.py? If Aria supports multi-image inputs, we need to add a multi-image example to offline_inference_vision_language_multi_image.py as well.

Sure! I'll add some examples.

@mergify mergify bot added the frontend label Nov 25, 2024
@xffxff (Contributor, Author) commented Nov 25, 2024

> Can you add the example code to offline_inference_vision_language.py? If Aria supports multi-image inputs, we need to add a multi-image example to offline_inference_vision_language_multi_image.py as well.

Done! Please take a look @Isotr0py

@xffxff (Contributor, Author) commented Nov 25, 2024

> Thanks for this PR @xffxff! Just so you're aware, I'm doing some refactoring work in #10570, so it would be great if you could already follow the new interface from that PR here. It's not a big deal if it's too much work.

I’ve updated my PR to follow the new interface; I only needed to make a few changes.

> Yes @ywang96, I’d be happy to work on the refactoring. I noticed a warning about using the legacy input pipeline and had tried switching to the new multi-modal processor, but I couldn’t find any examples at the time and decided to hold off. Now that I can refer to your work in #10570, I’ll update my PR to align with the new interface.

@ywang96 Apologies for the confusion earlier; I initially thought your refactoring was related to #10114 and didn’t look closely at your PR.

@Isotr0py (Collaborator) left a comment

LGTM now! Thanks for supporting this!

@xffxff (Contributor, Author) commented Nov 25, 2024

> LGTM now! Thanks for supporting this!

Thank you so much for your patience, @Isotr0py!

@DarkLight1337 enabled auto-merge (squash) on November 25, 2024 11:06
@github-actions bot added the ready label (ONLY add when PR is ready to merge/full CI is needed) on Nov 25, 2024
@DarkLight1337 (Member) commented

Please fix the error in the models tests.

Signed-off-by: Isotr0py <[email protected]>
@DarkLight1337 merged commit b1d9205 into vllm-project:main on Nov 25, 2024
52 checks passed
@DarkLight1337 (Member) commented

Oh... just realized that we don't have models tests for this yet. @Isotr0py do you have time to add one?

@Isotr0py (Collaborator) commented Nov 27, 2024

Hmmm, this is a large model that might need an 80G A100 to run. Not sure if I can get such a card sitting idle for testing these days. 😅

Update: I just found an FP8-dynamic quantization of Aria: thwin27/Aria-sequential_mlp-FP8-dynamic. Perhaps I can test with that model.
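As a minimal sketch, loading that quantized checkpoint should look much like loading the base model (assuming the repo works with the same options shown in the PR description; this is not code from the merged tests):

from vllm import LLM

# Hypothetical smoke test against the FP8-dynamic checkpoint mentioned above;
# the extra options mirror the base-model example and may not all be needed.
llm = LLM(
    model="thwin27/Aria-sequential_mlp-FP8-dynamic",
    tokenizer_mode="slow",
    trust_remote_code=True,
)
print(llm.generate("Say hello.")[0].outputs[0].text)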

@xffxff (Contributor, Author) commented Nov 27, 2024

> Hmmm, this is a large model that might need an 80G A100 to run. Not sure if I can get an idle card for testing these days. 😅

@Isotr0py I’m happy to help with adding tests! It would be great if there are some examples I could refer to.

@Isotr0py (Collaborator) commented

@xffxff Thank you very much!

You can just add a test setting for this model in tests/models/decoder_only/vision_language/test_models.py. Then run the added test with a command like:

pytest -s -v tests/models/decoder_only/vision_language/test_models.py -k aria

You can refer to other models' settings there:

"qwen2_vl": VLMTestInfo(
models=["Qwen/Qwen2-VL-2B-Instruct"],
test_type=(
VLMTestType.IMAGE,
VLMTestType.MULTI_IMAGE,
VLMTestType.VIDEO
),
prompt_formatter=lambda img_prompt: f"<|im_start|>User\n{img_prompt}<|im_end|>\n<|im_start|>assistant\n", # noqa: E501
img_idx_to_prompt=lambda idx: "<|vision_start|><|image_pad|><|vision_end|>", # noqa: E501
video_idx_to_prompt=lambda idx: "<|vision_start|><|video_pad|><|vision_end|>", # noqa: E501
max_model_len=4096,
max_num_seqs=2,
auto_cls=AutoModelForVision2Seq,
vllm_output_post_proc=model_utils.qwen2_vllm_to_hf_output,
image_size_factors=[(), (0.25,), (0.25, 0.25, 0.25), (0.25, 0.2, 0.15)],
marks=[pytest.mark.core_model, pytest.mark.cpu_model],
),
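A hypothetical Aria entry modeled on the settings above might look as follows; the prompt formatter and image placeholder are assumptions that would need to be checked against the model's chat template, and this is not the test that was eventually added:

"aria": VLMTestInfo(
    models=["rhymes-ai/Aria"],
    test_type=(VLMTestType.IMAGE, VLMTestType.MULTI_IMAGE),
    # Assumed chat format; verify against the model's chat template.
    prompt_formatter=lambda img_prompt: f"<|im_start|>user\n{img_prompt}<|im_end|>\n<|im_start|>assistant\n",  # noqa: E501
    img_idx_to_prompt=lambda idx: "<fim_prefix><|img|><fim_suffix>\n",
    max_model_len=4096,
    max_num_seqs=2,
),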

afeldman-nm pushed a commit to neuralmagic/vllm that referenced this pull request Dec 2, 2024
Signed-off-by: xffxff <[email protected]>
Co-authored-by: Isotr0py <[email protected]>
Signed-off-by: Andrew Feldman <[email protected]>
sleepwalker2017 pushed a commit to sleepwalker2017/vllm that referenced this pull request Dec 13, 2024