
Support Roberta embedding models #9387

Merged: 26 commits into vllm-project:main on Nov 14, 2024
Conversation

@maxdebayser (Contributor) commented Oct 15, 2024

This PR adds support for Roberta embedding models. The architecture is mostly the same as Bert; the only thing that changes is the padding token handling in the Embedding layer, so this PR reuses the Bert modeling classes as much as possible. Some of the models also need head size 32, so that size is added to the kernels here.
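To make the reuse concrete, here is a minimal sketch of the idea with illustrative names only (not the actual vLLM classes): the Bert and Roberta variants can share everything except how the word embedding table treats the pad token.

```python
import torch.nn as nn

def make_word_embedding(vocab_size: int, hidden_size: int,
                        roberta: bool = False) -> nn.Embedding:
    # Roberta reserves token id 1 as the pad token; passing padding_idx
    # zeroes that row and keeps it from receiving gradient updates.
    # Bert-style embeddings need no reserved padding index here.
    padding_idx = 1 if roberta else None
    return nn.Embedding(vocab_size, hidden_size, padding_idx=padding_idx)
```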

cc: @robertgshaw2-neuralmagic , @DarkLight1337

FIX #9847

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small but essential subset of tests to catch errors quickly. You can run additional CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add ready label to the PR
  • Enable auto-merge.

🚀

@maxdebayser mentioned this pull request Oct 15, 2024
@maxdebayser force-pushed the roberta branch 2 times, most recently from b02f53c to 2138ac3, November 1, 2024 15:27
Signed-off-by: Max de Bayser <[email protected]>
Signed-off-by: Max de Bayser <[email protected]>
@maxdebayser marked this pull request as ready for review November 7, 2024 18:17
@maxdebayser changed the title from Roberta to Support Roberta embedding models Nov 7, 2024
@DarkLight1337 (Member)

Model implementation looks good (though a bit hacky). Can you add tests for the model?

@DarkLight1337 (Member)

Also, remember to update the Supported Models page.

@DarkLight1337 (Member)

It would also be nice to consider adding the SequenceClassification variant of the model (to solve #8022)

@rpvelloso

Great! What's the ETA on this?

@maxdebayser (Contributor, Author)

> It would also be nice to consider adding the SequenceClassification variant of the model (to solve #8022)

@DarkLight1337, this would be great, yes. I was thinking that we could use your chat embedding API to format sentence pairs separated by a separator token as input to sentence classification models. The only problem is the token type tensor, which also has to be passed as input. But maybe that is outside the scope of this issue; we can add it in another PR to keep the scope of each PR small.

@maxdebayser (Contributor, Author)

> Model implementation looks good (though a bit hacky). Can you add tests for the model?

Sure, I'll add the tests. I don't disagree that this is a bit hacky. Should we make the Bert classes more generic so that we can pass the embedding layer class as a parameter?

@DarkLight1337 (Member) commented Nov 8, 2024

> > Model implementation looks good (though a bit hacky). Can you add tests for the model?
>
> Sure, I'll add the tests. I don't disagree that this is a bit hacky. Should we make the Bert classes more generic so that we can pass the embedding layer class as a parameter?

That would be great. Another way would be to have an abstract `_init_embeddings` etc. so subclasses can decide how to initialize each submodule.
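A rough sketch of that second option, with hypothetical names (the real vLLM classes and config differ):

```python
from dataclasses import dataclass
import torch.nn as nn

@dataclass
class ModelConfig:  # hypothetical stand-in for the real model config
    vocab_size: int = 50265
    hidden_size: int = 768

class BertLikeModel(nn.Module):
    """Shared encoder; subclasses decide how to build each submodule."""
    def __init__(self, config: ModelConfig):
        super().__init__()
        self.embeddings = self._init_embeddings(config)
        # ... shared attention/encoder layers would follow here ...

    def _init_embeddings(self, config: ModelConfig) -> nn.Module:
        # Bert default: plain embedding table.
        return nn.Embedding(config.vocab_size, config.hidden_size)

class RobertaLikeModel(BertLikeModel):
    def _init_embeddings(self, config: ModelConfig) -> nn.Module:
        # Roberta override: reserve padding_idx=1 for the pad token.
        return nn.Embedding(config.vocab_size, config.hidden_size,
                            padding_idx=1)
```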

Signed-off-by: Max de Bayser <[email protected]>
mergify bot commented Nov 12, 2024

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @maxdebayser.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify bot added the needs-rebase label Nov 12, 2024
Signed-off-by: Flavia Beo <[email protected]>
mergify bot removed the needs-rebase label Nov 12, 2024
@DarkLight1337 (Member) left a comment


LGTM, thanks for adding this!

@DarkLight1337 enabled auto-merge (squash) November 13, 2024 15:00
github-actions bot added the ready label (ONLY add when PR is ready to merge/full CI is needed) Nov 13, 2024
flaviabeo and others added 2 commits November 13, 2024 13:19
Signed-off-by: Flavia Beo <[email protected]>
auto-merge was automatically disabled November 13, 2024 17:05

Head branch was pushed to by a user without write access

flaviabeo and others added 2 commits November 13, 2024 14:49
@maxdebayser (Contributor, Author)

@DarkLight1337, I realized that with Roberta models the position_ids start at padding_idx + 1 (see here and here).

I've added a line of code to increment all position_ids by that amount. Without it, the result I get on the STS12 task from the MTEB benchmark for intfloat/multilingual-e5-large is 0.53, which is way off. With the change I get 0.80, which is correct.

In my tests, all the position_ids in vLLM for the embedding use case start at 0 and end at len()-1, and there are no padding tokens, because the input tensors are 1-dimensional without padding. For example:

input_ids=tensor([     0,   1284,     70,   1821,   6275,    111,  21455,      6,      4,
            70,  21640,  31486,    111,     70,  15437,    509,     10,  63819,
             6,      5,      2,      0,   2367,     83,   1286,      6,      4,
            70,  16648,    111,  29700,    621,    959, 133888,     47, 137447,
          2363, 102880,    111,  77021,     47,  16839,  27289,      6,      5,
             2], device='cuda:0')
position_ids=tensor([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
        18, 19, 20,  0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14,
        15, 16, 17, 18, 19, 20, 21, 22, 23, 24], device='cuda:0')
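
For reference, the increment described above could look roughly like this (a sketch, not the exact line in the PR; padding_idx is assumed to be Roberta's pad token id, 1):

```python
import torch

def adjust_positions(positions: torch.Tensor,
                     padding_idx: int = 1) -> torch.Tensor:
    # vLLM hands each prompt flat 0-based positions with no padding, so a
    # constant shift reproduces HF Roberta's numbering, which starts at
    # padding_idx + 1.
    return positions + padding_idx + 1
```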

Is there a scenario in which padding tokens could be present? (Except when the user inserts them in the input text.)

@DarkLight1337 (Member)

> I've added a line of code to increment all position_ids by that amount. Without it, the result I get on the STS12 task from the MTEB benchmark for intfloat/multilingual-e5-large is 0.53, which is way off. With the change I get 0.80, which is correct.

Hmm, we may need to add a correctness test that compares against HF then.

> Is there a scenario in which padding tokens could be present? (Except when the user inserts them in the input text.)

Don't think so, since vLLM encodes each prompt separately. Just to be sure, you can add an assertion statement so we know if our assumption is false.

@sorenmc commented Nov 14, 2024

> It would also be nice to consider adding the SequenceClassification variant of the model (to solve #8022)

Are you still considering adding this to the PR? If not, I could make an attempt.

Signed-off-by: Max de Bayser <[email protected]>
@maxdebayser (Contributor, Author)

@DarkLight1337, I've added an assert on the position_ids.

> Hmm, we may need to add a correctness test that compares against HF then.

Yes, but it would have to be against sentence-transformers. In the transformers library, the pooled output is obtained by running the last hidden states through the pooler layer. But in sentence-transformers, this output is discarded; the hidden states are pooled using MEAN, CLS, etc., like we do, and then normalized. Would it be OK to add this dependency? If yes, can we do it in another PR?
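
To illustrate the difference, here is a sketch of the sentence-transformers-style recipe (assuming last_hidden is [seq_len, hidden] and attention_mask marks real tokens); HF's pooler layer, a dense+tanh over the [CLS] hidden state, is ignored entirely:

```python
import torch
import torch.nn.functional as F

def st_style_mean_pool(last_hidden: torch.Tensor,
                       attention_mask: torch.Tensor) -> torch.Tensor:
    # Mean over real (unmasked) tokens, then L2-normalize, as
    # sentence-transformers does for MEAN pooling.
    mask = attention_mask.unsqueeze(-1).to(last_hidden.dtype)  # [seq, 1]
    summed = (last_hidden * mask).sum(dim=0)                   # [hidden]
    count = mask.sum(dim=0).clamp(min=1e-9)
    return F.normalize(summed / count, p=2, dim=-1)
```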

@DarkLight1337 (Member) commented Nov 14, 2024

The existing tests for text-only embedding models already use sentence-transformers, so it should be pretty straightforward to add this model to the list.

Signed-off-by: Max de Bayser <[email protected]>
Signed-off-by: Max de Bayser <[email protected]>
The test is failing with:

```
Unsupported: dynamic shape operator: aten.nonzero.default; to enable, set torch._dynamo.config.capture_dynamic_output_shape_ops = True

from user code:
   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/roberta.py", line 107, in forward
    assert len(torch.nonzero(positions[start_pos])) == 0

Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information

You can suppress this exception and fall back to eager by setting:
    import torch._dynamo
    torch._dynamo.config.suppress_errors = True
```
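
For reference, the two escape hatches named in the error would look like this (a sketch based only on the message above, not necessarily what this PR ended up doing):

```python
import torch._dynamo

# Option 1 from the error: let torch.compile capture ops with
# data-dependent output shapes (e.g. aten.nonzero) instead of erroring.
torch._dynamo.config.capture_dynamic_output_shape_ops = True

# Option 2 from the error: suppress the exception and fall back to
# eager execution for the offending graph.
# torch._dynamo.config.suppress_errors = True
```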

Signed-off-by: Max de Bayser <[email protected]>
@DarkLight1337 enabled auto-merge (squash) November 14, 2024 17:22
Signed-off-by: Max de Bayser <[email protected]>
Signed-off-by: Max de Bayser <[email protected]>
Signed-off-by: Max de Bayser <[email protected]>
@DarkLight1337 merged commit 4a18fd1 into vllm-project:main Nov 14, 2024
72 checks passed
@maxdebayser (Contributor, Author)

The discussion on SequenceClassification models continues in this other PR: #10400

KuntaiDu pushed a commit to KuntaiDu/vllm that referenced this pull request Nov 20, 2024
Signed-off-by: Max de Bayser <[email protected]>
Signed-off-by: Flavia Beo <[email protected]>
Co-authored-by: Flavia Beo <[email protected]>
mfournioux pushed a commit to mfournioux/vllm that referenced this pull request Nov 20, 2024
Signed-off-by: Max de Bayser <[email protected]>
Signed-off-by: Flavia Beo <[email protected]>
Co-authored-by: Flavia Beo <[email protected]>
Signed-off-by: Maxime Fournioux <[email protected]>
rickyyx pushed a commit to rickyyx/vllm that referenced this pull request Nov 20, 2024
Signed-off-by: Max de Bayser <[email protected]>
Signed-off-by: Flavia Beo <[email protected]>
Co-authored-by: Flavia Beo <[email protected]>
Signed-off-by: rickyx <[email protected]>
tlrmchlsmth pushed a commit to neuralmagic/vllm that referenced this pull request Nov 23, 2024
Signed-off-by: Max de Bayser <[email protected]>
Signed-off-by: Flavia Beo <[email protected]>
Co-authored-by: Flavia Beo <[email protected]>
Signed-off-by: Tyler Michael Smith <[email protected]>
sleepwalker2017 pushed a commit to sleepwalker2017/vllm that referenced this pull request Dec 13, 2024
Signed-off-by: Max de Bayser <[email protected]>
Signed-off-by: Flavia Beo <[email protected]>
Co-authored-by: Flavia Beo <[email protected]>
Labels
ready (ONLY add when PR is ready to merge/full CI is needed)
Development

Successfully merging this pull request may close these issues.

[New Model]: BAAI/bge-m3
5 participants