
Support Roberta embedding models #9387

Merged: 26 commits into vllm-project:main on Nov 14, 2024
Conversation

@maxdebayser (Contributor) commented Oct 15, 2024

This PR adds support for Roberta embedding models. The architecture is mostly the same as Bert; the only thing that changes is the padding token handling in the Embedding layer, so this PR reuses the Bert modeling classes as much as possible. Some of the models also need head size 32, so that size is added to the kernels here.
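To make the reuse concrete, here is a minimal sketch of the idea with illustrative names only (not the actual vLLM classes): the Bert and Roberta variants can share everything except how the word embedding table treats the pad token.

```python
import torch.nn as nn

def make_word_embedding(vocab_size: int, hidden_size: int,
                        roberta: bool = False) -> nn.Embedding:
    # Roberta reserves token id 1 as the pad token; passing padding_idx
    # zeroes that row and keeps it from receiving gradient updates.
    # Bert-style embeddings need no reserved padding index here.
    padding_idx = 1 if roberta else None
    return nn.Embedding(vocab_size, hidden_size, padding_idx=padding_idx)
```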

cc: @robertgshaw2-neuralmagic , @DarkLight1337

FIX #9847

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small but essential subset of tests to catch errors quickly. You can run additional CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add ready label to the PR
  • Enable auto-merge.

🚀

@maxdebayser mentioned this pull request Oct 15, 2024
@maxdebayser force-pushed the roberta branch 2 times, most recently from b02f53c to 2138ac3, November 1, 2024 15:27
Signed-off-by: Max de Bayser <[email protected]>
Signed-off-by: Max de Bayser <[email protected]>
@maxdebayser marked this pull request as ready for review November 7, 2024 18:17
@maxdebayser changed the title from Roberta to Support Roberta embedding models Nov 7, 2024
@DarkLight1337 (Member)

Model implementation looks good (though a bit hacky). Can you add tests for the model?

@DarkLight1337 (Member)

Also, remember to update the Supported Models page.

@DarkLight1337 (Member)

It would also be nice to consider adding the SequenceClassification variant of the model (to solve #8022)

@rpvelloso

Great! What's the ETA on this?

@maxdebayser (Contributor, Author)

> It would also be nice to consider adding the SequenceClassification variant of the model (to solve #8022)

@DarkLight1337, this would be great, yes. I was thinking that we could use your chat embedding API to format sentence pairs separated by a separator token as input to sentence classification models. The only problem is the token type tensor, which also has to be passed as input. But maybe that is outside the scope of this issue; we can add it in another PR to keep the scope of each PR small.

@maxdebayser (Contributor, Author)

> Model implementation looks good (though a bit hacky). Can you add tests for the model?

Sure, I'll add the tests. I don't disagree that this is a bit hacky. Should we make the Bert classes more generic so that we can pass the embedding layer class as a parameter?

@DarkLight1337 (Member) commented Nov 8, 2024

> > Model implementation looks good (though a bit hacky). Can you add tests for the model?
>
> Sure, I'll add the tests. I don't disagree that this is a bit hacky. Should we make the Bert classes more generic so that we can pass the embedding layer class as a parameter?

That would be great. Another way would be to have an abstract `_init_embeddings` etc. so subclasses can decide how to initialize each submodule.
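A rough sketch of that second option, with hypothetical names (the real vLLM classes and config differ):

```python
from dataclasses import dataclass
import torch.nn as nn

@dataclass
class ModelConfig:  # hypothetical stand-in for the real model config
    vocab_size: int = 50265
    hidden_size: int = 768

class BertLikeModel(nn.Module):
    """Shared encoder; subclasses decide how to build each submodule."""
    def __init__(self, config: ModelConfig):
        super().__init__()
        self.embeddings = self._init_embeddings(config)
        # ... shared attention/encoder layers would follow here ...

    def _init_embeddings(self, config: ModelConfig) -> nn.Module:
        # Bert default: plain embedding table.
        return nn.Embedding(config.vocab_size, config.hidden_size)

class RobertaLikeModel(BertLikeModel):
    def _init_embeddings(self, config: ModelConfig) -> nn.Module:
        # Roberta override: reserve padding_idx=1 for the pad token.
        return nn.Embedding(config.vocab_size, config.hidden_size,
                            padding_idx=1)
```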

Signed-off-by: Max de Bayser <[email protected]>
mergify bot commented Nov 12, 2024

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @maxdebayser.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify bot added the needs-rebase label Nov 12, 2024
Signed-off-by: Flavia Beo <[email protected]>
mergify bot removed the needs-rebase label Nov 12, 2024
@DarkLight1337 (Member) left a comment


LGTM, thanks for adding this!

@DarkLight1337 enabled auto-merge (squash) November 13, 2024 15:00
github-actions bot added the ready label (ONLY add when PR is ready to merge/full CI is needed) Nov 13, 2024
flaviabeo and others added 2 commits November 13, 2024 13:19
Signed-off-by: Flavia Beo <[email protected]>
auto-merge was automatically disabled November 13, 2024 17:05

Head branch was pushed to by a user without write access

flaviabeo and others added 2 commits November 13, 2024 14:49
@maxdebayser (Contributor, Author)

@DarkLight1337, I realized that with Roberta models the position_ids start at padding_idx + 1 (see here and here).

I've added a line of code to increment all position_ids by that amount. Without it, the result I get on the STS12 task from the MTEB benchmark for intfloat/multilingual-e5-large is 0.53, which is way off. With the change I get 0.80, which is correct.

In my tests, all the position_ids in vLLM for the embedding use case start at 0 and end at len()-1, and there are no padding tokens, because the input tensors are 1-dimensional without padding. For example:

input_ids=tensor([     0,   1284,     70,   1821,   6275,    111,  21455,      6,      4,
            70,  21640,  31486,    111,     70,  15437,    509,     10,  63819,
             6,      5,      2,      0,   2367,     83,   1286,      6,      4,
            70,  16648,    111,  29700,    621,    959, 133888,     47, 137447,
          2363, 102880,    111,  77021,     47,  16839,  27289,      6,      5,
             2], device='cuda:0')
position_ids=tensor([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
        18, 19, 20,  0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14,
        15, 16, 17, 18, 19, 20, 21, 22, 23, 24], device='cuda:0')
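
For reference, the increment described above could look roughly like this (a sketch, not the exact line in the PR; padding_idx is assumed to be Roberta's pad token id, 1):

```python
import torch

def adjust_positions(positions: torch.Tensor,
                     padding_idx: int = 1) -> torch.Tensor:
    # vLLM hands each prompt flat 0-based positions with no padding, so a
    # constant shift reproduces HF Roberta's numbering, which starts at
    # padding_idx + 1.
    return positions + padding_idx + 1
```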

Is there a scenario in which padding tokens could be present? (Except when the user inserts them in the input text.)

@DarkLight1337 (Member)

> I've added a line of code to increment all position_ids by that amount. Without it, the result I get on the STS12 task from the MTEB benchmark for intfloat/multilingual-e5-large is 0.53, which is way off. With the change I get 0.80, which is correct.

Hmm, we may need to add a correctness test that compares against HF then.

> Is there a scenario in which padding tokens could be present? (Except when the user inserts them in the input text.)

Don't think so, since vLLM encodes each prompt separately. Just to be sure, you can add an assertion statement so we know if our assumption is false.

@sorenmc commented Nov 14, 2024

> It would also be nice to consider adding the SequenceClassification variant of the model (to solve #8022)

Are you still considering adding this to the PR? If not, I could make an attempt.

Signed-off-by: Max de Bayser <[email protected]>
@maxdebayser (Contributor, Author)

@DarkLight1337, I've added an assert on the position_ids.

> Hmm, we may need to add a correctness test that compares against HF then.

Yes, but it would have to be against sentence-transformers. In the transformers library, the pooled output is obtained by running the last hidden states through the pooler layer. But in sentence-transformers, this output is discarded; the hidden states are pooled using MEAN, CLS, etc., like we do, and then normalized. Would it be OK to add this dependency? If yes, can we do it in another PR?
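
To illustrate the difference, here is a sketch of the sentence-transformers-style recipe (assuming last_hidden is [seq_len, hidden] and attention_mask marks real tokens); HF's pooler layer, a dense+tanh over the [CLS] hidden state, is ignored entirely:

```python
import torch
import torch.nn.functional as F

def st_style_mean_pool(last_hidden: torch.Tensor,
                       attention_mask: torch.Tensor) -> torch.Tensor:
    # Mean over real (unmasked) tokens, then L2-normalize, as
    # sentence-transformers does for MEAN pooling.
    mask = attention_mask.unsqueeze(-1).to(last_hidden.dtype)  # [seq, 1]
    summed = (last_hidden * mask).sum(dim=0)                   # [hidden]
    count = mask.sum(dim=0).clamp(min=1e-9)
    return F.normalize(summed / count, p=2, dim=-1)
```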

@DarkLight1337 (Member) commented Nov 14, 2024

The existing tests for text-only embedding models already use sentence-transformers, so it should be pretty straightforward to add this model to the list.

Signed-off-by: Max de Bayser <[email protected]>
Signed-off-by: Max de Bayser <[email protected]>
The test is failing with:

```
Unsupported: dynamic shape operator: aten.nonzero.default; to enable, set torch._dynamo.config.capture_dynamic_output_shape_ops = True

from user code:
   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/roberta.py", line 107, in forward
    assert len(torch.nonzero(positions[start_pos])) == 0

Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information

You can suppress this exception and fall back to eager by setting:
    import torch._dynamo
    torch._dynamo.config.suppress_errors = True
```
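
For reference, the two escape hatches named in the error would look like this (a sketch based only on the message above, not necessarily what this PR ended up doing):

```python
import torch._dynamo

# Option 1 from the error: let torch.compile capture ops with
# data-dependent output shapes (e.g. aten.nonzero) instead of erroring.
torch._dynamo.config.capture_dynamic_output_shape_ops = True

# Option 2 from the error: suppress the exception and fall back to
# eager execution for the offending graph.
# torch._dynamo.config.suppress_errors = True
```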

Signed-off-by: Max de Bayser <[email protected]>
@DarkLight1337 enabled auto-merge (squash) November 14, 2024 17:22
Signed-off-by: Max de Bayser <[email protected]>
Signed-off-by: Max de Bayser <[email protected]>
Signed-off-by: Max de Bayser <[email protected]>
@DarkLight1337 merged commit 4a18fd1 into vllm-project:main Nov 14, 2024
72 checks passed
@maxdebayser (Contributor, Author)

The discussion on SequenceClassification models continues in this other PR: #10400

KuntaiDu pushed a commit to KuntaiDu/vllm that referenced this pull request Nov 20, 2024
Signed-off-by: Max de Bayser <[email protected]>
Signed-off-by: Flavia Beo <[email protected]>
Co-authored-by: Flavia Beo <[email protected]>
mfournioux pushed a commit to mfournioux/vllm that referenced this pull request Nov 20, 2024
Signed-off-by: Max de Bayser <[email protected]>
Signed-off-by: Flavia Beo <[email protected]>
Co-authored-by: Flavia Beo <[email protected]>
Signed-off-by: Maxime Fournioux <[email protected]>
rickyyx pushed a commit to rickyyx/vllm that referenced this pull request Nov 20, 2024
Signed-off-by: Max de Bayser <[email protected]>
Signed-off-by: Flavia Beo <[email protected]>
Co-authored-by: Flavia Beo <[email protected]>
Signed-off-by: rickyx <[email protected]>
tlrmchlsmth pushed a commit to neuralmagic/vllm that referenced this pull request Nov 23, 2024
Signed-off-by: Max de Bayser <[email protected]>
Signed-off-by: Flavia Beo <[email protected]>
Co-authored-by: Flavia Beo <[email protected]>
Signed-off-by: Tyler Michael Smith <[email protected]>
sleepwalker2017 pushed a commit to sleepwalker2017/vllm that referenced this pull request Dec 13, 2024
Signed-off-by: Max de Bayser <[email protected]>
Signed-off-by: Flavia Beo <[email protected]>
Co-authored-by: Flavia Beo <[email protected]>
Labels
ready (ONLY add when PR is ready to merge/full CI is needed)
Development

Successfully merging this pull request may close these issues.

[New Model]: BAAI/bge-m3
5 participants