[Frontend] Online Pooling API #11457

DarkLight1337 · 2024-12-24T07:12:44Z

Previously, #11129 made the Embeddings API unusable for reward models. This PR adds a new Pooling API that covers that use case. For backward compatibility, the Embeddings API now falls back to the Pooling API if the model doesn't support embedding outputs (which is the case for reward models).
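
As a rough illustration of the fallback described above, here is a minimal sketch. It is not the code from this PR: `FakeModel`, `embedding_response`, `pooling_response`, and `handle_embeddings_request` are hypothetical names, and the response shapes are placeholders chosen only to show which path is taken.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FakeModel:
    """Hypothetical stand-in for a served model (not a vLLM class)."""
    name: str
    supports_embedding: bool


def embedding_response(model: FakeModel, texts: List[str]) -> dict:
    # Placeholder payload: one fixed-size vector per input text.
    return {"object": "embedding", "model": model.name,
            "data": [[0.0, 0.0] for _ in texts]}


def pooling_response(model: FakeModel, texts: List[str]) -> dict:
    # Placeholder payload: raw pooled output, e.g. one score per input text.
    return {"object": "pooling", "model": model.name,
            "data": [[0.0] for _ in texts]}


def handle_embeddings_request(model: FakeModel, texts: List[str]) -> dict:
    """Serve an embeddings request, falling back to pooling if needed."""
    if model.supports_embedding:
        return embedding_response(model, texts)
    # Assumed fallback: the model (e.g. a reward model) has no embedding
    # output, so the request is answered through the pooling path instead.
    return pooling_response(model, texts)


if __name__ == "__main__":
    reward_model = FakeModel(name="some-reward-model", supports_embedding=False)
    print(handle_embeddings_request(reward_model, ["hello"])["object"])
```

The point of the sketch is only that existing clients of the embeddings route keep working for reward models, because the request is routed to the pooling path rather than rejected.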

FIX #11446

Signed-off-by: DarkLight1337 <[email protected]>

github-actions · 2024-12-24T07:12:55Z

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

- Add the ready label to the PR
- Enable auto-merge

🚀


Isotr0py

LGTM!

DarkLight1337 added 3 commits December 24, 2024 07:10

Add online pooling API with fallback from embeddings API

6809410

Signed-off-by: DarkLight1337 <[email protected]>

Update docs

c0b35dc

Signed-off-by: DarkLight1337 <[email protected]>

Clean up

6ff8b70

Signed-off-by: DarkLight1337 <[email protected]>

DarkLight1337 added the ready label (ONLY add when PR is ready to merge/full CI is needed) Dec 24, 2024

DarkLight1337 requested a review from Isotr0py December 24, 2024 07:12

mergify bot added the documentation (Improvements or additions to documentation) and frontend labels Dec 24, 2024

DarkLight1337 added 4 commits December 24, 2024 07:20

Fix fallback

80fde42

Signed-off-by: DarkLight1337 <[email protected]>

Format

e5985e5

Signed-off-by: DarkLight1337 <[email protected]>

Update docstring

2575506

Signed-off-by: DarkLight1337 <[email protected]>

Add tests

44d9111

Signed-off-by: DarkLight1337 <[email protected]>

DarkLight1337 requested review from robertgshaw2-neuralmagic and simon-mo as code owners December 24, 2024 07:41

DarkLight1337 added 3 commits December 24, 2024 07:53

Fix tests

eb3c2f2

Signed-off-by: DarkLight1337 <[email protected]>

Fix tensor device

e0ebd12

Signed-off-by: DarkLight1337 <[email protected]>

Fix dtype as well

d7932a2

Signed-off-by: DarkLight1337 <[email protected]>

Isotr0py approved these changes Dec 24, 2024


DarkLight1337 merged commit 9edca6b into vllm-project:main Dec 24, 2024
52 checks passed

DarkLight1337 deleted the online-pooling-api branch December 24, 2024 09:54

This was referenced Dec 24, 2024

[Bug]: Qwen2.5-Math-RM-72B Online Inference Fails #11446

Closed

[RFC]: Make any vLLM model a pooling model #10674

Closed
