[Core] Update to outlines >= 0.1.8 #10576

russellb · 2024-11-22T15:36:18Z

This PR updates to the latest release of outlines that works with vllm.

It is a draft while we wait for 0.1.8 to be on pypi.

279ccc9 [Core] Update to outlines >= 0.1.8

commit 279ccc9
Author: Russell Bryant <[email protected]>
Date: Thu Nov 21 21:25:22 2024 +0000

[Core] Update to outlines >= 0.1.8

0.1.x prior to 0.1.8 + outlines-core 0.1.18 had issues with
serialization that broke vllm integration.

Also change our code slightly to account for an API change in
outlines.

Signed-off-by: Russell Bryant <[email protected]>
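
As an aside on the version floor named in the commit message, here is a minimal sketch of why such a constraint has to be compared numerically rather than as text. The helpers `parse_version` and `meets_minimum` are hypothetical names for illustration only. They handle plain dotted releases like the ones mentioned here, not pre-release or local version tags, which real packaging tools also cover.

```python
def parse_version(text: str) -> tuple:
    """Turn a plain dotted release string such as '0.1.8' into (0, 1, 8)."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Return True when `installed` is at least `minimum`, compared numerically."""
    return parse_version(installed) >= parse_version(minimum)


if __name__ == "__main__":
    for candidate in ("0.1.7", "0.1.8", "0.1.18"):
        print(candidate, meets_minimum(candidate, "0.1.8"))
```

A plain string comparison would order "0.1.18" before "0.1.8", which is why the sketch converts each component to an integer first.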

github-actions · 2024-11-22T15:36:32Z

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

- Add the ready label to the PR
- Enable auto-merge.

🚀

russellb · 2024-11-22T16:27:53Z

0.1.5 has been released, so this is ready to test

russellb · 2024-11-22T16:59:25Z

converted back to a draft because I don't have 0.1.5 working locally, yet ...

joennlae · 2024-11-30T21:04:10Z

I opened this PR for outlines:

dottxt-ai/outlines-core#99

The caching issue (the guide not being picklable) also affects the multiprocessing-based engine in vllm, since we need to be able to pickle the RegexGuide. The above PR adds that pickling capability.

References:

vllm/vllm/engine/multiprocessing/client.py, line 604 in 7e4bbda:

    lp_bytes = cloudpickle.dumps(logits_processors)
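
To illustrate the requirement, here is a minimal sketch using only the standard library. `FakeGuide`, `PicklableGuide`, and `FakeLogitsProcessor` are hypothetical stand-ins, not the outlines or vLLM classes, and the sketch uses `pickle` where the referenced vLLM line uses `cloudpickle`. It shows a processor whose guide holds an unpicklable handle failing to serialize, and the same processor round-tripping once the guide defines `__getstate__` and `__setstate__`.

```python
import pickle
import threading


class FakeGuide:
    """Hypothetical stand-in for a guide that holds a non-picklable handle."""

    def __init__(self, pattern: str):
        self.pattern = pattern
        # A lock stands in for native state that pickle cannot serialize.
        self._handle = threading.Lock()


class PicklableGuide(FakeGuide):
    """Same guide, but it serializes only the pattern and rebuilds the rest."""

    def __getstate__(self):
        return {"pattern": self.pattern}

    def __setstate__(self, state):
        # Recreate the handle on the receiving side instead of shipping it.
        self.__init__(state["pattern"])


class FakeLogitsProcessor:
    """Hypothetical processor that carries a guide across process boundaries."""

    def __init__(self, guide):
        self.guide = guide


def can_pickle(obj) -> bool:
    """Return True if the object survives pickle.dumps."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError):
        return False


if __name__ == "__main__":
    print(can_pickle(FakeLogitsProcessor(FakeGuide(r"\d+"))))
    print(can_pickle(FakeLogitsProcessor(PicklableGuide(r"\d+"))))
```

The design choice in the sketch is to serialize only the inputs needed to rebuild the object, which keeps the payload small at the cost of redoing the construction in the receiving process.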

markmc · 2024-12-02T15:32:03Z

> I opened this PR for outlines:
>
> dottxt-ai/outlines-core#99
>
> The caching issue (the guide not being picklable) also affects the multiprocessing-based engine in vllm, since we need to be able to pickle the RegexGuide. The above PR adds that pickling capability.

Thank you @joennlae - this works for me

Here are some performance measurements with (a) older outlines, (b) newer outlines with your patch but no caching, (c) newer outlines with your patch and caching reinstated:

(a) commit f9310cbd0c1109c4f22cf9f1dc615b2d08f06408 from main
   with pull/10557/head:xuechendi-benchmark-structured-output commit 41966603
   outlines==0.0.46

   $ python benchmarks/benchmark_guided.py --async-engine --model meta-llama/Llama-3.2-1B-Instruct --output-len 2048 --num-prompts 10 --save-results
   Throughput: 0.14 requests/s, 361.06 total tokens/s, 281.59 output tokens/s Correct rate is 100.0 %
   Throughput: 0.16 requests/s, 414.12 total tokens/s, 322.97 output tokens/s Correct rate is 100.0 %
   Throughput: 0.16 requests/s, 414.91 total tokens/s, 323.59 output tokens/s Correct rate is 100.0 %

(b) commit b8c2895a2cbe249e86713615c9ed3ab132812b08 from russellb/outlines-0.1.4
   with pull/10557/head:xuechendi-benchmark-structured-output commit 41966603
   outlines==0.1.7 (from PyPI)
   outlines_core 412ef296392a0814a5490ccc15080e79f98cd411 from 44ai-labs/serialization "release build" (pip install .)

   $ python benchmarks/benchmark_guided.py --async-engine --model meta-llama/Llama-3.2-1B-Instruct --output-len 2048 --num-prompts 10 --save-results
   Throughput: 0.06 requests/s, 167.18 total tokens/s, 130.38 output tokens/s Correct rate is 100.0 %
   Throughput: 0.06 requests/s, 167.48 total tokens/s, 130.61 output tokens/s Correct rate is 100.0 %
   Throughput: 0.07 requests/s, 171.11 total tokens/s, 133.45 output tokens/s Correct rate is 100.0 %

(c) same as above, with "Stop using outlines.caching.cache" reverted
   Throughput: 0.16 requests/s, 410.16 total tokens/s, 319.88 output tokens/s Correct rate is 100.0 %
   Throughput: 0.15 requests/s, 406.16 total tokens/s, 316.76 output tokens/s Correct rate is 100.0 %
   Throughput: 0.16 requests/s, 408.68 total tokens/s, 318.73 output tokens/s Correct rate is 100.0 %
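
The relative differences between the three configurations can be worked out from the output tokens/s figures above. The snippet below is only arithmetic on the numbers reported in this comment. The short keys "a", "b", and "c" are labels chosen here to match the list above, and with three runs of 10 prompts each the averages are indicative rather than precise.

```python
from statistics import mean

# Output tokens/s copied from the three runs reported for each configuration.
output_tps = {
    "a": [281.59, 322.97, 323.59],  # outlines 0.0.46
    "b": [130.38, 130.61, 133.45],  # outlines 0.1.7 + patch, no caching
    "c": [319.88, 316.76, 318.73],  # outlines 0.1.7 + patch, caching reinstated
}

averages = {name: mean(runs) for name, runs in output_tps.items()}
baseline = averages["a"]

for name, avg in averages.items():
    print(f"({name}) {avg:.2f} output tokens/s, {avg / baseline:.0%} of (a)")
```

On these figures, (b) averages roughly 42% of the output throughput of (a), while (c) is back to roughly 103% of (a).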

I understand that `pickleable` is not your priority right now. But the `RegexGuide` needs to be pickled for `vllm` production use, which is multiprocessing-based. This PR reintroduces this pickling capability + some tests. I understand that this introduces more effort on your side.

References: dottxt-ai/outlines#1274, vllm-project/vllm#10490, vllm-project/vllm#10576, vllm-project/vllm#10489

It would also tackle the current caching issues: huggingface/text-generation-inference#2766, dottxt-ai/outlines#1283

Closes: #95

mergify · 2024-12-03T07:18:23Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @russellb.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

russellb · 2024-12-03T19:27:31Z

I think this will be ready once outlines 0.1.8 is available on pypi.

rlouf · 2024-12-06T17:12:01Z

It is!

mgoin

LGTM pending green CI!

mergify bot added the ci/build label Nov 22, 2024

russellb mentioned this pull request Nov 22, 2024

Update outlines support to v0.1.4 #10490

Open

russellb marked this pull request as ready for review November 22, 2024 16:27

russellb marked this pull request as draft November 22, 2024 16:59

joennlae mentioned this pull request Nov 30, 2024

Make RegexGuide pickleable again for vllm and tgi dottxt-ai/outlines-core#99

Merged

russellb force-pushed the outlines-0.1.4 branch 2 times, most recently from 1b6e5f6 to 22ea8e8 Compare December 2, 2024 15:15

mergify bot added the needs-rebase label Dec 3, 2024

russellb mentioned this pull request Dec 3, 2024

[Core] Do async init of xgrammar in the engine #10871

Closed

russellb force-pushed the outlines-0.1.4 branch from 22ea8e8 to 4147eab Compare December 3, 2024 19:26

russellb changed the title ~~[Core] Update to outlines > 0.1.4~~ [Core] Update to outlines >= 0.1.8 Dec 3, 2024

russellb force-pushed the outlines-0.1.4 branch from 4147eab to 279ccc9 Compare December 3, 2024 19:31

mergify bot removed the needs-rebase label Dec 3, 2024

russellb force-pushed the outlines-0.1.4 branch from 279ccc9 to 8bcfde4 Compare December 6, 2024 16:12

russellb marked this pull request as ready for review December 6, 2024 16:13

mgoin mentioned this pull request Dec 6, 2024

[Feature]: Make outlines dependency optional #3794

Closed

mgoin added the ready label Dec 6, 2024

mgoin approved these changes Dec 6, 2024

aarnphm approved these changes Dec 6, 2024

saattrupdan mentioned this pull request Dec 10, 2024

[BUG] Cannot convert token .� (172761) to bytes: .� ScandEval/ScandEval#539

Open

[Core] Update to outlines >= 0.1.8

2143457

russellb force-pushed the outlines-0.1.4 branch from 8bcfde4 to 2143457 Compare December 10, 2024 15:01

aarnphm approved these changes Dec 10, 2024

youkaichao merged commit e739194 into vllm-project:main Dec 10, 2024
72 of 74 checks passed

khluu mentioned this pull request Dec 11, 2024

[ci/build] Fix entrypoints test and pin outlines version #11088

Merged

Akshat-Tripathi pushed a commit to krai/vllm that referenced this pull request Dec 12, 2024

[Core] Update to outlines >= 0.1.8 (vllm-project#10576)

8fbffb2

Signed-off-by: Russell Bryant <[email protected]> Signed-off-by: Akshat Tripathi <[email protected]>

sleepwalker2017 pushed a commit to sleepwalker2017/vllm that referenced this pull request Dec 13, 2024

[Core] Update to outlines >= 0.1.8 (vllm-project#10576)

6fd764e

Signed-off-by: Russell Bryant <[email protected]>
