
Update outlines support to v0.1.4 #10490

Open
Treparme wants to merge 1 commit into base: main

Conversation


@Treparme Treparme commented Nov 20, 2024

Upgrades (or, more precisely, loosens) the outlines dependency so that a higher version is supported out of the box, which improves speed.

FIX #10489


👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only fastcheck CI runs, which covers a small and essential subset of CI tests to quickly catch errors. You can run additional CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add ready label to the PR
  • Enable auto-merge.

🚀

@mergify mergify bot added the ci/build label Nov 20, 2024
@mgoin
Member

mgoin commented Nov 20, 2024

Hi @Treparme, could you please share how you tested? I was also working on this upgrade yesterday and ran into a breaking change due to the introduction of outlines-core.

Here is my setup, followed by the error.
Server:

vllm serve meta-llama/Llama-3.2-1B-Instruct

Client:

from pydantic import BaseModel
from openai import OpenAI


class Info(BaseModel):
    name: str
    age: int


client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="dummy")
model_id = client.models.list().data[0].id
print("Model ID:", model_id)
completion = client.beta.chat.completions.parse(
    model=model_id,
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "My name is Cameron, I'm 28. What's my name and age?"},
    ],
    response_format=Info,
    extra_body=dict(guided_decoding_backend="outlines"),
)

message = completion.choices[0].message
print(message)
assert message.parsed
print("Name:", message.parsed.name)
print("Age:", message.parsed.age)

Error (truncated):

  File "/home/mgoin/code/vllm/vllm/engine/multiprocessing/client.py", line 582, in _process_request
    params = await \
             ^^^^^^^
  File "/home/mgoin/code/vllm/vllm/engine/async_llm_engine.py", line 539, in build_guided_decoding_logits_processor_async
    processor = await get_guided_decoding_logits_processor(
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mgoin/code/vllm/vllm/model_executor/guided_decoding/__init__.py", line 16, in get_guided_decoding_logits_processor
    return await get_outlines_guided_decoding_logits_processor(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mgoin/code/vllm/vllm/model_executor/guided_decoding/outlines_decoding.py", line 72, in get_outlines_guided_decoding_logits_processor
    return await loop.run_in_executor(global_thread_pool,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mgoin/.local/share/uv/python/cpython-3.12.4-linux-x86_64-gnu/lib/python3.12/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mgoin/code/vllm/vllm/model_executor/guided_decoding/outlines_decoding.py", line 127, in _get_logits_processor
    return JSONLogitsProcessor(guide, tokenizer, whitespace_pattern)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mgoin/code/vllm/vllm/model_executor/guided_decoding/outlines_logits_processors.py", line 151, in __init__
    super().__init__(regex_string, tokenizer)
  File "/home/mgoin/code/vllm/vllm/model_executor/guided_decoding/outlines_logits_processors.py", line 115, in __init__
    guide = RegexLogitsProcessor._get_guide(regex_string, tokenizer)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mgoin/venvs/vllm/lib/python3.12/site-packages/outlines/caching.py", line 122, in wrapper
    result = cached_function(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mgoin/code/vllm/vllm/model_executor/guided_decoding/outlines_logits_processors.py", line 102, in _get_guide
    return RegexGuide(regex_string, tokenizer)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: RegexGuide.__init__() missing 2 required positional arguments: 'eos_tensor' and 'initial_state'

@Treparme
Author

Hi @mgoin

We run it in a different way; something similar to the snippet below works:

# snippet from our application code; json_schema, parser, self.tokenizer_data,
# final_prompt, request_id and lora are defined elsewhere
self.outlines_tokenizer = TransformerTokenizer(
    AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct")
)
logits_processor = JSONLogitsProcessor(schema=json_schema, tokenizer=self.outlines_tokenizer)
# logits_processor = build_vllm_logits_processor(self.tokenizer_data, parser)  # alternative builder; would override the processor above

sampling_params = SamplingParams(
    temperature=1e-6,
    max_tokens=2000,
    logits_processors=[logits_processor],
    logprobs=5,
)

results_generator = self.engine.generate(final_prompt, sampling_params, request_id, lora_request=lora)

This works.
I want to add that the < 0.1.4 bound is intentional; 0.1.3 is the highest version that works.
0.1.4 has a caching bug :)
dottxt-ai/outlines#1274

@mgoin
Member

mgoin commented Nov 21, 2024

I believe 0.1.0-0.1.3 will have a larger issue with vLLM, because the version of outlines-core they require doesn't have a wheel for Python 3.12, which we use in our testing image.

@Treparme
Author

Hmmm, annoying.
At the moment I install vLLM this way:

# install the desired outlines first, then vllm without its declared dependencies
pip3 install --user outlines==0.1.3
pip3 install --user vllm==0.6.4.post1 --no-deps
# reinstall vllm's declared requirements, except outlines
pip3 install --user $(pip show vllm | grep "Requires:" | sed 's/Requires: //g' | tr ',' '\n' | grep -v "outlines" | tr '\n' ' ')
pip3 install --user ray[serve] boto3 distance

This works and simply skips (ignores) the dependency conflict.
Outlines does work at this version (if you set it up in a certain way, see my comment above) and runs nicely.
I think waiting for the caching bug to be solved (in 0.1.4) and having wheels for Python 3.12 would make this mergeable?

@russellb
Collaborator

> Hi @Treparme, could you please share how you tested? I was also working on this upgrade yesterday and ran into a breaking change due to the introduction of outlines-core.

Here's the PR that changed the interface: dottxt-ai/outlines-core#40

I'll sort out what change we need on the vllm side.

@russellb
Collaborator

> Hi @Treparme, could you please share how you tested? I was also working on this upgrade yesterday and ran into a breaking change due to the introduction of outlines-core.
>
> Here's the PR that changed the interface: dottxt-ai/outlines-core#40
>
> I'll sort out what change we need on the vllm side.

The change is trivial.

https://github.com/vllm-project/vllm/compare/main...russellb:vllm:outlines-0.1.4?expand=1

but with this change in place, I hit dottxt-ai/outlines#1274

It sounds like we just need to wait for another release with a fix, and then we can move forward.
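For anyone following along, here is a minimal sketch of the kind of adaptation involved, assuming outlines >= 0.1 builds guides via a RegexGuide.from_regex classmethod rather than the positional constructor the traceback above trips over (the helper name is illustrative; the compare branch linked above is the authoritative change):

from outlines.fsm.guide import RegexGuide

def build_regex_guide(regex_string: str, tokenizer) -> RegexGuide:
    # old (outlines < 0.1): RegexGuide(regex_string, tokenizer)
    # new (outlines >= 0.1, assumed): construct via the classmethod
    return RegexGuide.from_regex(regex_string, tokenizer)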

@russellb
Collaborator

> https://github.com/vllm-project/vllm/compare/main...russellb:vllm:outlines-0.1.4?expand=1
>
> but with this change in place, I hit dottxt-ai/outlines#1274
>
> It sounds like we just need to wait for another release with a fix, and then we can move forward.

I tested outlines with their fix (which was to just remove the cache usage). It worked after I removed vllm's usage of the same API. I updated my branch with that change.
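To make the cache part concrete: the earlier traceback passes through outlines/caching.py because guide construction is wrapped in outlines' cache decorator. A rough sketch of what dropping that usage in favor of a plain in-process cache could look like (the function name and cache key here are illustrative assumptions, not the actual vLLM change):

from outlines.fsm.guide import RegexGuide

_guide_cache: dict = {}

def get_cached_guide(regex_string: str, tokenizer) -> RegexGuide:
    # key on the regex plus the tokenizer's identity; id() is a simplification
    # used only for this sketch to keep the key hashable
    key = (regex_string, id(tokenizer))
    if key not in _guide_cache:
        # assumes the outlines >= 0.1 from_regex classmethod, as in the sketch above
        _guide_cache[key] = RegexGuide.from_regex(regex_string, tokenizer)
    return _guide_cache[key]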

@Treparme
Author

> https://github.com/vllm-project/vllm/compare/main...russellb:vllm:outlines-0.1.4?expand=1
> but with this change in place, I hit dottxt-ai/outlines#1274
> It sounds like we just need to wait for another release with a fix, and then we can move forward.
>
> I tested outlines with their fix (which was to just remove the cache usage). It worked after I removed vllm's usage of the same API. I updated my branch with that change.

Hi @russellb

Thanks for this; having a higher version of outlines possible and working in vLLM is cool.

How would you like to proceed: create a new PR, or something else?

@russellb
Collaborator

> https://github.com/vllm-project/vllm/compare/main...russellb:vllm:outlines-0.1.4?expand=1
> but with this change in place, I hit dottxt-ai/outlines#1274
> It sounds like we just need to wait for another release with a fix, and then we can move forward.
>
> I tested outlines with their fix (which was to just remove the cache usage). It worked after I removed vllm's usage of the same API. I updated my branch with that change.
>
> Hi @russellb
>
> Thanks for this; having a higher version of outlines possible and working in vLLM is cool.
>
> How would you like to proceed: create a new PR, or something else?

Yes, I guess a new PR would be easiest. I opened #10576.

Thank you for opening this PR to help push this along!

@Treparme
Author

Thanks @russellb
Will close this one once #10576 is closed; looking forward to it.

@russellb
Collaborator

> Thanks @russellb Will close this one once #10576 is closed; looking forward to it.

Sounds good - I'm still working through some variations on the same root problem you observed with 0.1.4 (serialization failing in different spots).

torymur pushed a commit to dottxt-ai/outlines-core that referenced this pull request Dec 2, 2024
I understand that `pickleable` is not your priority right now. But the
`RegexGuide` needs to be pickled for `vllm` production use, which is
multiprocessing-based.

This PR reintroduces this pickling capability + some tests.

I understand that this introduces more effort on your side.

References:
dottxt-ai/outlines#1274
vllm-project/vllm#10490
vllm-project/vllm#10576
vllm-project/vllm#10489

It would also tackle the current caching issues: 
huggingface/text-generation-inference#2766
dottxt-ai/outlines#1283

Closes:
#95
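For context on why picklability matters here: vLLM's multiprocessing front end sends SamplingParams, including any logits processors and the compiled guide they hold, to a separate engine process, so everything they reference must survive pickling. A tiny illustrative check (not taken from either codebase):

import pickle

def survives_pickle(obj) -> bool:
    # True if the object round-trips through pickle, which is what crossing
    # vLLM's process boundary effectively requires
    try:
        pickle.loads(pickle.dumps(obj))
        return True
    except Exception:
        return False

# e.g. survives_pickle(logits_processor) should be True for a processor that
# wraps a RegexGuide and is shipped inside SamplingParams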

mergify bot commented Dec 17, 2024

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @Treparme.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Dec 17, 2024
Successfully merging this pull request may close these issues:

[Feature]: Support outlines versions > v0.1.0