Update outlines support to v0.1.4 #10490
base: main
Conversation
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can do one of these:
Hi @Treparme, could you please share how you tested? I was also working on this upgrade yesterday and ran into a breaking change due to the introduction of outlines-core. Here is my setup, followed by the error I hit.
Client:

from pydantic import BaseModel
from openai import OpenAI
class Info(BaseModel):
    name: str
    age: int
client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="dummy")
model_id = client.models.list().data[0].id
print("Model ID:", model_id)
completion = client.beta.chat.completions.parse(
    model=model_id,
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "My name is Cameron, I'm 28. What's my name and age?"},
    ],
    response_format=Info,
    extra_body=dict(guided_decoding_backend="outlines"),
)
message = completion.choices[0].message
print(message)
assert message.parsed
print("Name:", message.parsed.name)
print("Age:", message.parsed.age) Error (truncated):
|
Hi @mgoin, we run it in a different way; something similar to the snippet below works:

# Assumed import paths for outlines 0.1.x (not part of the original snippet):
# from transformers import AutoTokenizer
# from outlines.models.transformers import TransformerTokenizer
# from outlines.processors import JSONLogitsProcessor
# from vllm import SamplingParams
self.outlines_tokenizer = TransformerTokenizer(
    AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct")
)
logits_processor = JSONLogitsProcessor(schema=json_schema, tokenizer=self.outlines_tokenizer)
# logits_processor = build_vllm_logits_processor(self.tokenizer_data, parser)  # would override the JSONLogitsProcessor above if left in
sampling_params = SamplingParams(
    temperature=1e-6,
    max_tokens=2000,
    logits_processors=[logits_processor],
    logprobs=5,
)
results_generator = self.engine.generate(final_prompt, sampling_params, request_id, lora_request=lora)

This works.
I believe
Hmmm, annoying.
This works and skips the dependency issue (ignores it).
Here's the PR that changed the interface: dottxt-ai/outlines-core#40. I'll sort out what change we need on the vllm side.
The change is trivial (see https://github.com/vllm-project/vllm/compare/main...russellb:vllm:outlines-0.1.4?expand=1), but with this change in place, I hit dottxt-ai/outlines#1274. It sounds like we just need to wait for another release with a fix, and then we can move forward.
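For context, here is a minimal sketch of the kind of constructor change involved; the import paths and the regex are assumptions for illustration, not the exact vLLM diff:

from transformers import AutoTokenizer
from outlines.models.transformers import TransformerTokenizer
from outlines_core.fsm.guide import RegexGuide  # assumed new home of RegexGuide in 0.1.x

tokenizer = TransformerTokenizer(
    AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct")
)

# outlines < 0.1 built the guide via the constructor: RegexGuide(r"\d+", tokenizer)
# outlines-core 0.1.x exposes a classmethod instead:
guide = RegexGuide.from_regex(r"\d+", tokenizer)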
I tested outlines with their fix (which was to just remove the cache usage). It worked after I removed vllm's usage of the same API. I updated my branch with that change. |
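As an illustration of what "usage of the same API" looks like, here is a hedged sketch; the helper name is hypothetical and this is not the actual vLLM code:

from outlines.caching import cache               # outlines' disk-cache decorator (assumed import path)
from outlines_core.fsm.guide import RegexGuide   # assumed import path, as above

@cache()
def _compile_guide(regex_string, tokenizer):
    # hypothetical helper: decorating guide construction with the outlines cache
    # is the pattern that both outlines and vllm dropped
    return RegexGuide.from_regex(regex_string, tokenizer)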
Hi @russellb, thanks for this. A higher version of outlines is possible and working in vllm: cool. How would you like to proceed, create a new PR or?
Yes, I guess a new PR would be easiest. I opened #10576. Thank you for opening this PR to help push this along!
I understand that making `RegexGuide` `pickleable` is not your priority right now, but it needs to be pickled for `vllm` production use, which is multiprocessing-based. This PR reintroduces the pickling capability plus some tests. I understand that this means more effort on your side.
References: dottxt-ai/outlines#1274, vllm-project/vllm#10490, vllm-project/vllm#10576, vllm-project/vllm#10489
It would also tackle the current caching issues: huggingface/text-generation-inference#2766, dottxt-ai/outlines#1283
Closes: #95
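A minimal sketch of the property being asked for, written as a hypothetical check rather than one of the PR's actual tests: vLLM's multiprocessing-based serving has to pickle the guide when shipping logits processors to worker processes.

import pickle

from transformers import AutoTokenizer
from outlines.models.transformers import TransformerTokenizer
from outlines_core.fsm.guide import RegexGuide  # assumed import path

tok = TransformerTokenizer(AutoTokenizer.from_pretrained("gpt2"))
guide = RegexGuide.from_regex(r"\d+", tok)

# The round trip must not raise for multiprocessing use; this is the capability
# the PR reintroduces.
restored = pickle.loads(pickle.dumps(guide))
assert isinstance(restored, RegexGuide)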
This pull request has merge conflicts that must be resolved before it can be merged.
Upgrades (or loosens) the outlines dependency.
Out of the box, this supports a higher outlines version, which improves speed.
FIX #10489
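For reference, the dependency change amounts to relaxing the pin in vLLM's requirements file; the exact specifiers below are assumptions for illustration, not the PR diff:

# requirements-common.txt (illustrative; exact pins are assumptions)
# before: outlines >= 0.0.43, < 0.1
# after:  outlines >= 0.1.4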