OpenAIGenerator uses chat_completions endpoint. Error with model that has no chat_template in config #8275
Comments
Can you please provide some sample code to try and reproduce the error? I understand why it is happening (the completions API is legacy and might stop being supported by OpenAI). There are ways to get the chat completions endpoint to mimic the completions one, and that is what Haystack tries to do, but I'd need an example to see whether the issue is with vLLM or with Haystack.
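For context, the mimicking roughly amounts to wrapping the raw prompt in a single user message before calling the chat endpoint. A minimal sketch with the openai client (the base URL and model name are placeholders, not values from this issue):

```python
from openai import OpenAI

# Placeholder vLLM OpenAI-compatible server
client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy")

prompt = "It was a normal summer day until something unexpected happened"

# Legacy completions endpoint: the model continues the raw text.
legacy = client.completions.create(model="mistralai/Mistral-7B-v0.3", prompt=prompt)

# Chat completions endpoint: the prompt is wrapped in a user message,
# so the server applies the model's chat template before generating.
chat = client.chat.completions.create(
    model="mistralai/Mistral-7B-v0.3",
    messages=[{"role": "user", "content": prompt}],
)
```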
The header error is definitely on vLLM, or at least on the fork the runpod folks are using. But I don't think it's right for the text completion class to use the chat completion endpoint. If the completions endpoint gets removed, then in my opinion it's better to let the calls fail and inform the user rather than silently use a different endpoint. I was getting weird responses, and I would never have known why if I hadn't tried a model without a chat template. For example, if you prompt "it was a normal summer day until something unexpected happened", the chat endpoint will respond "what happened?" rather than continue the story. If you want to keep things as is so as not to break existing users' code, you could add a boolean kwarg.
I definitely see benefits and downsides to using the basic completions API vs. the chat completions API for the regular generator. Using OpenAIGenerator with a "prompt" that is converted to a ChatMessage in the backend lets users try the generators quickly without having to worry about roles. It also lets users work with the most recent models available from OpenAI (4o and 4o-mini are not available in the regular completions API). In regard to completions: some models are definitely smart enough to finish typing what you write, and you can reinforce that by setting a system prompt that tells the model exactly how to complete it. As for setting api_base_url and templates, since the chat completions endpoint is being used, different implementations of an OpenAI-API-compatible server may handle it differently. For example, this is how Ollama handles it:
This means that Ollama can effectively mimic a completions call via the chat completions API even when a model has no template (as base models don't). vLLM does not seem to take this approach, and unless they decide to provide a default "fallback" template, you will probably need to provide your own that does what Ollama implemented. It may be possible to add a flag as you suggested and conditionally call the regular completions API (add it to generation_kwargs and extract it if present), as sketched below, but I don't believe it should be the default behavior, since most users are probably not setting api_base_url. I'll leave the rest to the Haystack team to decide how they wish to proceed.
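A rough sketch of what that could look like. The class, the use_completions_api flag, and the simplified run() below are hypothetical and not actual Haystack code; they only illustrate the routing idea:

```python
from typing import Any, Dict, List, Optional


class CompletionsAwareGenerator:
    """Hypothetical variant of OpenAIGenerator; not actual Haystack code."""

    def __init__(self, client, model: str, generation_kwargs: Optional[Dict[str, Any]] = None):
        self.client = client  # an openai.OpenAI instance
        self.model = model
        self.generation_kwargs = generation_kwargs or {}

    def run(self, prompt: str, generation_kwargs: Optional[Dict[str, Any]] = None) -> Dict[str, List[str]]:
        kwargs = {**self.generation_kwargs, **(generation_kwargs or {})}
        # Hypothetical opt-in flag, popped so it is not forwarded to the API.
        use_completions_api = kwargs.pop("use_completions_api", False)

        if use_completions_api:
            # Legacy endpoint: send the raw prompt, no chat template involved.
            response = self.client.completions.create(model=self.model, prompt=prompt, **kwargs)
            replies = [choice.text for choice in response.choices]
        else:
            # Current behavior: wrap the prompt in a user message for chat completions.
            response = self.client.chat.completions.create(
                model=self.model,
                messages=[{"role": "user", "content": prompt}],
                **kwargs,
            )
            replies = [choice.message.content for choice in response.choices]

        return {"replies": replies}
```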
cc @julian-risch to assign in the next sprint
@julian-risch I've read this issue report in detail and understand what @Permafacture is asking for, but in light of our plan to deprecate all generators I wonder how relevant work on such an issue would be. I recommend closing with "Won't fix".
So far it's only an idea. We have not decided yet whether to change anything about the generators. I'll move the issue to "Hold" in the meantime.
Describe the bug
I'm using the OpenAIGenerator to access a vLLM endpoint on runpod. When using a base model like Mistral v0.3, which has not been instruction tuned and therefore has no chat template in its tokenizer config, I get an error back from the API endpoint. Digging into this, I see that OpenAIGenerator uses the chat_completion/ endpoint rather than the completion/ endpoint. This means I have been unintentionally using a chat template with other models up to this point.
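A minimal sketch of the setup (the endpoint URL, API key, and model name are placeholders, not the actual runpod values):

```python
from haystack.components.generators import OpenAIGenerator
from haystack.utils import Secret

generator = OpenAIGenerator(
    api_key=Secret.from_token("dummy"),           # placeholder key for the vLLM server
    model="mistralai/Mistral-7B-v0.3",            # base model with no chat_template
    api_base_url="https://<runpod-endpoint>/v1",  # placeholder URL
)

# Fails because the generator calls the chat completions endpoint,
# and vLLM cannot apply a chat template for this model.
result = generator.run("And then, something unexpected happened.")
```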
Error message
"Cannot use apply_chat_template() because tokenizer.chat_template is not set and no template argument was passed!"
Expected behavior
I expected the completions/ API endpoint to be used, and the Hugging Face model not to call apply_chat_template().
Additional context
I tried to use the client.completions method directly as a workaround:
completion = generator.client.completions.create(model=generator.model, prompt="And then, something unexpected happened.", **generator.generation_kwargs)
The process on the server crashes with a 'NoneType' object has no attribute 'headers' error.
System: