
stream=true requests cause "Object of type Stream is not JSON serializable" error #117

Open
doublefx opened this issue Dec 28, 2024 · 4 comments
Labels: bug (Something isn't working)

Comments


doublefx commented Dec 28, 2024

When sending a stream=true request to optiLLM, the service encounters the following error:

{"error":"Object of type Stream is not JSON serializable"}

This error suggests that optiLLM is not properly handling streamed responses from liteLLM or OpenAI GPT-4o. Instead of processing the stream incrementally, it attempts to serialize the raw stream object directly into JSON, which causes the serialization failure.
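For illustration, here is a minimal stand-in (not optiLLM's actual code) that reproduces the exact error message: `json` can only serialize plain data structures, so handing it the raw SDK stream object raises the reported TypeError.

```python
import json

class Stream:  # stand-in for the SDK's Stream object returned when stream=True
    def __iter__(self):
        yield {"choices": [{"delta": {"content": "Hello"}}]}

try:
    json.dumps(Stream())  # serializing the raw stream object directly
except TypeError as e:
    print(e)  # Object of type Stream is not JSON serializable
```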

Steps to Reproduce:
Send a POST request to optiLLM with the following payload:

{
    "model": "gpt-4o",
    "messages": [
        { "role": "system", "content": "You are a helpful assistant." },
        { "role": "user", "content": "Write a Python program to build an RL model using only numpy." }
    ],
    "max_tokens": 1000,
    "stream": true
}

Observe the error response:
{"error":"Object of type Stream is not JSON serializable"}

Expected Behavior:
The optiLLM service should handle streamed responses by:

  • Iterating through the stream of chunks from liteLLM or OpenAI.
  • Processing each chunk incrementally and forwarding it to the next layer (e.g., AnythingLLM), roughly as sketched below.
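A rough sketch of that expected handling, assuming a Flask-style proxy sitting in front of the OpenAI SDK (the route and structure below are illustrative, not optiLLM's actual code):

```python
from flask import Flask, Response, request
from openai import OpenAI

app = Flask(__name__)
client = OpenAI()  # in optiLLM's setup this would point at liteLLM

@app.route("/v1/chat/completions", methods=["POST"])
def chat_completions():
    body = request.get_json()
    if body.get("stream"):
        upstream = client.chat.completions.create(**body)

        def relay():
            # Forward each upstream chunk as an SSE event instead of
            # trying to JSON-serialize the Stream object itself.
            for chunk in upstream:
                yield f"data: {chunk.model_dump_json()}\n\n"
            yield "data: [DONE]\n\n"

        return Response(relay(), mimetype="text/event-stream")

    # Non-streaming path: the completed response object is JSON-serializable.
    completion = client.chat.completions.create(**body)
    return Response(completion.model_dump_json(), mimetype="application/json")
```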

Additional Context:

  • The following pipeline works properly: User -> AnythingLLM -> liteLLM -> openai/gpt-4o
  • The problem seems specific to how optiLLM handles the streamed response from liteLLM.

Severity:
High - This issue blocks the usage of stream=true functionality, which is critical for incremental responses in real-time applications.

doublefx changed the title from "… error in optimLLM" to "… error in optiLLM" on Dec 28, 2024
doublefx changed the title from "… error in optiLLM" to "… error" on Dec 28, 2024
codelion (Owner) commented

Most of the approaches require the full output and multiple calls to the LLM, so we cannot stream the responses to the next layer as they come in. We could handle it in optillm by waiting for the full stream to finish, but the effect would be similar to using the underlying LLM without streaming.
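As a sketch of what "waiting for the full stream to finish" would look like on the input side (a hypothetical helper, not optiLLM's actual code), the upstream stream can simply be drained into one complete string before the optimization approaches run:

```python
from openai import OpenAI

client = OpenAI()  # in optiLLM's case this would point at liteLLM / the inference server

def complete_blocking(request_body: dict) -> str:
    """Consume a streamed upstream response and return the full text at once."""
    stream = client.chat.completions.create(**{**request_body, "stream": True})
    parts = []
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            parts.append(chunk.choices[0].delta.content)
    return "".join(parts)
```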


av commented Dec 29, 2024

Also encountered this while integrating OptiLLM.

A good middle ground for LLM proxies is to stream what's possible. Every approach will have some portions that can be sent back to the client, either for traceability or as additional data, even before the final response. In another proxy (don't want to link it) we called that "Intermediate outputs", and it can be toggled on/off based on user preference.

However, this specific problem with OptiLLM breaks its compatibility with downstream services, for example Open WebUI, which enables streaming by default. If full streaming support is not planned (understandably, it's a big undertaking), a reasonable workaround is to imitate the streaming interface and simply send the whole response in a single chunk when the workflow is finished.


codelion commented Dec 29, 2024

> However, this specific problem with OptiLLM breaks its compatibility with downstream services, for example Open WebUI, which enables streaming by default. If full streaming support is not planned (understandably, it's a big undertaking), a reasonable workaround is to imitate the streaming interface and simply send the whole response in a single chunk when the workflow is finished.

This is already done; the request here is to enable streaming of inputs from the inference server. I can add a similar workaround for the inputs from the inference server as well, so as not to break anything. Good suggestion.
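For context, a sketch of that "imitate streaming" output workaround (illustrative only, not optiLLM's exact implementation): once the full workflow has finished, its final text is wrapped in a single OpenAI-style SSE chunk followed by [DONE], so clients such as Open WebUI that expect a stream keep working.

```python
import json
import time
import uuid

def fake_stream(final_text: str, model: str):
    """Yield the finished response as one chat.completion.chunk SSE event."""
    chunk = {
        "id": f"chatcmpl-{uuid.uuid4().hex}",
        "object": "chat.completion.chunk",
        "created": int(time.time()),
        "model": model,
        "choices": [
            {"index": 0, "delta": {"content": final_text}, "finish_reason": "stop"}
        ],
    }
    yield f"data: {json.dumps(chunk)}\n\n"
    yield "data: [DONE]\n\n"
```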

codelion added the bug (Something isn't working) label on Dec 29, 2024

av commented Dec 29, 2024

Yes, I found the commit now; the existing streaming workaround is likely not fully compatible with Open WebUI for some reason. Thank you!
