Hi 👋🏻
Thanks for your work on OptiLLM!
I've worked on integrating it into Harbor and came across a couple of nice-to-haves that might make the project friendlier under specific conditions. These are mostly specific to the Open WebUI <-> OptiLLM <-> Ollama scenario.
Multiple downstream servers
It's very convenient to be able to run a single instance of the proxy for multiple downstream services: for example, when running vLLM and llama.cpp together, when using multiple nodes with different configurations to serve different sizes of the models, or simply when you want to combine local and cloud LLMs in a single workflow. As for model ID collisions, it's safe to leave those for manual resolution when they happen and apply a "last defined wins" (or another similarly simple) heuristic. Here's an example of this exact behavior implemented in Harbor Boost.
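For illustration, a minimal sketch of what that merge could look like. The upstream URLs and the fetch_models helper are invented for this example; they're not an existing OptiLLM API:

```python
# Hypothetical sketch: aggregating /v1/models from several downstream
# servers with a "last defined wins" policy on model ID collisions.
import requests

# Assumption: a configured list of upstream OpenAI-compatible base URLs.
DOWNSTREAM_SERVERS = [
    "http://vllm:8000/v1",       # e.g. a local vLLM node
    "http://llamacpp:8080/v1",   # e.g. a local llama.cpp node
    "https://api.openai.com/v1", # e.g. a cloud provider
]

def fetch_models(base_url: str) -> list[dict]:
    """Fetch the model list from one downstream /v1/models endpoint."""
    resp = requests.get(f"{base_url}/models", timeout=10)
    resp.raise_for_status()
    return resp.json().get("data", [])

def merged_models() -> list[dict]:
    """Merge all downstream model lists; later servers win on ID collisions."""
    merged: dict[str, dict] = {}
    for base_url in DOWNSTREAM_SERVERS:
        for model in fetch_models(base_url):
            # Remember where each model came from so chat requests can be
            # routed back to the right upstream later.
            model["_upstream"] = base_url
            merged[model["id"]] = model  # last defined wins
    return list(merged.values())
```

The only state this needs is the upstream-per-model mapping, so completion requests can be proxied to whichever server "won" the ID.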
Model prefix
Allowing users to specify a custom prefix/postfix for the model IDs would make it easy to distinguish OptiLLM models from other servers'. I know that model prefixes are also used for dynamic approach selection, but those are never exposed from the /v1/models endpoint. Also, tools like Open WebUI support an unofficial extension of the model objects with the name field, which is rendered in the model selector.
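A rough sketch of what that decoration could look like; the MODEL_PREFIX setting and the display-name format are invented for illustration, not existing OptiLLM options:

```python
# Hypothetical sketch: applying a configurable prefix to model IDs and
# attaching the unofficial `name` field that Open WebUI renders in its
# model selector.
MODEL_PREFIX = "optillm:"  # assumption: a user-configurable setting

def decorate_model(model: dict) -> dict:
    """Return a copy of a /v1/models entry with a prefixed ID and a display name."""
    decorated = dict(model)
    decorated["id"] = f"{MODEL_PREFIX}{model['id']}"
    # Unofficial extension understood by Open WebUI's model selector.
    decorated["name"] = f"OptiLLM: {model['id']}"
    return decorated

# Example:
#   decorate_model({"id": "llama3.1:8b", "object": "model"})
#   -> {"id": "optillm:llama3.1:8b", "object": "model",
#       "name": "OptiLLM: llama3.1:8b"}
```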
These are only suggestions to consider, thanks again for your work 🙌🏻