Merge pull request #107 from lm-sys/routellm-updates
Update RouteLLM demo
iojw authored Jul 1, 2024
2 parents 805c10d + 6451cbd commit c52865d
Showing 1 changed file with 4 additions and 4 deletions.
8 changes: 4 additions & 4 deletions blog/2024-07-01-routellm.md
@@ -15,6 +15,10 @@ LLM routing offers a solution to this, where each query is first processed by a

To tackle this, we present **RouteLLM**, a principled framework for LLM routing based on preference data. We formalize the problem of LLM routing and explore augmentation techniques to improve router performance. We train four different routers using public data from Chatbot Arena and demonstrate that they can significantly reduce costs without compromising quality, with **cost reductions of over 85% on MT Bench, 45% on MMLU, and 35% on GSM8K** compared to using only GPT-4, while still achieving 95% of GPT-4’s performance. We also publicly release all our code and datasets, including a new [open-source framework](https://github.com/lm-sys/RouteLLM) for serving and evaluating LLM routers.

## Demo

We have built a temporary [demo](https://816388d8af31950a69.gradio.live) where you can experiment with our matrix factorization and causal LLM routers by seeing which model your messages are routed to. Both routers have been calibrated so that approximately 50% of calls are routed to GPT-4. Please try it out!
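
To make "calibrated" concrete: each router scores a query with a predicted win rate for the strong model, and the routing threshold is chosen on a held-out set of queries so that the desired fraction crosses it. A minimal sketch of that idea, with stand-in scores in place of a real router (the helper name is ours, not part of the released framework):

```python
import numpy as np

def calibrate_threshold(scores: np.ndarray, strong_model_pct: float) -> float:
    """Pick the threshold that routes roughly `strong_model_pct` of queries
    to the strong model, given router scores (predicted strong-model win
    rates) on a held-out calibration set."""
    # Queries go to the strong model when score >= threshold, so the
    # threshold is the (1 - pct) quantile of the calibration scores.
    return float(np.quantile(scores, 1.0 - strong_model_pct))

# Example: calibrate so that ~50% of calls go to GPT-4, as in this demo.
calibration_scores = np.random.rand(10_000)  # stand-in for real router scores
threshold = calibrate_threshold(calibration_scores, strong_model_pct=0.5)
```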

## Routing Setup

In our routing setup, we focus on the case where there are two models: a stronger, more expensive model, and a weaker but cheaper model. Given this setup, our objective is to minimize costs while achieving high quality by routing between both models.
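
Concretely, a router in this setup reduces to a thresholded decision rule: estimate how likely the strong model is to win on a given query, and fall back to the cheap model when that estimate is low. A minimal sketch, assuming a hypothetical `predict_win_rate` that stands in for any of the trained routers:

```python
from typing import Callable

def route(
    query: str,
    predict_win_rate: Callable[[str], float],
    threshold: float,
    strong_model: str = "gpt-4",
    weak_model: str = "mixtral-8x7b",
) -> str:
    """Return the model that should handle `query`."""
    score = predict_win_rate(query)  # predicted P(strong model wins)
    return strong_model if score >= threshold else weak_model

# Example: a constant router that always predicts a 60% strong-model win rate.
print(route("What is 2 + 2?", lambda q: 0.6, threshold=0.5))  # -> "gpt-4"
```

Raising the threshold sends fewer queries to the strong model, trading quality for cost; the calibration step described above is simply the choice of this threshold.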
@@ -90,10 +94,6 @@ Based on this research, we have created an open-source framework for serving and

We are excited to see what you build on top of this! Please let us know if you face any issues or have any suggestions. For the full details, please refer to our [arXiv](https://arxiv.org/abs/2406.18665) paper.
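
If you would like to try the framework programmatically, the repository's README describes an OpenAI-compatible interface; a sketch along those lines (the exact model names and the threshold baked into the `model` string are illustrative, so check the repo for current values):

```python
import os
from routellm.controller import Controller

os.environ["OPENAI_API_KEY"] = "sk-..."  # provider key for the strong model

# Route between GPT-4 and a cheaper open model via the matrix factorization router.
client = Controller(
    routers=["mf"],
    strong_model="gpt-4-1106-preview",
    weak_model="anyscale/mistralai/Mixtral-8x7B-Instruct-v0.1",
)

# "router-mf-0.11593" means: use the "mf" router with a cost threshold of 0.11593.
response = client.chat.completions.create(
    model="router-mf-0.11593",
    messages=[{"role": "user", "content": "Hello!"}],
)
```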

## Demo

We have built a temporary [demo](https://0c83f754b05f4a2208.gradio.live) where you can experiment with our augmented matrix factorization and causal LLM routers by seeing which model your messages are routed to. Both routers have been calibrated so that approximately 20% of calls are routed to GPT-4. Please try them out!

## Acknowledgements

We are grateful to Tyler Griggs for his valuable feedback on this post.