Merge pull request #107 from lm-sys/routellm-updates
Update RouteLLM demo
iojw authored Jul 1, 2024
2 parents 805c10d + 6451cbd commit c52865d
Showing 1 changed file with 4 additions and 4 deletions.
8 changes: 4 additions & 4 deletions blog/2024-07-01-routellm.md
@@ -15,6 +15,10 @@ LLM routing offers a solution to this, where each query is first processed by a

To tackle this, we present **RouteLLM**, a principled framework for LLM routing based on preference data. We formalize the problem of LLM routing and explore augmentation techniques to improve router performance. We train four different routers using public data from Chatbot Arena and demonstrate that they can significantly reduce costs without compromising quality, with **cost reductions of over 85% on MT Bench, 45% on MMLU, and 35% on GSM8K** compared to using only GPT-4, while still achieving 95% of GPT-4’s performance. We also publicly release all our code and datasets, including a new [open-source framework](https://github.com/lm-sys/RouteLLM) for serving and evaluating LLM routers.

## Demo

We have built a temporary [demo](https://816388d8af31950a69.gradio.live) where you can experiment with our matrix factorization and causal LLM routers by seeing which model your messages are routed to. Both routers have been calibrated so that approximately 50% of calls are routed to GPT-4. Please try it out!
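
To make "calibrated" concrete: each router scores a query with a predicted win rate for the strong model, and the routing threshold is chosen on a held-out set of queries so that the desired fraction crosses it. A minimal sketch of that idea, with stand-in scores in place of a real router (the helper name is ours, not part of the released framework):

```python
import numpy as np

def calibrate_threshold(scores: np.ndarray, strong_model_pct: float) -> float:
    """Pick the threshold that routes roughly `strong_model_pct` of queries
    to the strong model, given router scores (predicted strong-model win
    rates) on a held-out calibration set."""
    # Queries go to the strong model when score >= threshold, so the
    # threshold is the (1 - pct) quantile of the calibration scores.
    return float(np.quantile(scores, 1.0 - strong_model_pct))

# Example: calibrate so that ~50% of calls go to GPT-4, as in this demo.
calibration_scores = np.random.rand(10_000)  # stand-in for real router scores
threshold = calibrate_threshold(calibration_scores, strong_model_pct=0.5)
```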

## Routing Setup

In our routing setup, we focus on the case where there are two models: a stronger, more expensive model, and a weaker but cheaper model. Given this setup, our objective is to minimize costs while achieving high quality by routing between both models.
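
Concretely, a router in this setup reduces to a thresholded decision rule: estimate how likely the strong model is to win on a given query, and fall back to the cheap model when that estimate is low. A minimal sketch, assuming a hypothetical `predict_win_rate` that stands in for any of the trained routers:

```python
from typing import Callable

def route(
    query: str,
    predict_win_rate: Callable[[str], float],
    threshold: float,
    strong_model: str = "gpt-4",
    weak_model: str = "mixtral-8x7b",
) -> str:
    """Return the model that should handle `query`."""
    score = predict_win_rate(query)  # predicted P(strong model wins)
    return strong_model if score >= threshold else weak_model

# Example: a constant router that always predicts a 60% strong-model win rate.
print(route("What is 2 + 2?", lambda q: 0.6, threshold=0.5))  # -> "gpt-4"
```

Raising the threshold sends fewer queries to the strong model, trading quality for cost; the calibration step described above is simply the choice of this threshold.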
@@ -90,10 +94,6 @@ Based on this research, we have created an open-source framework for serving and

We are excited to see what you build on top of this! Please let us know if you face any issues or have any suggestions. For the full details, please refer to our [arXiv](https://arxiv.org/abs/2406.18665) paper.
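
If you would like to try the framework programmatically, the repository's README describes an OpenAI-compatible interface; a sketch along those lines (the exact model names and the threshold baked into the `model` string are illustrative, so check the repo for current values):

```python
import os
from routellm.controller import Controller

os.environ["OPENAI_API_KEY"] = "sk-..."  # provider key for the strong model

# Route between GPT-4 and a cheaper open model via the matrix factorization router.
client = Controller(
    routers=["mf"],
    strong_model="gpt-4-1106-preview",
    weak_model="anyscale/mistralai/Mixtral-8x7B-Instruct-v0.1",
)

# "router-mf-0.11593" means: use the "mf" router with a cost threshold of 0.11593.
response = client.chat.completions.create(
    model="router-mf-0.11593",
    messages=[{"role": "user", "content": "Hello!"}],
)
```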

## Demo

We have built a temporary [demo](https://0c83f754b05f4a2208.gradio.live) where you can experiment with our augmented matrix factorization and causal LLM routers by seeing which model your messages are routed to. Both routers have been calibrated so that approximately 20% of calls are routed to GPT-4. Please try them out!

## Acknowledgements

We are grateful to Tyler Griggs for his valuable feedback on this post.