Developers are increasingly building Compound AI systems that combine multiple model calls and external components to tackle complex AI tasks. By combining and orchestrating components effectively, these systems often outperform single models, frequently at lower cost, lower latency, or both. This repository collects and categorizes papers on Compound AI, including LLM routing, cascading, ensembling, speculative decoding, and LLM programming frameworks.
If you find this repository useful, please consider giving it a star. If there are any relevant papers that should be included, you're welcome to create a pull request or open an issue!
- The Shift from Models to Compound AI Systems
- Compound AI Systems Workshop
- Compound AI Systems - Databricks
- Large Language Model Routing with Benchmark Datasets (arXiv, 2023) [PDF]
- Tal Shnitzer, Anthony Ou, Mírian Silva, Kate Soule, Yuekai Sun, Justin Solomon, Neil Thompson, Mikhail Yurochkin
- Tryage: Real-time, intelligent Routing of User Prompts to Large Language Models (arXiv, 2023) [PDF]
- Surya Narayanan Hari, Matt Thomson
- Harnessing the Power of Multiple Minds: Lessons Learned from LLM Routing (Workshop on Insights from Negative Results in NLP, 2024) [PDF]
- KV Aditya Srivatsa, Kaushal Maurya, Ekaterina Kochmar
- Fly-Swat or Cannon? Cost-Effective Language Model Choice via Meta-Modeling (WSDM 2024) [PDF]
- Marija Šakota, Maxime Peyrard, Robert West
- Routing to the Expert: Efficient Reward-guided Ensemble of Large Language Models (NAACL 2024) [PDF]
- Keming Lu, Hongyi Yuan, Runji Lin, Junyang Lin, Zheng Yuan, Chang Zhou, Jingren Zhou
- Which LLM to Play? Convergence-Aware Online Model Selection with Time-Increasing Bandits (WWW 2024) [PDF]
- Yu Xia, Fang Kong, Tong Yu, Liya Guo, Ryan A. Rossi, Sungchul Kim, Shuai Li
- Hybrid LLM: Cost-Efficient and Quality-Aware Query Routing (ICLR 2024) [PDF]
- Dujian Ding, Ankur Mallick, Chi Wang, Robert Sim, Subhabrata Mukherjee, Victor Rühle, Laks V. S. Lakshmanan, Ahmed Hassan Awadallah
- Towards Optimizing the Costs of LLM Usage (arXiv, 2024) [PDF]
- Shivanshu Shekhar, Tanishq Dubey, Koyel Mukherjee, Apoorv Saxena, Atharv Tyagi, Nishanth Kotla
- ROUTERBENCH: A Benchmark for Multi-LLM Routing System (arXiv, 2024) [PDF]
- Qitian Jason Hu, Jacob Bieker, Xiuyu Li, Nan Jiang, Benjamin Keigwin, Gaurav Ranganath, Kurt Keutzer, Shriyash Kaustubh Upadhyay
- Code: https://github.com/withmartian/routerbench
- RouteLLM: Learning to Route LLMs with Preference Data (arXiv, 2024) [PDF]
- Isaac Ong, Amjad Almahairi, Vincent Wu, Wei-Lin Chiang, Tianhao Wu, Joseph E. Gonzalez, M Waleed Kadous, Ion Stoica
- Efficient Online ML API Selection for Multi-Label Classification Tasks (ICML 2022) [PDF]
- Lingjiao Chen, Matei Zaharia, James Zou
- LLM-Blender: Ensembling Large Language Models with Pairwise Ranking and Generative Fusion (ACL 2023) [PDF]
- Dongfu Jiang, Xiang Ren, Bill Yuchen Lin
- More Agents Is All You Need (arXiv, 2024) [PDF]
- Junyou Li, Qin Zhang, Yangbin Yu, Qiang Fu, Deheng Ye
- FrugalML: How to Use ML Prediction APIs More Accurately and Cheaply (NeurIPS 2020) [PDF]
- Lingjiao Chen, Matei Zaharia, James Zou
- Model Cascading: Towards Jointly Improving Efficiency and Accuracy of NLP Systems (EMNLP 2022) [PDF]
- Neeraj Varshney, Chitta Baral
- FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance (arXiv, 2023) [PDF]
- Lingjiao Chen, Matei Zaharia, James Zou
- Online Cascade Learning for Efficient Inference over Streams (ICML 2024) [PDF]
- Lunyiu Nie, Zhimin Ding, Erdong Hu, Christopher Jermaine, Swarat Chaudhuri
- Code: https://github.com/Flitternie/online_cascade_learning
- Language Model Cascades: Token-level uncertainty and beyond (ICLR 2024) [PDF]
- Neha Gupta, Harikrishna Narasimhan, Wittawat Jitkrittum, Ankit Singh Rawat, Aditya Krishna Menon, Sanjiv Kumar
- Large Language Model Cascades with Mixture of Thoughts Representations for Cost-efficient Reasoning (ICLR 2024) [PDF]
- Murong Yue, Jie Zhao, Min Zhang, Liang Du, Ziyu Yao
- Cascade-Aware Training of Language Models (arXiv, 2024) [PDF]
- Congchao Wang, Sean Augenstein, Keith Rush, Wittawat Jitkrittum, Harikrishna Narasimhan, Ankit Singh Rawat, Aditya Krishna Menon, Alec Go
- Fast Inference from Transformers via Speculative Decoding (ICML 2023) [PDF]
- Yaniv Leviathan, Matan Kalman, Yossi Matias
- SpecInfer: Accelerating Generative Large Language Model Serving with Tree-based Speculative Inference and Verification (ASPLOS 2024) [PDF]
- Xupeng Miao, Gabriele Oliaro, Zhihao Zhang, Xinhao Cheng, Zeyu Wang, Zhengxin Zhang, Rae Ying Yee Wong, Alan Zhu, Lijie Yang, Xiaoxiang Shi, Chunan Shi, Zhuoming Chen, Daiyaan Arfeen, Reyna Abhyankar, Zhihao Jia
- Faster Cascades via Speculative Decoding (arXiv, 2024) [PDF]
- Harikrishna Narasimhan, Wittawat Jitkrittum, Ankit Singh Rawat, Seungyeon Kim, Neha Gupta, Aditya Krishna Menon, Sanjiv Kumar
- Prompting Is Programming: A Query Language for Large Language Models (PLDI 2023) [PDF]
- Luca Beurer-Kellner, Marc Fischer, Martin Vechev
- Code: https://github.com/eth-sri/lmql
- AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation Framework (ICLR 2024 Workshop on LLM Agents) [PDF]
- Qingyun Wu, Gagan Bansal, Jieyu Zhang, Yiran Wu, Beibin Li, Erkang Zhu, Li Jiang, Xiaoyun Zhang, Shaokun Zhang, Jiale Liu, Ahmed Hassan Awadallah, Ryen W White, Doug Burger, Chi Wang
- Code: https://github.com/microsoft/autogen
- DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines (R0-FoMo Workshop, 2023) [PDF]
- Omar Khattab, Arnav Singhvi, Paridhi Maheshwari, Zhiyuan Zhang, Keshav Santhanam, Sri Vardhamanan, Saiful Haq, Ashutosh Sharma, Thomas T. Joshi, Hanna Moazam, Heather Miller, Matei Zaharia, Christopher Potts
- Code: https://github.com/stanfordnlp/dspy
- Language Agents as Optimizable Graphs (ICML 2024) [PDF]
- Mingchen Zhuge, Wenyi Wang, Louis Kirsch, Francesco Faccio, Dmitrii Khizbullin, Jürgen Schmidhuber
- Code: https://github.com/metauto-ai/gptswarm
- SGLang: Efficient Execution of Structured Language Model Programs (arXiv, 2024) [PDF]
- Lianmin Zheng, Liangsheng Yin, Zhiqiang Xie, Chuyue Sun, Jeff Huang, Cody Hao Yu, Shiyi Cao, Christos Kozyrakis, Ion Stoica, Joseph E. Gonzalez, Clark Barrett, Ying Sheng
- Code: https://github.com/sgl-project/sglang
- AgentLego: An open-source library of versatile tool APIs to extend and enhance LLM based agents
- OpenXLab
- Code: https://github.com/InternLM/agentlego
- A Declarative System for Optimizing AI Workloads (arXiv, 2024) [PDF]
- Chunwei Liu, Matthew Russo, Michael Cafarella, Lei Cao, Peter Baile Chen, Zui Chen, Michael Franklin, Tim Kraska, Samuel Madden, Gerardo Vitagliano
- Code: https://github.com/mitdbg/palimpzest