# LLM Serving

## 2024

### OSDI

- Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve [paper]
- ServerlessLLM: Low-Latency Serverless Inference for Large Language Models [paper]
- InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management [paper]
- Llumnix: Dynamic Scheduling for Large Language Model Serving [paper]
- DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving [paper]
- dLoRA: Dynamically Orchestrating Requests and Adapters for LoRA LLM Serving [paper]
- Parrot: Efficient Serving of LLM-based Applications with Semantic Variable [paper]
- USHER: Holistic Interference Avoidance for Resource Optimized ML Inference [paper]
- Fairness in Serving Large Language Models [paper]