[Core] Refactoring disaggregated prefilling/decoding using Mooncake Transfer Engine #10728

Closed


@alogfans alogfans commented Nov 28, 2024

This PR is related to #10727 and is a continuation of PR #8498. It uses Mooncake's Transfer Engine for KVCache transfer instead of NCCL.

Mooncake is a KVCache-centric disaggregated architecture for LLM serving. Transfer Engine is the core component of Mooncake; see its documentation for the design and API list.

Compared with NCCL, Mooncake Transfer Engine has the following features:

  • a unified programming interface for DRAM-to-DRAM (both local and remote), DRAM-to-GPU VRAM (both local and remote), and DRAM-to-remote NVMe device transfers
  • support for TCP, RDMA, and NVMe-oF protocols
  • topology-aware path selection that aggregates bandwidth from multiple NICs (see our English doc, transfer_engine.md)

As in the current implementation of PR #8498, there are two roles: a KV provider (e.g. a prefill vLLM instance) and a KV consumer (e.g. a decode vLLM instance).

  • Provider side implements insert: insert a KV cache into a buffer, so that it can be transferred upon request
  • Consumer side implements drop_select: select a KV cache based on tokens, transfer the selected KV, and drop it from the buffer

The two roles run on different machines.
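
To make the insert / drop_select semantics concrete, here is a minimal single-process sketch. It is not the code in this PR or in #8498: the class name `ToyKVLookupBuffer`, the exact-match lookup, and the `bytes` payload are assumptions made for illustration, and the "transfer" is just a local return value rather than a Transfer Engine call between machines.

```python
from typing import List, Optional, Tuple


class ToyKVLookupBuffer:
    """Hypothetical in-memory stand-in for the KV lookup buffer."""

    def __init__(self) -> None:
        # Each entry pairs the prompt tokens with an opaque KV payload.
        self._entries: List[Tuple[Tuple[int, ...], bytes]] = []

    def insert(self, tokens: List[int], kv: bytes) -> None:
        # Provider side: stage a KV cache so it can be handed over on request.
        self._entries.append((tuple(tokens), kv))

    def drop_select(self, tokens: List[int]) -> Optional[bytes]:
        # Consumer side: select the entry whose tokens match, return its KV
        # payload, and drop it from the buffer so it is served only once.
        key = tuple(tokens)
        for i, (stored, kv) in enumerate(self._entries):
            if stored == key:
                del self._entries[i]
                return kv
        return None  # nothing staged for these tokens


buf = ToyKVLookupBuffer()
buf.insert([1, 2, 3], b"kv-for-123")   # prefill instance stages its KV cache
first = buf.drop_select([1, 2, 3])     # decode instance fetches it
second = buf.drop_select([1, 2, 3])    # entry was dropped after the first fetch
```

In this sketch the second `drop_select` finds nothing because the first call removed the entry, which is the "drop" half of the operation described above.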

Integration guide: https://github.com/kvcache-ai/mooncake/blob/main/doc/en/vllm-integration.md

Benchmark result: https://github.com/kvcache-ai/mooncake/blob/main/doc/en/vllm_benchmark_results.md

mergify bot commented Nov 29, 2024

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @alogfans.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Nov 29, 2024
@stmatengss stmatengss force-pushed the mooncake-integration-patch branch from 7132c75 to c1477fb Compare November 29, 2024 03:43
@mergify mergify bot removed the needs-rebase label Nov 29, 2024
@stmatengss

Currently, this PR is based on an early version of #8498. We plan to clean up the code and rebase it against the latest version soon. Apologies for triggering the review request prematurely.

mergify bot commented Dec 2, 2024

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @alogfans.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Dec 2, 2024
@KuntaiDu
Collaborator

KuntaiDu commented Dec 2, 2024

The new version of the disaggregated prefill PR, #10502, has just been merged, so feel free to continue development on vLLM's main branch! API-wise, the new PR is pretty similar to the old one, so (hopefully) migrating the implementation will be straightforward.

@junna2016

junna2016 commented Dec 3, 2024

Can you provide a test example for running disaggregated prefill/decoding mode in the MooncakeDistributedPipe scenario?

@ShangmingCai
Contributor

You can refer to this doc to run a demo based on PR #8498. We are currently rebasing onto the main branch. The rebase is nearly done, but we will run more tests to ensure compatibility.

@junna2016

Thanks a lot

@ShangmingCai
Contributor

After the rebase, we have moved development to PR #10884.

@DarkLight1337
Member

Closing as superseded by #10884

Labels: ci/build, documentation, frontend, needs-rebase