[Kernel][LoRA]Punica prefill kernels fusion #11234

Merged: 70 commits, Jan 7, 2025

@jeejeelee jeejeelee (Collaborator) commented Dec 16, 2024

Summary

This PR does the following:

  • Horizontally fuses the shrink and expand kernels of sgmv so that they support multiple (1-3) sub-LoRAs.
    This reduces the number of Triton kernel calls in MergedColumnParallelLinearWithLoRA and MergedQKVParallelLinearWithLora, which lowers TTFT. The now-unused sgmv_expand_slice kernel is removed.
  • Makes minor adjustments to the configuration of the fused expand kernel, giving a slight performance improvement.
  • Updates the corresponding unit tests to cover the fused kernels.
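
To illustrate the idea behind horizontal fusion, here is a minimal pure-Python sketch. It is not the PR's Triton code: the helper names (`matmul_t`, `LaunchCounter`, `lora_unfused`, `lora_fused`) are hypothetical, LoRA scaling and the addition onto the base layer output are left out, and a single LoRA is assumed for all tokens. It only shows that handling all sub-LoRA slices in one shrink call and one expand call yields the same result with fewer launches.

```python
from typing import List

Matrix = List[List[float]]


def matmul_t(x: Matrix, w: Matrix) -> Matrix:
    """Return x @ w^T for row-major nested lists."""
    return [[sum(a * b for a, b in zip(row, w_row)) for w_row in w] for row in x]


def concat_columns(parts: List[Matrix]) -> Matrix:
    """Concatenate per-slice outputs along the column (feature) axis."""
    return [[v for part in parts for v in part[i]] for i in range(len(parts[0]))]


class LaunchCounter:
    """Stand-in for counting kernel launches (hypothetical helper)."""

    def __init__(self) -> None:
        self.count = 0

    def launch(self) -> None:
        self.count += 1


def lora_unfused(x: Matrix, lora_a: List[Matrix], lora_b: List[Matrix],
                 counter: LaunchCounter) -> Matrix:
    """One shrink launch and one expand launch per sub-LoRA slice."""
    outputs = []
    for a, b in zip(lora_a, lora_b):
        counter.launch()                  # shrink: x @ A_i^T
        buf = matmul_t(x, a)
        counter.launch()                  # expand: buf @ B_i^T
        outputs.append(matmul_t(buf, b))
    return concat_columns(outputs)


def lora_fused(x: Matrix, lora_a: List[Matrix], lora_b: List[Matrix],
               counter: LaunchCounter) -> Matrix:
    """One shrink launch and one expand launch covering all slices."""
    counter.launch()                      # fused shrink over every slice
    bufs = [matmul_t(x, a) for a in lora_a]
    counter.launch()                      # fused expand over every slice
    return concat_columns([matmul_t(buf, b) for buf, b in zip(bufs, lora_b)])


# Two tokens, hidden size 2, two sub-LoRA slices of rank 1.
x = [[1.0, 2.0], [3.0, 4.0]]
lora_a = [[[1.0, 0.0]], [[0.0, 1.0]]]     # each A_i: rank x hidden
lora_b = [[[2.0], [3.0]], [[1.0]]]        # each B_i: out_i x rank

unfused_counter, fused_counter = LaunchCounter(), LaunchCounter()
unfused_out = lora_unfused(x, lora_a, lora_b, unfused_counter)
fused_out = lora_fused(x, lora_a, lora_b, fused_counter)
print(unfused_out == fused_out, unfused_counter.count, fused_counter.count)
```

With n slices the unfused path issues 2n launches while the fused path always issues 2, which is the saving the merged layers benefit from in this sketch.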

Performance comparison

Models expected to benefit from these changes

  • serve script
vllm serve meta-llama/Llama-2-7b-hf \
--gpu-memory-utilization 0.90 \
--served-model-name llama2-7b \
--enable-lora --max-loras 3 --max-cpu-loras 15 --max-lora-rank 8 \
--lora-modules lora=yard1/llama-2-7b-sql-lora-test

[image: benchmark results screenshot]

Models that should not be affected

  • serve script
vllm serve THUDM/chatglm3-6b \ 
--gpu-memory-utilization 0.90 \ 
--served-model-name chatglm3 \ 
--enable-lora --max-loras 3 --max-cpu-loras 15 --max-lora-rank 64 \
--lora-modules lora=jeeejeee/chatglm3-text2sql-spider \
--trust-remote-code

[image: benchmark results screenshot]
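
The summary names TTFT (time to first token) as the metric that improves. The benchmark tooling behind the comparison is not stated here, so the following is only a rough sketch of how TTFT could be measured against any streaming response. `time_to_first_token` and `fake_stream` are hypothetical names, and the fake stream stands in for a real streaming client so the example runs without a server.

```python
import time
from typing import Iterable, Iterator, Tuple


def time_to_first_token(chunks: Iterable[str]) -> Tuple[float, str]:
    """Return (seconds until the first non-empty chunk, that chunk).

    Pass a lazily evaluated stream so the request is issued inside
    the timed region rather than before the clock starts.
    """
    start = time.perf_counter()
    for chunk in chunks:
        if chunk:  # ignore empty keep-alive chunks
            return time.perf_counter() - start, chunk
    raise ValueError("stream ended before any token arrived")


def fake_stream(delay_s: float) -> Iterator[str]:
    """Hypothetical stand-in for a streaming completion response."""
    time.sleep(delay_s)  # simulated prefill latency
    yield "SELECT"
    yield " *"


ttft, first_chunk = time_to_first_token(fake_stream(0.05))
print(f"TTFT: {ttft * 1000:.1f} ms, first chunk: {first_chunk!r}")
```

In a real measurement the generator would wrap a streaming request to the served model, and the value would be averaged over many requests.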

@jeejeelee jeejeelee marked this pull request as draft December 16, 2024 15:28
Signed-off-by: Jee Jee Li <[email protected]>
@varun-sundar-rabindranath varun-sundar-rabindranath (Contributor) left a comment

LGTM!

@mgoin mgoin (Member) left a comment

Nice work, no additional comments from me!

@Isotr0py Isotr0py (Collaborator) left a comment

LGTM!

@jeejeelee jeejeelee added the ready label (ONLY add when PR is ready to merge/full CI is needed) Jan 3, 2025
@jeejeelee jeejeelee removed the ready (ONLY add when PR is ready to merge/full CI is needed) and ci/build labels Jan 4, 2025
@jeejeelee jeejeelee force-pushed the punica-kernel-fusion branch from 18bfb02 to 7e8d3bd Compare January 4, 2025 06:48
@mergify mergify bot added the ci/build label Jan 4, 2025
@jeejeelee jeejeelee (Collaborator, Author) commented

Added the minicpmv LoRA tests to distributed testing to avoid OOM, and removed the samples with unstable generation results from TP testing (these samples are unstable even on the current main branch).

@jeejeelee jeejeelee added the ready label (ONLY add when PR is ready to merge/full CI is needed) Jan 4, 2025
@DarkLight1337 DarkLight1337 enabled auto-merge (squash) January 6, 2025 13:16
@DarkLight1337 DarkLight1337 merged commit b278557 into vllm-project:main Jan 7, 2025
77 checks passed
Labels: ci/build, ready (ONLY add when PR is ready to merge/full CI is needed)
6 participants