[Kernel][LoRA]Punica prefill kernels fusion #11234

jeejeelee · 2024-12-16T15:28:33Z

Summary

This PR does the following:

Horizontally fuses the sgmv shrink and expand kernels so that each kernel supports multiple (1-3) sub-LoRAs.
Reduces the number of Triton kernel calls in MergedColumnParallelLinearWithLoRA and MergedQKVParallelLinearWithLora, which lowers TTFT (time to first token), and removes the now-unused sgmv_expand_slice kernel.
Makes minor adjustments to the configuration of the fused expand kernel, giving a slight performance improvement.
Updates the corresponding unit tests to cover the fused kernels.
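
To make the idea of horizontal fusion concrete, here is a minimal sketch in plain Python. It is not the Triton code from this PR: `matmul`, `LaunchCounter`, `shrink_per_slice`, and `shrink_fused` are hypothetical names made up for illustration, and the "launch" is only a counter. It assumes the shrink step multiplies the same input by one LoRA A matrix per sub-LoRA slice, and shows that handling all slices inside one call gives the same results with fewer launches.

```python
# Illustrative only: a plain-Python stand-in for the sgmv shrink step.
# None of these names come from the vLLM code base.

def matmul(a, b):
    """Multiply two row-major matrices given as lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


class LaunchCounter:
    """Counts simulated kernel launches."""

    def __init__(self):
        self.count = 0


def shrink_per_slice(x, lora_a_slices, counter):
    """Unfused path: one simulated launch per sub-LoRA slice."""
    outputs = []
    for lora_a in lora_a_slices:
        counter.count += 1  # a separate launch for every slice
        outputs.append(matmul(x, lora_a))
    return outputs


def shrink_fused(x, lora_a_slices, counter):
    """Fused path: one simulated launch iterates over all slices,
    much as a kernel could use an extra grid axis for the slice index."""
    counter.count += 1  # a single launch covers every slice
    return [matmul(x, lora_a) for lora_a in lora_a_slices]


if __name__ == "__main__":
    # Two tokens with hidden size 2; three rank-1 sub-LoRA slices.
    x = [[1.0, 2.0], [3.0, 4.0]]
    slices = [[[1.0], [0.0]], [[0.0], [1.0]], [[1.0], [1.0]]]
    unfused, fused = LaunchCounter(), LaunchCounter()
    same = shrink_per_slice(x, slices, unfused) == shrink_fused(x, slices, fused)
    print(same, unfused.count, fused.count)
```

The expand step could be sketched the same way, with each slice writing to its own column range of the output. The saving in the real kernels comes from launch overhead on the GPU, which this counter only mimics.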

Performance comparison

Models expected to benefit from these changes

serve script

vllm serve meta-llama/Llama-2-7b-hf \
--gpu-memory-utilization 0.90 \
--served-model-name llama2-7b \
--enable-lora --max-loras 3 --max-cpu-loras 15 --max-lora-rank 8 \
--lora-modules lora=yard1/llama-2-7b-sql-lora-test

Models that should not be affected

serve script

vllm serve THUDM/chatglm3-6b \
--gpu-memory-utilization 0.90 \
--served-model-name chatglm3 \
--enable-lora --max-loras 3 --max-cpu-loras 15 --max-lora-rank 64 \
--lora-modules lora=jeeejeee/chatglm3-text2sql-spider \
--trust-remote-code
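
For a rough TTFT check against a server started with one of the scripts above, a streaming request can be timed until the first chunk arrives. The sketch below is not the benchmark used for this PR. It assumes the server listens on the default `http://localhost:8000`, exposes the OpenAI-compatible `/v1/completions` endpoint, and accepts the adapter name `lora` (as registered through `--lora-modules`) in the `model` field. `build_completion_request` and `measure_ttft` are hypothetical helper names.

```python
import json
import time
import urllib.request


def build_completion_request(base_url, model, prompt, max_tokens=32):
    """Build a streaming completion request (assumed OpenAI-compatible API)."""
    payload = {
        "model": model,  # "lora" targets the adapter, assuming the scripts above
        "prompt": prompt,
        "max_tokens": max_tokens,
        "stream": True,
    }
    return urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def measure_ttft(request, timeout=60.0):
    """Return seconds from sending the request to the first streamed chunk."""
    start = time.perf_counter()
    with urllib.request.urlopen(request, timeout=timeout) as response:
        for raw_line in response:
            line = raw_line.strip()
            # Server-sent events carry payloads on lines starting with "data:".
            if line.startswith(b"data:") and not line.endswith(b"[DONE]"):
                return time.perf_counter() - start
    raise RuntimeError("stream ended before any token arrived")


if __name__ == "__main__":
    req = build_completion_request("http://localhost:8000", "lora", "SELECT")
    print(f"TTFT: {measure_ttft(req):.3f} s")
```

A single request like this is noisy. Averaging over many prompts, before and after the change, gives a more meaningful comparison.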

Signed-off-by: Jee Jee Li <[email protected]>

github-actions · 2024-12-16T15:28:45Z

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

Add ready label to the PR
Enable auto-merge.

🚀

varun-sundar-rabindranath

LGTM !

mgoin

Nice work, no additional comments from me!

Isotr0py

LGTM!

jeejeelee · 2025-01-04T06:52:20Z

Add the minicpmv LoRA tests to distributed testing to avoid OOM, and remove the TP test samples whose generation results are unstable (they are unstable even on the current main branch).

jeejeelee added 18 commits December 10, 2024 01:49

Init

ec3590d

Signed-off-by: Jee Jee Li <[email protected]>

Sync main

9474fb0

Signed-off-by: Jee Jee Li <[email protected]>

Fix bug

8c2ac4c

Signed-off-by: Jee Jee Li <[email protected]>

Merge branch 'vllm-project:main' into punica-kernel-fusion

2897d05

Merge branch 'vllm-project:main' into punica-kernel-fusion

35aebea

Merge branch 'vllm-project:main' into punica-kernel-fusion

628a567

Back up

d04121c

Signed-off-by: Jee Jee Li <[email protected]>

shrink_sgmv Done

a306f42

Signed-off-by: Jee Jee Li <[email protected]>

Merge branch 'vllm-project:main' into punica-kernel-fusion

f6bccc7

Merge branch 'vllm-project:main' into punica-kernel-fusion

e5cb72e

Optimize ptr compute

b6013db

Signed-off-by: Jee Jee Li <[email protected]>

Merge commit 'b6013db4' into punica-kernel-fusion

7f088ec

Merge branch 'vllm-project:main' into punica-kernel-fusion

32c5279

Increase the tile size

8d3742b

Signed-off-by: Jee Jee Li <[email protected]>

Clean up triton interface

9564b33

Signed-off-by: Jee Jee Li <[email protected]>

Sync main

3eb3ac3

Signed-off-by: Jee Jee Li <[email protected]>

Backup

4012466

Signed-off-by: Jee Jee Li <[email protected]>

Optimize one sclice kernel

18bbadf

Signed-off-by: Jee Jee Li <[email protected]>

jeejeelee marked this pull request as draft December 16, 2024 15:28

jeejeelee added 10 commits December 16, 2024 15:38

Delete unused code

43aae70

Signed-off-by: Jee Jee Li <[email protected]>

Refactor expand

482de15

Signed-off-by: Jee Jee Li <[email protected]>

format

259d382

Signed-off-by: Jee Jee Li <[email protected]>

Merge branch 'vllm-project:main' into punica-kernel-fusion

00f1904

Optimize logic

a0197e3

Signed-off-by: Jee Jee Li <[email protected]>

Add comments

38ba4f1

Signed-off-by: Jee Jee Li <[email protected]>

Fix bug

3c37226

Signed-off-by: Jee Jee Li <[email protected]>

Fix expand bug

45180c1

Signed-off-by: Jee Jee Li <[email protected]>

Backup

2e52d2c

Signed-off-by: Jee Jee Li <[email protected]>

revert expand tile size

2146141

Signed-off-by: Jee Jee Li <[email protected]>

jeejeelee added 3 commits December 30, 2024 13:56

Merge branch 'vllm-project:main' into punica-kernel-fusion

ea19a7d

Optimize arg

65d0f2f

Signed-off-by: Jee Jee Li <[email protected]>

Merge branch 'vllm-project:main' into punica-kernel-fusion

797ae77

varun-sundar-rabindranath approved these changes Dec 30, 2024

jeejeelee added 5 commits December 31, 2024 09:21

Merge branch 'vllm-project:main' into punica-kernel-fusion

2b9f928

Merge branch 'vllm-project:main' into punica-kernel-fusion

09fb9a9

Merge branch 'vllm-project:main' into punica-kernel-fusion

f446454

Merge branch 'vllm-project:main' into punica-kernel-fusion

767b233

Fix expand bug

421382e

Signed-off-by: Jee Jee Li <[email protected]>

mgoin approved these changes Jan 3, 2025

jeejeelee requested review from DarkLight1337 and Isotr0py January 3, 2025 01:48

Merge branch 'vllm-project:main' into punica-kernel-fusion

90a9117

Isotr0py approved these changes Jan 3, 2025

jeejeelee added the "ready" label (ONLY add when PR is ready to merge/full CI is needed) Jan 3, 2025

Reduce memory

2c79295

Signed-off-by: Jee Jee Li <[email protected]>

jeejeelee removed the "ready" (ONLY add when PR is ready to merge/full CI is needed) and "ci/build" labels Jan 4, 2025

Modify minicpmv test

7e8d3bd

Signed-off-by: Jee Jee Li <[email protected]>

jeejeelee force-pushed the punica-kernel-fusion branch from 18bfb02 to 7e8d3bd Compare January 4, 2025 06:48

mergify bot added the ci/build label Jan 4, 2025

Merge branch 'vllm-project:main' into punica-kernel-fusion

02b1d80

jeejeelee added the "ready" label (ONLY add when PR is ready to merge/full CI is needed) Jan 4, 2025

jeejeelee added 3 commits January 5, 2025 13:32

Merge branch 'vllm-project:main' into punica-kernel-fusion

bd8cc45

Merge branch 'vllm-project:main' into punica-kernel-fusion

7ffd15e

Merge branch 'vllm-project:main' into punica-kernel-fusion

c1c5b4b

DarkLight1337 enabled auto-merge (squash) January 6, 2025 13:16

jeejeelee mentioned this pull request Jan 6, 2025

[Hardware][CPU] Multi-LoRA implementation for the CPU backend #11100

Open

DarkLight1337 merged commit b278557 into vllm-project:main Jan 7, 2025
77 checks passed
