[Kernel] Use `out` in flash_attn_varlen_func #10811

WoosukKwon · 2024-12-01T23:31:07Z

This PR passes the destination buffer through the `out` argument of `flash_attn_varlen_func`, so the kernel writes its result in place and the redundant copy is avoided. This is made possible by the change in vllm-project/flash-attention#32.
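To illustrate the pattern (not the actual diff), here is a minimal, CPU-only sketch. `toy_kernel` is a hypothetical stand-in that just doubles its input; it is not `flash_attn_varlen_func`, and the buffer names are assumptions chosen for illustration. The only thing it has in common with the real function is the idea of an optional `out` destination.

```python
from array import array


def toy_kernel(q, out=None):
    """Hypothetical stand-in for an attention kernel: doubles each input value.

    If `out` is given, results are written into it; otherwise a new
    buffer is allocated and returned.
    """
    if out is None:
        out = array("d", [0.0] * len(q))  # fresh allocation
    for i, x in enumerate(q):
        out[i] = 2.0 * x
    return out


num_actual_tokens = 3
query = array("d", [1.0, 2.0, 3.0, 0.0])  # last slot is padding

# Before: the kernel allocates its own result, then the caller copies it
# into the preallocated output buffer (one extra allocation and one copy).
output_copy = array("d", [0.0] * len(query))
tmp = toy_kernel(query[:num_actual_tokens])
output_copy[:num_actual_tokens] = tmp

# After: the caller passes a view of the output buffer as `out`,
# so the kernel writes directly into it and no copy is needed.
output_inplace = array("d", [0.0] * len(query))
toy_kernel(query[:num_actual_tokens],
           out=memoryview(output_inplace)[:num_actual_tokens])

# Both paths produce the same contents; the padding slot stays untouched.
assert output_copy == output_inplace
```

The `memoryview` slice plays the role of a tensor view here: it aliases the first `num_actual_tokens` elements of the output buffer rather than copying them.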

Signed-off-by: Woosuk Kwon <[email protected]>

github-actions · 2024-12-01T23:31:19Z

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

- Add the ready label to the PR.
- Enable auto-merge.

🚀

WoosukKwon · 2024-12-02T02:13:47Z

For the record: this PR reduces the small-input latency of opt-125m from 194 ms to 184 ms. Benchmark output with this PR:

Avg latency: 0.18358359679890177 seconds
10% percentile latency: 0.18246181161375716 seconds
25% percentile latency: 0.18272388676996343 seconds
50% percentile latency: 0.18310453748563305 seconds
75% percentile latency: 0.18387063493719324 seconds
90% percentile latency: 0.18499673358164728 seconds
99% percentile latency: 0.18895185411558488 seconds

[Kernel] Use `out` in flash_attn_varlen_func

30aa49f

Signed-off-by: Woosuk Kwon <[email protected]>

WoosukKwon requested review from tlrmchlsmth, robertgshaw2-neuralmagic, njhill, ywang96, comaniac and alexm-neuralmagic as code owners December 1, 2024 23:31

mergify bot added the ci/build label Dec 1, 2024

yapf

4b29cd2

Signed-off-by: Woosuk Kwon <[email protected]>

WoosukKwon added the ready label Dec 1, 2024

ywang96 approved these changes Dec 2, 2024

Review thread on vllm/v1/attention/backends/flash_attn.py (resolved)

update

b0b7a1b

Signed-off-by: Woosuk Kwon <[email protected]>

WoosukKwon merged commit 073a4bd into main Dec 2, 2024
14 of 18 checks passed

WoosukKwon deleted the flash-attn-out branch December 2, 2024 01:55

WoosukKwon mentioned this pull request Dec 2, 2024

[misc] use out argument for flash attention #9740

Closed

afeldman-nm pushed a commit to neuralmagic/vllm that referenced this pull request Dec 2, 2024

[Kernel] Use out arg in flash_attn_varlen_func (vllm-project#10811)

ab21a28

Signed-off-by: Woosuk Kwon <[email protected]> Signed-off-by: Andrew Feldman <[email protected]>

sleepwalker2017 pushed a commit to sleepwalker2017/vllm that referenced this pull request Dec 13, 2024

[Kernel] Use out arg in flash_attn_varlen_func (vllm-project#10811)

bd98290

Signed-off-by: Woosuk Kwon <[email protected]>

BKitor pushed a commit to BKitor/vllm that referenced this pull request Dec 30, 2024

[Kernel] Use out arg in flash_attn_varlen_func (vllm-project#10811)

3887354

Signed-off-by: Woosuk Kwon <[email protected]>
