[OPT] improve rms_norm kernel #258

kkHuang-amd · 2024-11-01T06:43:37Z

Use vectorized memory access to improve rms_norm performance.

On the ROCm platform, a single instruction can perform a 16-byte memory access. Using this instruction lets the rms_norm kernel load and store data with vectorized memory accesses.
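To make the idea concrete, here is a minimal host-side C++ sketch of packed 16-byte access. It is not the code from this PR: `Pack16B`, `sum_squares_packed`, and `rms_norm_row` are hypothetical names, and `float` is used instead of a 16-bit type to keep the example short. It assumes the row length is a multiple of the pack width.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstring>

// Hypothetical 16-byte pack (illustrative only, not the PR's type).
// For float this holds 4 elements; for a 16-bit type it would hold 8.
template <typename T>
struct alignas(16) Pack16B {
  static constexpr std::size_t kWidth = 16 / sizeof(T);
  T data[kWidth];
};

// Sum of squares over one row, reading one 16-byte pack per step.
// Assumption: hidden_size is a multiple of the pack width.
inline float sum_squares_packed(const float* row, std::size_t hidden_size) {
  using Pack = Pack16B<float>;
  assert(hidden_size % Pack::kWidth == 0);
  float acc = 0.0f;
  for (std::size_t i = 0; i < hidden_size; i += Pack::kWidth) {
    Pack p;
    std::memcpy(&p, row + i, sizeof(Pack));  // models a single 16-byte load
    for (std::size_t j = 0; j < Pack::kWidth; ++j) {
      acc += p.data[j] * p.data[j];
    }
  }
  return acc;
}

// RMS norm of one row: out[i] = row[i] / sqrt(mean(row^2) + epsilon) * weight[i].
inline void rms_norm_row(float* out, const float* row, const float* weight,
                         std::size_t hidden_size, float epsilon) {
  const float variance =
      sum_squares_packed(row, hidden_size) / static_cast<float>(hidden_size);
  const float inv_rms = 1.0f / std::sqrt(variance + epsilon);
  for (std::size_t i = 0; i < hidden_size; ++i) {
    out[i] = row[i] * inv_rms * weight[i];
  }
}
```

In a GPU kernel each thread would issue the packed load itself, so the saving comes from fewer memory instructions per element rather than from the loop structure shown here.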

Testing shows that the rms_norm kernel is faster with this change.

gshtras · 2024-11-01T14:50:43Z

@kkHuang-amd Could you please add a brief description of what this change does, what improvement is expected, and why?

HaiShaw

LGTM

gshtras

Conditionally approving. My points of concern:

1. int32 is used for indices. Are we sure it won't overflow in this use case? cc @rasmith
2. Possible performance implications on non-MI platforms such as Navi. cc @maleksan85
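The int32 concern can be illustrated with a small C++ sketch. It assumes the kernel computes a flat element index of the form `token * hidden_size + col`. That form, and the helper names `flat_index` and `fits_in_int32`, are assumptions for illustration, not code taken from this PR.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: flat element index for (token, col) in a
// [num_tokens, hidden_size] tensor. The first operand is widened to
// 64 bits so the multiplication itself cannot overflow.
inline std::int64_t flat_index(int token, int hidden_size, int col) {
  return static_cast<std::int64_t>(token) * hidden_size + col;
}

// True if the value can be stored in a 32-bit signed integer.
inline bool fits_in_int32(std::int64_t v) {
  return v >= std::numeric_limits<std::int32_t>::min() &&
         v <= std::numeric_limits<std::int32_t>::max();
}
```

For example, with `hidden_size = 4096` the index for token 600000 is 2,457,600,000, which exceeds the int32 maximum of 2,147,483,647. Performing the same multiplication in 32-bit signed arithmetic would be undefined behavior in C++.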

rasmith

Some places where integer overflow could occur.

csrc/layernorm_kernels.cu

HaiShaw · 2024-11-11T19:41:36Z

@rasmith Is this good now?

rasmith · 2024-11-11T19:44:12Z

> @rasmith Is this good now?

Looks good!

The base branch was changed.

gshtras

Approved, pending conflict resolution

csrc/layernorm_kernels.cu

HaiShaw

LGTM

gshtras · 2024-11-26T22:28:28Z

@kkHuang-amd This PR introduces a correctness regression on the Llama-3.2-90B-Vision-Instruct-FP8-KV model when running without Triton flash attention.
I'm planning to revert it pending further investigation.

gshtras · 2024-11-26T23:34:54Z

The issue is the following:
There used to be two versions of the kernel: MI and Navi.
After this PR, the Navi version is called on MI whenever vec_size is not a multiple of 8, which seems to be the case in the vision model. This causes the correctness regression.
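The dispatch condition described above can be sketched in host-side C++. This is not the PR's code: `KernelPath`, `select_path`, and the two `sum_squares_*` functions are hypothetical, and the parameter is called `hidden_size` here as an assumption about which size the comment refers to. The sketch shows a size-based choice between a width-8 path and a scalar fallback, and both paths are expected to produce the same result.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical names for the two code paths.
enum class KernelPath { kVectorized, kScalar };

constexpr std::size_t kVecWidth = 8;

// Pick the width-8 path only when the size divides evenly.
inline KernelPath select_path(std::size_t hidden_size) {
  return hidden_size % kVecWidth == 0 ? KernelPath::kVectorized
                                      : KernelPath::kScalar;
}

// Scalar fallback: one element per step.
inline float sum_squares_scalar(const float* row, std::size_t n) {
  float acc = 0.0f;
  for (std::size_t i = 0; i < n; ++i) acc += row[i] * row[i];
  return acc;
}

// Width-8 path: eight lanes per step, reduced at the end.
inline float sum_squares_vec8(const float* row, std::size_t n) {
  assert(n % kVecWidth == 0);
  float lanes[kVecWidth] = {0.0f};
  for (std::size_t i = 0; i < n; i += kVecWidth) {
    for (std::size_t j = 0; j < kVecWidth; ++j) {
      lanes[j] += row[i + j] * row[i + j];
    }
  }
  float acc = 0.0f;
  for (std::size_t j = 0; j < kVecWidth; ++j) acc += lanes[j];
  return acc;
}

// Dispatcher: either path must be correct for the sizes it accepts.
inline float sum_squares(const float* row, std::size_t n) {
  return select_path(n) == KernelPath::kVectorized
             ? sum_squares_vec8(row, n)
             : sum_squares_scalar(row, n);
}
```

A test that runs the dispatcher on a size that is not a multiple of 8 and compares it against a reference would exercise the fallback path, which is the path the regression report points at.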

This reverts commit 15c78e7.

[OPT] improve rms_norm kernel

4f7ffd2

kkHuang-amd requested review from gshtras and shajrawi November 1, 2024 06:49

HaiShaw previously approved these changes Nov 1, 2024

gshtras previously approved these changes Nov 4, 2024

Merge branch 'main' into kk/rms_norm_opt

9fb3b2e

rasmith requested changes Nov 4, 2024

fix potential int overflow issue

22669a6

kkHuang-amd dismissed stale reviews from gshtras and HaiShaw via 22669a6 November 6, 2024 07:46

clang-formatted

2d02f02

kkHuang-amd requested a review from rasmith November 6, 2024 09:37

wunhuang and others added 2 commits November 7, 2024 01:21

Last one potential int overflow issue fix

1cde654

Merge branch 'main' into kk/rms_norm_opt

c0c3123

gshtras previously approved these changes Nov 7, 2024

rasmith previously approved these changes Nov 11, 2024

gshtras changed the base branch from main to develop November 12, 2024 16:47

gshtras previously approved these changes Nov 12, 2024

root added 2 commits November 13, 2024 05:33

Merge branch 'develop' into kk/rms_norm_opt

4f35f17

fix merge develop bb

a79f0a4

kkHuang-amd dismissed gshtras’s stale review via a79f0a4 November 13, 2024 06:02

kkHuang-amd requested review from rasmith, HaiShaw and gshtras November 13, 2024 07:22

gshtras reviewed Nov 13, 2024

csrc/layernorm_kernels.cu

gshtras reviewed Nov 13, 2024

csrc/layernorm_kernels.cu

kkHuang-amd and others added 2 commits November 18, 2024 15:31

Merge branch 'develop' into kk/rms_norm_opt

246fc1e

Resolve the code according to reviewer's suggesstion

4cbbb99

kkHuang-amd requested a review from gshtras November 18, 2024 08:53

solve "Cannot find Symbol with name: _ZN4vllm15rms_norm_kernelIN3c108BFloat16ELi8EEENSt9enable_ifIXooooeqT0_Li0Entsr12_typeConvertIT_EE6existseqLi2ELi2EEvE4typeEPS4_PKS4_S9_fiii"

c33c704

HaiShaw approved these changes Nov 19, 2024

Merge branch 'develop' into kk/rms_norm_opt

68efe50

gshtras approved these changes Nov 19, 2024

Merge branch 'develop' into kk/rms_norm_opt

a74fb5e

gshtras merged commit 15c78e7 into develop Nov 20, 2024
6 of 7 checks passed

gshtras deleted the kk/rms_norm_opt branch November 20, 2024 15:40

gshtras added a commit that referenced this pull request Nov 26, 2024

Revert "[OPT] improve rms_norm kernel (#258)"

2fa3f79

This reverts commit 15c78e7.

gshtras mentioned this pull request Nov 26, 2024

Revert "[OPT] improve rms_norm kernel" #293

Closed
