[Feature][Hardware][AMD] Enable level 3 compilation on rocm #10836
base: main
Conversation
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can do one of these:
Force-pushed from 33e7055 to f441c65
Could you consider adding AMD to this fusion test case?
vllm/tests/compile/test_fusion.py, lines 45 to 47 in 9b14d97:

```python
@pytest.mark.skipif(envs.VLLM_TARGET_DEVICE != "cuda",
                    reason="Only test on CUDA")
def test_fusion_rmsnorm_quant(dtype, hidden_size, num_tokens, eps):
```
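A hedged sketch of what the suggested change could look like, assuming `envs.VLLM_TARGET_DEVICE` reports `"rocm"` on AMD builds (a suggestion only, not the PR's actual diff):

```python
@pytest.mark.skipif(envs.VLLM_TARGET_DEVICE not in ("cuda", "rocm"),
                    reason="Only test on CUDA and ROCm")
def test_fusion_rmsnorm_quant(dtype, hidden_size, num_tokens, eps):
    ...
```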
@ProExpertProg @mgoin Some updates: I was trying to enable the unit test of the fusion pass on rocm. I found that with the fp8 weight padding enabled, the `torch.narrow` op creates extra ops in the IR generated by `torch.compile`, so the fusion pattern no longer matches. May I ask if it is OK to disable the padding by default?
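For context, a minimal standalone sketch of the padding pattern under discussion (the helper name and alignment value are assumptions, not the actual vLLM code): the fp8 weight is padded for `scaled_mm` performance, and `torch.narrow` slices the logical columns back out, which surfaces as an extra slice op in the `torch.compile` IR.

```python
import torch

def pad_weight(weight: torch.Tensor, align: int = 256) -> torch.Tensor:
    # Pad the last dim up to a multiple of `align` (value is an assumption).
    pad = (align - weight.shape[-1] % align) % align
    return torch.nn.functional.pad(weight, (0, pad))

w = pad_weight(torch.randn(128, 300))  # padded to shape (128, 512)
# Narrowing back to the logical width is the extra op that keeps the
# rms+fp8_quant pattern from matching in the fusion pass.
w_logical = torch.narrow(w, dim=-1, start=0, length=300)
```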
I am looking into this and found the same issue. There's also another problem, unrelated to fusion: when we compile with dynamic shape, the … That means if we compile with a dynamic …

@mgoin I'd advocate for removing the padding in the short term - how often do we deal with …
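To illustrate the dynamic-shape concern with a self-contained (non-vLLM) example: shape-dependent padding logic introduces control flow that `torch.compile` guards on, so a graph traced for one dynamic shape does not cover inputs that take the other branch.

```python
import torch

def f(x: torch.Tensor) -> torch.Tensor:
    # Shape-dependent control flow, analogous to conditional weight padding.
    if x.shape[-1] % 256 != 0:
        x = torch.nn.functional.pad(x, (0, 256 - x.shape[-1] % 256))
    return x * 2

compiled = torch.compile(f, dynamic=True)
compiled(torch.randn(4, 300))  # traces the padded branch
compiled(torch.randn(4, 512))  # other branch -> guard failure and recompile
```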
Minor correction: the …
Could we simply remove the padding in the non-CUDA case? The reason it is there is poor scaled_mm performance on CUDA.
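A sketch of how that suggestion might look (the helper name is hypothetical, and using `current_platform` here is an assumption, not the PR's actual diff):

```python
import torch
from vllm.platforms import current_platform  # assumed available here

def maybe_pad_fp8_weight(weight: torch.Tensor) -> torch.Tensor:
    # Only pad on CUDA, where unpadded fp8 scaled_mm is slow; skip on
    # ROCm and other platforms so the fusion pattern stays intact.
    if not current_platform.is_cuda():
        return weight
    pad = (256 - weight.shape[-1] % 256) % 256
    return torch.nn.functional.pad(weight, (0, pad))
```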
OK, if we want to keep the padding, I think I have a solution for the fusion (but not for dynamic-shape compilation). I can implement it tomorrow.
For what it's worth, …
Signed-off-by: charlifu <[email protected]>
Force-pushed from 129797d to e5d9c7a
This PR fixes the fusion pass not being enabled on rocm by:

- Adding support for `torch.float8_e4m3fnuz`, the fp8 dtype used on rocm.
- Removing the `torch.narrow` op, which creates extra ops in the IR generated by `torch.compile` and makes the rms+fp8_quant fusion ([torch.compile] Fuse RMSNorm with quant #9138) not work.
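For reference, a simplified sketch of the unfused rms+fp8-quant pattern the pass targets (an illustration only, not the vLLM implementation), showing where the rocm fp8 dtype differs from CUDA's:

```python
import torch

def rmsnorm_then_fp8_quant(x: torch.Tensor, weight: torch.Tensor,
                           scale: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # RMSNorm
    var = x.pow(2).mean(dim=-1, keepdim=True)
    x = x * torch.rsqrt(var + eps) * weight
    # fp8 quant: ROCm uses torch.float8_e4m3fnuz, CUDA uses torch.float8_e4m3fn.
    fp8 = torch.float8_e4m3fnuz if torch.version.hip else torch.float8_e4m3fn
    info = torch.finfo(fp8)
    return torch.clamp(x / scale, info.min, info.max).to(fp8)
```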