[BugFix] Fix parameter names and process_after_weight_loading for W4A16 MoE Group Act Order #11528
Conversation
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can do one of these:
I can confirm it works with tp>1
@mgoin
This pull request has merge conflicts that must be resolved before it can be merged.
Force-pushed from c7a912e to 4c71143.
# Will transpose the loaded weight along the
# intermediate and hidden dim sizes. Will
# shard for TP along the transposed dims
intermediate_full = extra_weight_attrs.pop("intermediate_full")
nit: maybe rename intermediate_size -> intermediate_size_per_partition and intermediate_full -> intermediate_size? This would make the names consistent with other quant configs, e.g. vllm/model_executor/layers/quantization/gptq.py.
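A minimal sketch of how the suggested rename might look in the weight-creation path, assuming a create_weights-style method and an extra_weight_attrs dict; the function name, attribute keys, and tensor shapes below are illustrative assumptions, not verbatim from this PR:

import torch
from torch.nn import Parameter

# Hypothetical MoE weight creation after the rename: the per-TP-rank dim is
# "intermediate_size_per_partition" and the full unsharded dim is
# "intermediate_size", mirroring other quant configs such as gptq.py.
def create_weights(layer, num_experts, hidden_size,
                   intermediate_size_per_partition, params_dtype,
                   **extra_weight_attrs):
    # The full intermediate size is needed for group act order, where group
    # scales and indices are defined over the unsharded weight before TP slicing.
    intermediate_size = extra_weight_attrs.pop("intermediate_size")

    pack_factor = 8  # eight 4-bit values packed per int32
    w13_weight = Parameter(
        torch.empty(num_experts,
                    hidden_size // pack_factor,
                    2 * intermediate_size_per_partition,
                    dtype=torch.int32),
        requires_grad=False,
    )
    layer.register_parameter("w13_weight_packed", w13_weight)
    # ... scales and group indices would be registered analogously.
    layer.intermediate_size = intermediate_size  # kept for process_after_weight_loading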
Signed-off-by: ElizaWszola <[email protected]>
…pressed_tensors_moe.py
Co-authored-by: Michael Goin <[email protected]>
Force-pushed from a68004b to c2bce52.
Summary
Fix parameter names and process_after_weight_loading if running group act order.
Testing
Next Steps:
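For context, a minimal, self-contained sketch (plain PyTorch, not vLLM code) of the kind of post-load step group act order typically requires before kernel repacking: permuting each expert's input channels into group-sorted order according to g_idx. The function name and shapes are assumptions for illustration only.

import torch

def sort_by_group_index(weight: torch.Tensor, g_idx: torch.Tensor):
    # weight: (in_features, out_features) unpacked weight of one expert
    # g_idx:  (in_features,) quantization-group index of each input channel
    # Returns the channel-permuted weight, permuted g_idx, and the permutation,
    # which must also be applied to the activations (or folded into the kernel).
    perm = torch.argsort(g_idx)
    return weight[perm], g_idx[perm], perm

if __name__ == "__main__":
    in_features, out_features, group_size = 16, 8, 4
    w = torch.randn(in_features, out_features)
    # Act order: channels were quantized in a shuffled order, so g_idx is not sorted.
    g_idx = torch.randperm(in_features) // group_size
    w_sorted, g_idx_sorted, perm = sort_by_group_index(w, g_idx)
    assert torch.equal(g_idx_sorted, torch.sort(g_idx).values)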