Implementation details: GEMM_W4A4::quantize behaves differently from the paper #33

Open
xmfbit opened this issue Nov 22, 2024 · 0 comments

xmfbit commented Nov 22, 2024

Hello author, after reading through your implementation, I found that the quantize method in GEMM_W4A4 does not seem to match what the paper describes. I computed both smoothed(x) @ lora_down and the un-smoothed x @ lora_down, and neither result matches qact.lora_act. Could you please explain the discrepancy?
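A minimal sketch of the comparison described above, in plain Python with made-up values. The helper names (`smooth`, `smooth_factor`, `max_abs_diff`) are hypothetical, and treating smoothing as a per-channel division of the activations is an assumption, not something taken from the repository. The kernel output (qact.lora_act) is not reproduced here; it would be compared against each reference with `max_abs_diff`.

```python
# Illustrative only: names, shapes, and values are assumptions, not the repository's code.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def smooth(x, smooth_factor):
    """Assumed smoothing: divide each input channel (column) by its factor."""
    return [[v / s for v, s in zip(row, smooth_factor)] for row in x]

def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two equal-shape matrices."""
    return max(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))

# Toy activations: 2 tokens x 3 channels.
x = [[1.0, -2.0, 4.0],
     [0.5, 3.0, -8.0]]
smooth_factor = [1.0, 2.0, 4.0]

# Toy low-rank down projection: 3 channels -> rank 2.
lora_down = [[0.5, -1.0],
             [1.0, 0.25],
             [-0.5, 2.0]]

# The two candidate references for the low-rank activation.
ref_smoothed = matmul(smooth(x, smooth_factor), lora_down)
ref_unsmoothed = matmul(x, lora_down)

# With the kernel's output in hand (as a list of rows), one would check:
#   max_abs_diff(lora_act, ref_smoothed)
#   max_abs_diff(lora_act, ref_unsmoothed)
# and see which, if either, is close to zero.
```

If neither difference is near zero, the kernel is presumably applying some other transformation before the down projection, which is what this issue asks about.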

Thank you.
