Implementation details: GEMM_W4A4::quantize behaves differently from the paper #33

xmfbit · 2024-11-22T08:59:14Z

Hello, after digging into the implementation, I found that the `quantize` method in `GEMM_W4A4` does not match what the paper describes. I computed both the smoothed version, `smoothed(x) @ lora_down`, and the unsmoothed version, `x @ lora_down`, and both results differ from `qact.lora_act`. Could you please explain this?
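
For reference, here is a minimal sketch of the comparison described above, written in plain Python so it runs without extra dependencies. It assumes SmoothQuant-style smoothing, where each input channel of `x` is divided by its smoothing factor. The helper names (`smooth`, `compare_lora_act`) and the tolerance are hypothetical and are not part of the repository's API. `lora_act` stands in for the values read back from `qact.lora_act`.

```python
from typing import Dict, List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain row-major matrix product a @ b."""
    return [
        [sum(a_ik * b[k][j] for k, a_ik in enumerate(row)) for j in range(len(b[0]))]
        for row in a
    ]


def smooth(x: Matrix, scale: List[float]) -> Matrix:
    """Assumed smoothing: X_hat = X @ diag(scale)^-1, i.e. divide each column by its factor."""
    return [[v / s for v, s in zip(row, scale)] for row in x]


def max_abs_diff(a: Matrix, b: Matrix) -> float:
    """Largest element-wise absolute difference between two same-shaped matrices."""
    return max(abs(u - v) for ra, rb in zip(a, b) for u, v in zip(ra, rb))


def compare_lora_act(
    x: Matrix, scale: List[float], lora_down: Matrix, lora_act: Matrix, atol: float = 1e-2
) -> Dict[str, bool]:
    """Report which candidate, if any, matches the kernel's lora_act within atol."""
    candidates = {
        "smoothed": matmul(smooth(x, scale), lora_down),
        "unsmoothed": matmul(x, lora_down),
    }
    return {name: max_abs_diff(c, lora_act) <= atol for name, c in candidates.items()}


# Synthetic example: a rank-1 lora_down and a lora_act built from the smoothed input.
x = [[1.0, 2.0], [3.0, 4.0]]
print(compare_lora_act(x, [2.0, 4.0], [[1.0], [1.0]], [[1.0], [2.5]]))
# -> {'smoothed': True, 'unsmoothed': False}
```

A loose tolerance is used because the kernel's `lora_act` is likely stored in reduced precision. If neither candidate matches even at a loose tolerance, the difference is probably not a rounding issue.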

Thank you.
