Wrong padding values of attn_weights in graph_mm_tvm #5
Comments
Thank you for pointing this out. We've changed the related padding in graph_mm_TVM.py and PAM_TVM.py to -1000000000.
Hi, I just found a small but confusing issue with this padding value. The padding value is also used in the backward calculation, so some invalid positions in the gradient matrix will be filled with -inf. I'm not sure whether this will cause other problems. Maybe Longformer's implementation is more reasonable? Just after the calculation, replace the invalid attention weights with -inf.
Well, the padding value is not saved for backward. The backward pass defaults to filling invalid positions with 0. You can look at the arguments to `GraphMM._graph_mm` in the forward and backward functions.
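(Editor's note: as a rough illustration of the distinction described above, and not the actual `GraphMM._graph_mm` code, the sketch below shows a dense masked matmul whose forward pass fills invalid positions with a large negative value while its backward pass fills their gradient with 0. All names here are made up for the example.)

```python
import torch

class MaskedScores(torch.autograd.Function):
    """Toy analogue of the pattern: forward pads invalid positions with a
    large negative score (so a later softmax ignores them); backward fills
    the gradient at those positions with 0 instead of the pad value."""

    @staticmethod
    def forward(ctx, q, k, valid_mask):
        ctx.save_for_backward(q, k, valid_mask)
        scores = q @ k.transpose(-1, -2)
        # invalid connections get a large negative score, not 0
        return scores.masked_fill(~valid_mask, -1e9)

    @staticmethod
    def backward(ctx, grad_out):
        q, k, valid_mask = ctx.saved_tensors
        # gradient at invalid positions is filled with 0
        grad_scores = grad_out.masked_fill(~valid_mask, 0.0)
        return grad_scores @ k, grad_scores.transpose(-1, -2) @ q, None
```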
Oh, I forgot to check this setting... Thanks for your quick reply!
The padding value for invalid connections should be -inf, but here padding=0 is used, so invalid positions still receive a non-zero attention weight after the softmax.
https://github.com/alipay/Pyraformer/blob/0dc4e2e438af54615b132b8b9b0cec3f14715d4f/pyraformer/graph_attention.py#L231-L232
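(Editor's note: a small, self-contained PyTorch example of the effect, with illustrative values only: a position padded with 0 still takes a share of the attention, while a -inf pad drives it to zero after the softmax.)

```python
import torch

# raw attention scores for one query; the last slot is an invalid connection
scores_pad0 = torch.tensor([2.0, 1.0, 0.0])               # kernel padded with 0
scores_padinf = torch.tensor([2.0, 1.0, float('-inf')])   # padded with -inf

print(torch.softmax(scores_pad0, dim=0))    # tensor([0.6652, 0.2447, 0.0900]) -> invalid slot gets ~9%
print(torch.softmax(scores_padinf, dim=0))  # tensor([0.7311, 0.2689, 0.0000]) -> invalid slot gets 0
```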
Checking the Longformer code, they use a d_mask to replace the padding with -inf:
https://github.com/allenai/longformer/blob/caefee668e39cacdece7dd603a0bebf24df6d8ca/longformer/longformer.py#L146-L174
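(Editor's note: a minimal sketch of that masking pattern, not the Longformer code itself; the function and tensor names below are hypothetical. After computing the raw scores, replace the entries at padded/invalid positions with -inf, then apply the softmax.)

```python
import torch

def apply_d_mask(attn_weights, valid_mask):
    """attn_weights: raw scores (..., seq, seq); valid_mask: bool, True = valid."""
    # replace scores at invalid positions with -inf before the softmax
    masked = attn_weights.masked_fill(~valid_mask, float('-inf'))
    return torch.softmax(masked, dim=-1)
```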