Difficulties in the understanding of the constant A #7

Open
WYL-Projects opened this issue May 28, 2022 · 1 comment

Comments

@WYL-Projects

Hello, Author
Thank you for proposing the high-performance pyramid attention. While reading the paper, I ran into the following difficulties:

  1. From the appendix of the paper, I understand that the constant A represents the number of adjacent nodes at the same scale that a node can attend to. When A is 3, does it describe the number of same-scale neighbor nodes of a middle node only, excluding the leftmost and rightmost nodes? I ask because I think the leftmost and rightmost nodes at each scale have only 2 neighboring nodes, right?
  2. When the constant A takes the value of 3, the model diagram of PAM looks like this.
    image
    When A is 5, each node has 5 neighboring nodes at the same scale. Does the model diagram of PAM then look like the following?
    image
    If so, then for A=5 the two leftmost and two rightmost nodes of the sequence at S=1 can only have 3 and 4 neighboring nodes, respectively. The same holds for the two leftmost and two rightmost nodes at S=2, and the leftmost and rightmost nodes at S=3 can only have 3. But I feel that my understanding of the constant A is wrong. I hope the author can give me some pointers.
@Zhazhan

Zhazhan commented May 31, 2022

Hello,

Thank you for your interest in our work. Good questions.

  1. Yes, we use A to denote the number of same-scale neighbor nodes that a middle node in the sequence (that is, most nodes in the sequence) can attend to. Nodes near the leftmost and rightmost ends of the sequence can attend to fewer than A same-scale neighbor nodes. In equations (8) and (12), we take this into account and use the upper bound A to compute the complexity.

  2. Yes, the diagram of A=5 is right. Your understanding is right.
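
For readers following along, the counts discussed above can be reproduced with a minimal sketch. It is not taken from the repository: the function name `same_scale_neighbor_counts` is hypothetical, and it assumes that A is odd and that the window of A nodes is centered on the node and includes the node itself, which is what the counts in this thread (2 at the ends for A=3; 3 and 4 near the ends for A=5) suggest.

```python
from typing import List


def same_scale_neighbor_counts(length: int, A: int) -> List[int]:
    """Count same-scale nodes each position can attend to.

    Assumption (inferred from this thread, not from the code base):
    the window holds A nodes, is centered on the node, includes the
    node itself, and is clipped at the ends of the sequence.
    """
    half = A // 2
    counts = []
    for i in range(length):
        lo = max(0, i - half)            # clip at the left end
        hi = min(length - 1, i + half)   # clip at the right end
        counts.append(hi - lo + 1)
    return counts


print(same_scale_neighbor_counts(8, 3))  # [2, 3, 3, 3, 3, 3, 3, 2]
print(same_scale_neighbor_counts(8, 5))  # [3, 4, 5, 5, 5, 5, 4, 3]
```

Under these assumptions no node exceeds A, so the total number of same-scale attention pairs at a scale with L nodes is at most A·L, which is consistent with using A as the upper bound when computing complexity.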
