Suppose the first MEDUSA head generates the top-2 predictions "It is" and "It's", while the second MEDUSA head generates the top-3 predictions "difficult", "a", and "not". This results in a total of 2 × 3 = 6 candidates.
The tree-structured attention mechanism ensures that each token can only attend to its predecessors within the same continuation. For instance, the token "difficult" can only attend to "It is" or "It's", but not to "not" or "a", as they belong to different continuations.
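To make the quoted description concrete, here is a minimal sketch of the 2 × 3 = 6 candidates and of a tree attention mask over them. It is an illustration only, not the MEDUSA repository's implementation: the names `nodes` and `build_tree_mask` are hypothetical, the token strings are the ones from the example above, and the prompt prefix (which every node would also attend to) is left out.

```python
from itertools import product

# Top-k predictions from the example above.
head1 = ["It is", "It's"]          # top-2 from MEDUSA head 1
head2 = ["difficult", "a", "not"]  # top-3 from MEDUSA head 2

# Each candidate continuation pairs one head-1 choice with one head-2 choice.
candidates = list(product(head1, head2))  # 2 * 3 = 6 paths

# Flatten the tree into (token, parent_index) nodes.
# Head-1 nodes come first and have no parent inside the tree;
# each head-2 token is duplicated once per head-1 parent.
nodes = [(tok, None) for tok in head1]
for parent_idx in range(len(head1)):
    for tok in head2:
        nodes.append((tok, parent_idx))

def build_tree_mask(nodes):
    """Return mask[i][j] == True when node i may attend to node j.

    A node may attend to itself and to its ancestors only,
    i.e. to tokens on the same continuation.
    """
    n = len(nodes)
    mask = [[False] * n for _ in range(n)]
    for i, (_, parent) in enumerate(nodes):
        mask[i][i] = True
        while parent is not None:
            mask[i][parent] = True
            parent = nodes[parent][1]
    return mask

mask = build_tree_mask(nodes)

for (tok, _), row in zip(nodes, mask):
    print(f"{tok:>10}", "".join("x" if allowed else "." for allowed in row))
```

In this layout, node 2 is the "difficult" that follows "It is" (node 0): it can attend to node 0 and to itself, but not to "It's" (node 1) or to its siblings "a" and "not" (nodes 3 and 4). A mask like this can only be built once both heads' predictions are available, since it is defined over the assembled candidate tree.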
So,
"difficult" can attend to "It is".
"difficult" is generated by MEDUSA head 2, and "It is" is generated by MEDUSA head 1.
Head 2 and head 1 run in parallel.
This means that when head 2 is generating "difficult", head 1 has not necessarily generated "It is" yet. If "It is" does not exist at the moment "difficult" is being generated, how can "difficult" attend to a token that does not yet exist?