I would like to use a mask that keeps the top-k entries of each attention row (A = Q @ K^T) and compute the full attention based on it.
Is it possible to compute this using "score" inside the score_mod function? (it seems that score is a scalar there... :-( )
If not, is it possible to pass in my own masked_score and use flex attention only for the remainder of the calculation, namely Softmax(masked_score) @ V?
Lastly, I wonder if I can use a binary mask tensor with dims [B, H, T, T] as the mask.
thanks!
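For the last point, here is a minimal, non-authoritative sketch of how a precomputed dense boolean mask of shape [B, H, T, T] (built here from a per-row top-k over Q @ K^T; the sizes, k=32, and the variable names are all invented for illustration) might be plugged into flex_attention through mask_mod and create_block_mask (available in torch.nn.attention.flex_attention in recent PyTorch releases):

```python
import torch
from torch.nn.attention.flex_attention import flex_attention, create_block_mask

# Toy sizes, purely for illustration.
B, H, T, D = 2, 4, 256, 64
device = "cuda" if torch.cuda.is_available() else "cpu"
q = torch.randn(B, H, T, D, device=device)
k = torch.randn(B, H, T, D, device=device)
v = torch.randn(B, H, T, D, device=device)

# Build a dense [B, H, T, T] boolean keep-mask from a per-row top-k of Q @ K^T
# (k=32 is an arbitrary choice here).
scores = q @ k.transpose(-2, -1)
topk_idx = scores.topk(32, dim=-1).indices
keep = torch.zeros(B, H, T, T, dtype=torch.bool, device=device)
keep.scatter_(-1, topk_idx, True)

# mask_mod just looks up the precomputed dense mask; True means "attend".
def topk_mask_mod(b, h, q_idx, kv_idx):
    return keep[b, h, q_idx, kv_idx]

block_mask = create_block_mask(topk_mask_mod, B=B, H=H, Q_LEN=T, KV_LEN=T, device=device)

# flex_attention is usually wrapped in torch.compile for performance.
out = flex_attention(q, k, v, block_mask=block_mask)
```

One caveat: the BlockMask only lets the kernel skip tiles that are entirely masked out, so an unstructured per-row top-k pattern may not translate into an actual speedup unless the kept entries cluster into blocks.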
hi
Practically, there are approximators for the mask (I'm researching dynamic ones, not fixed), so I wanted to combine these methods efficiently with the flex pipeline.
So given inputs q, k, v, suppose I have mask = func_approx(q, k); then I thought to use it with flex.
There are more useful scenarios to research...
If I also have the masked_score, can I get benefits from flex for the remainder of the calculation?
According to the FlashAttention study, the score calculation itself (even the full one) is quite cheap relative to the whole attention computation.
So even if I take the full score, just as a start, and heavily prune it, what benefit can I get from flex for the remainder of the calculation?
thanks!
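Not an authoritative answer, but a sketch of how an approximated mask like mask = func_approx(q, k) could be applied through score_mod (element-wise masking of the logits). Here func_approx is a stand-in placeholder and the shapes are invented; score_mod sees one logit at a time, so a row-wise top-k cannot be computed inside it, only a mask decided beforehand can be applied:

```python
import torch
from torch.nn.attention.flex_attention import flex_attention

B, H, T, D = 2, 4, 256, 64
device = "cuda" if torch.cuda.is_available() else "cpu"
q = torch.randn(B, H, T, D, device=device)
k = torch.randn(B, H, T, D, device=device)
v = torch.randn(B, H, T, D, device=device)

# Placeholder for the dynamic approximator: anything that yields a
# [B, H, T, T] boolean keep-mask from q and k.
def func_approx(q, k):
    return (q @ k.transpose(-2, -1)) > 0

keep = func_approx(q, k)

# score_mod receives a single logit; it can only mask it element-wise
# by looking up the precomputed keep-mask.
def approx_mask_score_mod(score, b, h, q_idx, kv_idx):
    return torch.where(keep[b, h, q_idx, kv_idx], score, -float("inf"))

out = flex_attention(q, k, v, score_mod=approx_mask_score_mod)
```

Note that this variant still computes every Q·K logit and only masks it element-wise, so by itself it saves no FLOPs; the savings with flex come from the BlockMask path in the earlier sketch, which can skip tiles that are entirely masked out.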