In the given template code, there is no `B1` in the signature of the jitted kernel function `flashatt_kernel`, which seems to mismatch the problem description. Should I add another parameter to the function? The problem setting states as follows:
Uses zero programs. Block size B0 represents k of length N0.
Block size B0 represents q of length N0. Block size B0 represents v of length N0.
Sequence length is T. Process it B1 < T elements at a time.
$$z_{i} = \sum_{j} \text{softmax}(q_1 k_1, \ldots, q_T k_T)_j v_{j} \text{ for } i = 1\ldots N_0$$
This can be done in 1 loop using a similar trick from the last puzzle.
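For reference, here is a minimal pure-Python sketch of what the one-loop version might look like when `B1` is passed in explicitly. It is not the puzzle's reference solution: the function names `flash_attention_1d` and `reference_attention_1d` are hypothetical, it assumes `q`, `k`, and `v` are plain lists of scalars, and it assumes each output uses the scores `q_i * k_j`. The trick is to keep a running maximum and rescale the accumulated numerator and denominator whenever the maximum changes.

```python
import math


def flash_attention_1d(q, k, v, B1):
    """Blocked attention over scalar sequences, reading k and v once per query."""
    T = len(k)
    out = []
    for qi in q:
        running_max = float("-inf")  # largest score seen so far
        denom = 0.0                  # sum of exp(score - running_max)
        numer = 0.0                  # sum of exp(score - running_max) * v_j
        for start in range(0, T, B1):
            k_blk = k[start:start + B1]
            v_blk = v[start:start + B1]
            scores = [qi * kj for kj in k_blk]
            new_max = max(running_max, max(scores))
            # Rescale the old accumulators so they are relative to new_max.
            scale = math.exp(running_max - new_max)
            weights = [math.exp(s - new_max) for s in scores]
            denom = denom * scale + sum(weights)
            numer = numer * scale + sum(w * vj for w, vj in zip(weights, v_blk))
            running_max = new_max
        out.append(numer / denom)
    return out


def reference_attention_1d(q, k, v):
    """Unblocked version used only to check the blocked one."""
    out = []
    for qi in q:
        scores = [qi * kj for kj in k]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        out.append(sum(w * vj for w, vj in zip(weights, v)) / sum(weights))
    return out
```

In a Triton kernel the inner loop would presumably step over `tl.arange(0, B1)` offsets, which is why `B1` would need to be a `tl.constexpr` argument of the kernel.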
Btw, `B={"B0":200}` seems to be problematic too, since we usually use `tl.arange(0, B0)` in the kernel to calculate offsets, and `tl.arange` only accepts ranges whose size is a power of 2.
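One possible workaround, sketched below in plain Python rather than Triton, is to round the block size up to the next power of two and mask out the padded offsets. The helper names `next_power_of_2` and `padded_offsets_and_mask` are hypothetical and defined here only for illustration; whether the puzzle intends this or simply a different `B0` is an assumption.

```python
def next_power_of_2(n):
    """Smallest power of two that is >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()


def padded_offsets_and_mask(n):
    """Offsets over a power-of-two range plus a mask for the valid ones."""
    size = next_power_of_2(n)
    offsets = list(range(size))        # stands in for tl.arange(0, size)
    mask = [o < n for o in offsets]    # stands in for offsets < n
    return offsets, mask


offsets, mask = padded_offsets_and_mask(200)
```

With `n = 200` the range is padded to 256 entries and the mask marks the first 200 as valid; in a kernel the mask would be passed to `tl.load` and `tl.store` so the padded lanes are never read or written.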