Launch multiple TMA operations simultaneously, but process each stage as it becomes available.
Motivation
Overlap data movement with computation
Pseudo-code
for each stage of the producer TensorView:
    launch the TMA operation for that stage
end for
for each stage of the consumer:
    wait for the corresponding TMA stage to become available
    compute on that stage's data
end for
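The two loops above can be sketched on the host as a rough analogy. This is not nvFuser or CUDA code: a thread pool stands in for the TMA engine, each future stands in for a per-stage mbarrier, and `NUM_STAGES`, `load_stage`, and `run_pipeline` are hypothetical names chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
import time

NUM_STAGES = 4  # hypothetical pipeline depth, not taken from the issue


def load_stage(stage):
    """Stand-in for one asynchronous TMA load; returns the stage's tile."""
    # Later stages finish sooner on purpose, to show that the consumer
    # still processes the stages in order.
    time.sleep(0.01 * (NUM_STAGES - stage))
    return [stage * 10 + i for i in range(4)]


def run_pipeline():
    results = []
    with ThreadPoolExecutor(max_workers=NUM_STAGES) as pool:
        # Producer loop: launch the load for every stage up front.
        pending = [pool.submit(load_stage, s) for s in range(NUM_STAGES)]
        # Consumer loop: block only on the current stage's future
        # (the analogue of waiting on that stage's mbarrier), then compute.
        for stage, future in enumerate(pending):
            tile = future.result()
            results.append((stage, sum(tile)))
    return results


if __name__ == "__main__":
    print(run_pipeline())
```

Because every load is in flight before the first wait, the time spent computing on stage 0 overlaps with the loads for the remaining stages, which is the motivation stated above.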
rdspring1 changed the title from "Double-Buffer TMA Support for Pointwise and Normalization Kernels" to "Add Double-Buffer TMA Support for Pointwise and Normalization Kernels" on May 3, 2024
rdspring1 changed the title from "Add Double-Buffer TMA Support for Pointwise and Normalization Kernels" to "Add Circular-Buffer TMA Support for Pointwise and Normalization Kernels" on Jul 18, 2024
Pipelining - (Multiple mbarriers per TensorView)
Example: Synchronous TMA