In the code snippet below, where we use `expand` to change a broadcast dimension to non-broadcast, I'm seeing our scheduler forced to segment and generate two kernels.
But if I switch the `expand` to a `pad`, we can actually handle it in a single kernel.
Is this expected behavior? I suspect it's because `expand` doesn't create a dependency between the broadcast ID and the non-broadcast ID.
It's surprising to me that codegen can handle resize more flexibly than `expand`. I'm not sure about performance, though; I should double-check the actual kernel. I don't know how `expand` is lowered in codegen, or whether a single kernel actually wins on kernel time.
For the pointwise scheduler, this should be segmented because there's no single tensor that has all the concrete IDs. I'm not saying that scheduling this fusion as a single kernel is impossible, but given how the current pointwise scheduler works, it should be segmented.
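To make the "no single tensor has all the concrete IDs" condition concrete, here is a toy sketch in plain Python. It is not nvFuser code: the `find_reference` helper, the dictionary representation, and the example fusions are all hypothetical and only model the idea that the scheduler looks for one tensor covering every concrete ID.

```python
from typing import Dict, Optional, Set

# Toy model (hypothetical, not the nvFuser API): a fusion maps tensor names to
# their axes, and each axis is True if concrete in that tensor, False if broadcast.
Tensor = Dict[str, bool]
Fusion = Dict[str, Tensor]


def concrete_ids(tensor: Tensor) -> Set[str]:
    """Return the axes that are concrete (non-broadcast) in this tensor."""
    return {axis for axis, is_concrete in tensor.items() if is_concrete}


def find_reference(fusion: Fusion) -> Optional[str]:
    """Return the first tensor whose concrete axes cover every concrete axis
    in the fusion, or None if no such tensor exists."""
    all_concrete: Set[str] = set()
    for tensor in fusion.values():
        all_concrete |= concrete_ids(tensor)
    for name, tensor in fusion.items():
        if concrete_ids(tensor) >= all_concrete:
            return name
    return None


# t1 is concrete in both axes, so it can act as the reference tensor.
covered: Fusion = {
    "t0": {"i0": True, "i1": False},
    "t1": {"i0": True, "i1": True},
}

# Each tensor is concrete in only one axis, so no tensor covers both.
split: Fusion = {
    "t0": {"i0": True, "i1": False},
    "t1": {"i0": False, "i1": True},
}

print(find_reference(covered))
print(find_reference(split))
```

In this toy model, a `None` result corresponds to the case where a scheduler built around a single reference tensor would have to segment.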
As for the padding case, that's probably because the initial version of resize support focused on being able to schedule resize-based ops within each of the pointwise and reduction schedulers, so it was designed to be flexible without much thought about the actual implications. After all, it was supposed to be a short-term preliminary version, and we hadn't spent much effort on it since then until recently.