Fix bug in scatter #2245
Conversation
I find this bug really counter-intuitive! I thought it was already checked here that the buffers were contiguous...
!build --dist
Huh... I don't know how this didn't come up earlier...
We have assumed that the input tensor is contiguous when lowering comms (we probably should have added this contiguous() call before), but our tests all have contiguous aten inputs, so I'm not certain where the non-contiguity got introduced...
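To illustrate the failure mode being discussed, here is a minimal sketch using NumPy as an analogy (this is not nvFuser code): a freshly allocated buffer is contiguous, but a strided view of it, such as a transpose, is not, and a scatter-style comm that flattens the buffer would read the wrong bytes unless a contiguous copy is materialized first.

```python
import numpy as np

# A freshly allocated tensor is contiguous.
a = np.arange(6, dtype=np.float32).reshape(2, 3)
assert a.flags["C_CONTIGUOUS"]

# A transpose is a strided view: no copy is made, so the
# view is NOT contiguous even though the original buffer was.
b = a.T
assert not b.flags["C_CONTIGUOUS"]

# A contiguous() call (np.ascontiguousarray here) materializes a
# packed copy that a comm can safely treat as a flat buffer.
c = np.ascontiguousarray(b)
assert c.flags["C_CONTIGUOUS"]
assert (c == b).all()  # same logical values, packed layout
```

This is the same reason the aten-input tests passed: they never handed the lowered comm a strided view, so the missing contiguity check went unnoticed.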
Code before #2172 doesn't
This is exposed by #2168. But the root cause, I believe, is that insertReshardings doesn't set the allocation domain properly. IIRC, you and @jjsjann123 noticed this potential problem in another PR, but we never got a chance to fix it properly. Below are the fusion IRs for the two stages. Although
I think our options are:
I suspect option 2 will take a while, and it's a bad idea to leave CI broken, so I'll cross that out. Option 3 is safest because #2168 could have triggered other failure cases that we are not yet aware of. Option 1 is suboptimal, but as long as we (👀 @cowanmeg) fix the root cause soon, we should be fine. Wdyt?
Oops, sorry about that. 😛 @samnordmann ignore the thunder tests; there's something with transformer_engine.
Yes, I agree option 2 is the correct fix. Plus, we need to set the allocation domain for DID parallelism on the leaf domain to work. Hopefully, I can find some time soon to work on this!
Sets the allocation domain of sharded tensors during the pass `propagateShardingsAndSetAllocationDomain`. The two passes are merged in an attempt to reduce the number of passes over all expressions in the fusion. The allocation domain is set to the tv's leaf domain. Since presegmentation passes and scheduling occur after the sharding passes, the leaf domain is identical to the rfactor domain. Once DID parallelization of the leaf domain is allowed, the leaf and rfactor domains will no longer be the same. This avoids issues such as #2245 (comment) and enables the `AllocationDomainPass` presegmentation pass for distributed matmul tests.
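The distinction between a tensor's logical domain and its allocation domain can be sketched with NumPy stride orders as a rough analogy (again, not nvFuser code): two arrays can share the same logical shape while being laid out in memory in different orders, and a consumer that assumes one layout while reading the other gets garbage.

```python
import numpy as np

# Same logical domain (shape), two different allocation orders.
x = np.zeros((4, 8), dtype=np.float32, order="C")  # rows packed
y = np.zeros((4, 8), dtype=np.float32, order="F")  # columns packed

assert x.shape == y.shape        # identical logical domain
assert x.strides != y.strides    # different memory layout

# If a pass reshapes or shards a tensor but leaves the layout
# implicit, a later consumer that assumes row-major order reads
# the wrong bytes. Explicitly pinning the allocation domain (here,
# forcing a known stride order) removes that ambiguity.
z = np.ascontiguousarray(y)
assert z.strides == x.strides
```

Setting the allocation domain to the leaf domain plays the same role here: it makes the layout that downstream comms and kernels will see explicit, instead of an implicit assumption.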
Fixes a subtle bug exposed by #2168.