IdModel buildAllGraphs() fails in presegment pass #2253
Comments
Can you please run the fusion without the preseg pass but …
It's indeed failing. Running on top of #2252 (where preseg is no longer failing). You can see the added assert during GpuLower.
Thanks for checking!
AllocationOrderInference only uses the EXACT graph, so we do not have to build everything. This is to temporarily patch CI while we work on the IdModel bug (tracked in #2253).
@jjsjann123, I'm just curious whether the fusion in the issue description was generated by running Thunder's microbenchmark.
Oh no... you shouldn't be running into this issue. (i.e. the issue still stands, but it shouldn't pop up in codegen any more after #2252). But it's possible that even building the EXACT graph triggers this issue. Can you give me a repro command for running that benchmark?
I also see this problem running the regular e2e benchmark now (current commit 8baa550):
The pattern from the benchmark looks similar to what we have in this repro. I think it's an accidental change here: #2298 (comment)
@jjsjann123 The Python repro doesn't seem to fail with TOT.
Hopefully it was fixed over the last several months, but could you also check the repro?
This would still fail. It only passes because PR #2252 stops building the loop graph. If you revert that change, you'll hit another assert.
Thanks. Confirmed it's still an issue. BTW, it seems there's a resize to the same size. In the Exact graph, there are three groups of IDs, each of which has an expr to itself with a resize op. That seems to correspond to resizing an input domain of size 32 to size 32. We could have a preseg optimization to make it a no-op, but I wonder if that's common enough to justify adding another preseg pass.
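A same-extent resize is not automatically an identity, which is part of why such a preseg pass would need a real check. Here is a minimal toy sketch of that check, assuming (as is common for resize-style ops) that a resize is described by left/right expansion amounts; `is_noop_resize` is a hypothetical helper for illustration, not an nvFuser API:

```python
def is_noop_resize(left_expand: int, right_expand: int) -> bool:
    """Hypothetical check for whether a resize op could be dropped.

    A resize preserves the extent whenever left_expand + right_expand
    == 0 (e.g. 32 -> 32), but it is only a true no-op when both
    expansions are zero: resize(left=1, right=-1) keeps the extent
    yet shifts the data window, so it cannot be removed.
    """
    return left_expand == 0 and right_expand == 0


# The 32 -> 32 resize seen in the Exact graph would be removable
# only in the (0, 0) case.
print(is_noop_resize(0, 0))   # -> True
print(is_noop_resize(1, -1))  # -> False
```

The point of the sketch is that "same size before and after" is a necessary but not sufficient condition for eliding the op.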
Thanks for pointing it out.
I don't remember where the issue originated (is it coming from Thunder?!), but see https://github.com/Lightning-AI/lightning-thunder/blob/0073731b8a7290fcf86fcd992c5763b353ef5d65/thunder/clang/__init__.py#L636-L642. So yes, this is pretty common today. But I'm not totally sure what it will look like when we translate it to nvFuser (with dynamic shapes), i.e., whether nvFuser would see these resizes as a new input and we'd realize it's the same shape after concretization, or whether the new shape will be extracted from a TensorView within the graph. In the latter case, I don't think we'll see the new IterDomain as a resize.
I haven't closely looked at what operations are actually there, but from the Exact graph, there must be something like:
The Exact graph discovers …
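For intuition, the "groups of IDs" in the Exact graph behave like equivalence classes over IterDomains, which can be modeled with a disjoint-set. This is a toy sketch with made-up ID names, not nvFuser's actual data structure:

```python
class DisjointSet:
    """Toy union-find illustrating how an Exact graph groups
    IterDomains that are exactly mapped across tensors."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        # Lazily register unseen IDs as their own singleton group.
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            # Path halving keeps lookups near-constant time.
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[ra] = rb


# Exactly map corresponding IDs across three tensors (made-up names).
ds = DisjointSet()
ds.union("tv0_id0", "tv1_id0")
ds.union("tv1_id0", "tv2_id0")
assert ds.find("tv0_id0") == ds.find("tv2_id0")

# An unmapped ID (e.g. the output of a resize that was never
# explicitly mapped) stays in its own group.
assert ds.find("tv0_id0") != ds.find("resize_out_id")
```

Under this model, a group whose only expr is a resize back onto itself is exactly the pattern described in the comment above.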
Is this still an issue?
Looks like the issue was magically fixed. At least the script runs without any issue now. |
Not sure if this is a real issue or just a misuse of IdModel.
Here's the repro script (likely it'll be rendered obsolete after #2252).
Basically, passing the fusion from the program below to the IdModel constructor triggers an assert during buildLoopGraph(). I can help put together a cpp test if this turns out to be a real issue. cc'ing @naoyam

Backtrace