Harden assertBuffersHaveSameSize to check shapes. #3531
33bf6ba
StmtSort and other stuff in iter_visitor.h assume the SSA property of `Fusion`.
Does that mean TVs in a `kir::Kernel` (also non-SSA) get wrong `Val::uses()`, which we should therefore avoid using?
It may actually be used, but non-SSA definitions in the Kernel IR are pretty limited so far, so we may not encounter any problems. In general, though, it isn't a well-ironed-out use scenario.
Got it. I'll have to revisit how we evaluate for-loops. One potential approach is to only invalidate loop-index-dependent scalars and let TensorView ops in the loop body run unconditionally.
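To make the approach proposed above concrete, here is a minimal sketch of a memoizing evaluator that drops only loop-index-dependent scalars at the start of each iteration and reruns the loop body unconditionally. It is a toy model, not nvFuser code: `ToyEvaluator`, `startIteration`, `scalar`, and the `stride`/`offset` names are all hypothetical.

```cpp
#include <functional>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Toy stand-in for a host-side evaluator; none of these names are nvFuser APIs.
struct ToyEvaluator {
  std::unordered_map<std::string, long> scalar_cache;  // memoized scalars
  std::unordered_set<std::string> index_dependent;     // scalars derived from the loop index

  // Drop only the scalars that depend on the loop index, then bind the new index.
  void startIteration(long j) {
    for (const auto& name : index_dependent) {
      scalar_cache.erase(name);
    }
    scalar_cache["j"] = j;
  }

  // Return a memoized scalar, computing and caching it on a miss.
  long scalar(const std::string& name, const std::function<long()>& compute) {
    auto it = scalar_cache.find(name);
    if (it != scalar_cache.end()) {
      return it->second;
    }
    long value = compute();
    scalar_cache[name] = value;
    return value;
  }
};

// Run a 3-iteration loop. The "tensor op" is never cached, so it reruns every time.
std::vector<long> runLoop() {
  ToyEvaluator ev;
  ev.index_dependent = {"offset"};
  std::vector<long> results;
  for (long j = 0; j < 3; ++j) {
    ev.startIteration(j);
    // Loop-invariant scalar: computed once, then served from the cache.
    long stride = ev.scalar("stride", [] { return 4L; });
    // Index-dependent scalar: invalidated above, so recomputed each iteration.
    long offset = ev.scalar("offset", [&] { return ev.scalar_cache.at("j") * stride; });
    // Stands in for a TensorView op in the loop body; always executed.
    results.push_back(offset + 1);
  }
  return results;
}
```

In this sketch `runLoop()` yields 1, 5, 9: `stride` stays cached across iterations, `offset` is recomputed from the new index, and the body never reads a stale tensor result because tensor results are not memoized at all.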
The introduction of `tva_j_unsqueezed` triggered a weird problem that @samnordmann is probably aware of. I added more explanation and wonder what @naoyam thinks about this.
We could make TensorView live even if no output depends on it for HostIR. Not sure if that would solve the issue, though, as I'm still not entirely clear what the issue is.
I am aware we artificially need to add the matmul's output as a fusion output to fix the data dependency; that is why `tvc_j` was added in the first place. However, I was not aware of the other bug you're mentioning: that we only traverse the first registered producing Expr.
Would the program break if you only kept `hic->addOutput(tvc_j);`?
Yes, as I commented at https://github.com/NVIDIA/Fuser/pull/3531/files#diff-30df6421558f87ef0024b01f11752c35d3d68b80a9e6e0ec0fd49de535acb91aR917
OK, but I am not sure I fully understand why it breaks. Even if the visitor only traverses through the first definition, i.e., the `SelectOp`, `tvc_j` should still be invalidated because the `SelectOp` consumes the index `j`.
Yes, `tvc_j` will be invalidated but `tva_j_unsqueezed` won't be. As a result, it always holds the first iteration's value.
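The failure mode described here, a cached value that is skipped by invalidation and so keeps its first-iteration result, can be reproduced in a toy model. This is only a sketch under assumptions: `evalUnsqueezedPerIteration`, `selected`, and `unsqueezed` are hypothetical names, and the cache is a plain map rather than nvFuser's evaluator.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Toy memoizing evaluator; names are illustrative, not nvFuser APIs.
// `invalidated` lists the values dropped from the cache at the start of each iteration.
std::vector<long> evalUnsqueezedPerIteration(const std::vector<std::string>& invalidated) {
  std::unordered_map<std::string, long> cache;
  std::vector<long> seen;
  for (long j = 0; j < 3; ++j) {
    for (const auto& name : invalidated) {
      cache.erase(name);
    }
    // "selected" stands in for the result of selecting slice j.
    if (cache.find("selected") == cache.end()) {
      cache["selected"] = 10 * j;
    }
    // "unsqueezed" is derived from "selected"; it is recomputed only on a cache miss.
    if (cache.find("unsqueezed") == cache.end()) {
      cache["unsqueezed"] = cache["selected"] + 1;
    }
    seen.push_back(cache["unsqueezed"]);
  }
  return seen;
}
```

Calling `evalUnsqueezedPerIteration({"selected"})` gives 1, 1, 1 because the derived value is never dropped, whereas `evalUnsqueezedPerIteration({"selected", "unsqueezed"})` gives 1, 11, 21. Whether the real traversal misses the derived value for the same reason is not established by this sketch.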