Stride MatmulOp according to set allocation domain #3447

base: main
Conversation
Force-pushed from b4bd2bf to 2dee366.
!test

1 similar comment

!test
csrc/ir/nodes.cpp (outdated)
  auto strides = computeStrides(out(), matmul_sizes);
  matmul_out = at::as_strided(matmul_out, matmul_sizes, strides);
}
inferAndValidateAllocationSizesAndStrides(matmul_out, out(), ee);
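For intuition, here is a rough Python analogue of what a computeStrides-style helper needs to do (the stride-order convention below is an assumption for illustration, not nvfuser's actual implementation): assign contiguous strides in allocation order, then map them back to logical axis positions.

```python
def compute_strides(sizes, stride_order):
    # Hypothetical analogue of computeStrides(): walk the axes from
    # innermost to outermost in allocation order, assigning contiguous
    # strides, then scatter them back to logical positions.
    # Assumed convention: stride_order[i] is the memory rank of logical
    # axis i, with 0 being the innermost (fastest-varying) axis.
    strides = [0] * len(sizes)
    stride = 1
    for axis in sorted(range(len(sizes)), key=lambda i: stride_order[i]):
        strides[axis] = stride
        stride *= sizes[axis]
    return strides

print(compute_strides([2, 3, 4], [2, 1, 0]))  # [12, 4, 1] (row-major)
print(compute_strides([2, 3, 4], [0, 1, 2]))  # [1, 2, 6]  (column-major-like)
```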
I'm not sure about validating output allocation for all MatmulOps.
- We already validate allocation sizes/strides for each segment's inputs and outputs. Given that MatmulOp currently forms its own segment, the existing validation seems sufficient.
- If/when MatmulOp produces an internal tensor, we can't always materialize that tensor as an at::Tensor that matches its allocation domain. For example, the allocation domain can be a split and/or a swizzle of the logical domain. Assuming allocation is a permutation of logical is probably OK for segment inputs/outputs, but can be too limiting for internal tensors. cc @zasdfgbnm
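For readers less familiar with the terminology: "allocation is a permutation of logical" means the memory order is just a reordering of the logical axes, which an at::Tensor can express through strides alone. A minimal PyTorch illustration:

```python
import torch

# Logical shape (2, 3, 4), but allocated with logical axis 0 innermost:
t = torch.empty(4, 3, 2).permute(2, 1, 0)
print(t.shape)    # torch.Size([2, 3, 4]) -- logical domain
print(t.stride()) # (1, 2, 6)             -- allocation order shows up in strides
```

A split or swizzle of the logical domain, by contrast, cannot in general be expressed as a plain size/stride pair, which is the limitation being pointed out above.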
Force-pushed from 7dfe56b to 0d6f934.
Force-pushed from 91b48f5 to deb5351.
!test
@@ -4371,7 +4372,17 @@ std::vector<PolymorphicValue> MatmulOp::evaluate(
     const std::vector<PolymorphicValue>& inputs) const {
   const auto a = inputs.at(0).as<at::Tensor>();
   const auto b = inputs.at(1).as<at::Tensor>();
-  return {at::matmul(a, b)};
+  auto matmul_out = at::matmul(a, b);
Did you give up on at::matmul_out, which could save a copy?
at::matmul_out is not used since it does not allow inputs/outputs that require gradients:
https://github.com/pytorch/pytorch/blob/1f3d8896bc9cea7f46c50ff92b69c6aa139defcb/aten/src/ATen/native/LinearAlgebra.cpp#L2018-L2025
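For reference, a minimal standalone repro of the restriction discussed here (stock PyTorch behavior, independent of nvfuser):

```python
import torch

a = torch.randn(2, 3, requires_grad=True)
b = torch.randn(3, 4)
out = torch.empty(2, 4)

try:
    torch.matmul(a, b, out=out)
except RuntimeError as e:
    # out= variants reject arguments that require grad while grad mode is on
    print(e)
```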
This is suspicious -- we are not using ExpressionEvaluator to build a DAG for autograd, so the inputs/outputs here shouldn't require grads. Where did the inputs/outputs get requires_grad? Your test case obviously didn't start with torch.randn(..., requires_grad=True).
cc @jjsjann123
You might be right; the evaluated tensors themselves may not have the requires_grad flag set.
I had ruled out at::matmul_out after going through its code and trying it independently, which hit this condition -- that may have been premature.
Let me try a complete example through nvfuser/thunder to verify.
I attempted a complete example in nvfuser and I do get an error.
I need to dig into how the expression evaluator propagates/infers this flag. The expression evaluator will have this information for the fusion inputs.
I'm fine with addressing this in a separate PR. But I'd still try to understand the requires_grad bit sooner rather than later -- it shouldn't be there, and it blocks the optimization of writing the matmul into a pre-allocated output.
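One data point for that investigation: the check linked above appears to be gated on grad mode, so -- assuming that reading is correct -- the out= variant should work under torch.no_grad() even when an input has requires_grad set. A sketch under that assumption:

```python
import torch

a = torch.randn(2, 3, requires_grad=True)
b = torch.randn(3, 4)
out = torch.empty(2, 4)

# Assumption: the requires_grad check only fires while grad mode is enabled,
# so disabling grad mode lets matmul write into the pre-allocated output.
with torch.no_grad():
    torch.matmul(a, b, out=out)
print(out.shape)  # torch.Size([2, 4])
```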
Co-authored-by: Jingyue Wu <[email protected]>
!test
Resolves Issue #2427.

If the MatmulOp has a stride order set from the python frontend (fd.ops.add_output / fd.ops.stride_order), it returns a copy of the output with the specified memory layout.

at::matmul_out is not used since it does not allow inputs/outputs that require gradients:
https://github.com/pytorch/pytorch/blob/1f3d8896bc9cea7f46c50ff92b69c6aa139defcb/aten/src/ATen/native/LinearAlgebra.cpp#L2018-L2025
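For context, a sketch of the python-frontend usage this PR targets (argument names follow my reading of the nvfuser frontend and may not match exactly):

```python
from nvfuser import FusionDefinition, DataType

with FusionDefinition() as fd:
    a = fd.define_tensor(shape=[-1, -1], dtype=DataType.Float)
    b = fd.define_tensor(shape=[-1, -1], dtype=DataType.Float)
    out = fd.ops.matmul(a, b)
    # Request a column-major-like layout: logical axis 0 innermost.
    fd.add_output(out, stride_order=[0, 1])
```

With this change, evaluating such a fusion should return an output whose strides honor the requested stride order rather than the default row-major layout.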