
Stride MatmulOp according to set allocation domain #3447

Open · wants to merge 19 commits into base: main from pm/matmul_stride_order
Conversation

Priya2698
Collaborator

@Priya2698 Priya2698 commented Nov 19, 2024

Resolves Issue #2427.
If the MatmulOp output has a stride order set from the Python frontend (fd.ops.add_output/fd.ops.stride_order), MatmulOp::evaluate returns a copy of the output with the specified memory layout.

at::matmul_out is not used since it does not allow inputs/outputs that require gradients:
https://github.com/pytorch/pytorch/blob/1f3d8896bc9cea7f46c50ff92b69c6aa139defcb/aten/src/ATen/native/LinearAlgebra.cpp#L2018-L2025
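As a sketch of what "a copy of the output with the specified memory layout" means, here is a PyTorch-level illustration (not the actual nvFuser implementation; the column-major example and the transpose-contiguous-transpose trick are assumptions for demonstration):

```python
import torch

# Plain matmul produces a row-major (C-contiguous) output.
a = torch.randn(4, 3)
b = torch.randn(3, 5)
out = torch.matmul(a, b)
assert out.stride() == (5, 1)  # row-major strides

# A stride order that makes dim 0 innermost (column-major) can be
# realized by copying: transpose -> contiguous -> transpose back.
col_major = out.t().contiguous().t()
assert col_major.stride() == (1, 4)  # column-major strides
assert torch.equal(col_major, out)   # same values, different layout
```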

@Priya2698 Priya2698 force-pushed the pm/matmul_stride_order branch 3 times, most recently from b4bd2bf to 2dee366 Compare November 28, 2024 16:27
@Priya2698
Collaborator Author

!test

1 similar comment

@Priya2698 Priya2698 marked this pull request as ready for review December 2, 2024 17:06
@Priya2698 Priya2698 requested review from jjsjann123 and wujingyue and removed request for jjsjann123 December 2, 2024 17:06
csrc/ir/nodes.cpp (outdated, resolved)
csrc/ir/nodes.cpp (outdated, resolved)
auto strides = computeStrides(out(), matmul_sizes);
matmul_out = at::as_strided(matmul_out, matmul_sizes, strides);
}
inferAndValidateAllocationSizesAndStrides(matmul_out, out(), ee);
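For context, dense strides for a permuted allocation domain follow the usual running-product rule. A minimal standalone sketch (the function name and the outermost-to-innermost ordering convention are assumptions, not nvFuser's actual computeStrides):

```python
def contiguous_strides_for_order(sizes, alloc_order):
    """Dense strides for `sizes` when dimensions are laid out in
    `alloc_order` (a permutation of dims, outermost to innermost)."""
    strides = [0] * len(sizes)
    running = 1
    for dim in reversed(alloc_order):  # innermost dim gets stride 1
        strides[dim] = running
        running *= sizes[dim]
    return strides

# Row-major [M, N]: dim 1 is innermost.
assert contiguous_strides_for_order([2, 3], [0, 1]) == [3, 1]
# Column-major [M, N]: dim 0 is innermost.
assert contiguous_strides_for_order([2, 3], [1, 0]) == [1, 2]
```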
Collaborator

I'm not sure about validating output allocation for all MatmulOps.

  1. We already validate allocation sizes/strides for each segment's inputs and outputs. Given MatmulOp currently forms its own segment, existing validation seems enough.
  2. If/When MatmulOp produces an internal tensor, we can't always materialize the tensor as an at::Tensor that matches its allocation domain. For example, the allocation domain can be a split and/or a swizzle of logical. Assuming allocation is a permutation of logical is probably OK for segment inputs/outputs, but can be too limiting for internal tensors. cc @zasdfgbnm

csrc/ir/utils.h (outdated, resolved)
tests/python/test_matmul.py (outdated, resolved)
csrc/ir/nodes.cpp (outdated, resolved)
@Priya2698 Priya2698 marked this pull request as draft December 3, 2024 09:09
@Priya2698 Priya2698 changed the title Stride MatmulOp according to set allocation domain [WIP] Stride MatmulOp according to set allocation domain Dec 3, 2024
@Priya2698 Priya2698 force-pushed the pm/matmul_stride_order branch from 7dfe56b to 0d6f934 Compare December 11, 2024 00:11
@Priya2698 Priya2698 changed the title [WIP] Stride MatmulOp according to set allocation domain Stride MatmulOp according to set allocation domain Dec 11, 2024
@Priya2698 Priya2698 force-pushed the pm/matmul_stride_order branch from 91b48f5 to deb5351 Compare December 11, 2024 18:47
@Priya2698
Collaborator Author

!test

@Priya2698 Priya2698 marked this pull request as ready for review December 11, 2024 22:29
@@ -4371,7 +4372,17 @@ std::vector<PolymorphicValue> MatmulOp::evaluate(
     const std::vector<PolymorphicValue>& inputs) const {
   const auto a = inputs.at(0).as<at::Tensor>();
   const auto b = inputs.at(1).as<at::Tensor>();
-  return {at::matmul(a, b)};
+
+  auto matmul_out = at::matmul(a, b);
Collaborator

Did you give up on at::matmul_out which could save a copy?

Collaborator Author

at::matmul_out is not used since it does not allow inputs/outputs which require gradients.
https://github.com/pytorch/pytorch/blob/1f3d8896bc9cea7f46c50ff92b69c6aa139defcb/aten/src/ATen/native/LinearAlgebra.cpp#L2018-L2025
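The restriction can be reproduced from the Python API: the out= overload refuses inputs that require grad. A small repro sketch (the exact error message may vary across PyTorch versions):

```python
import torch

a = torch.randn(2, 3, requires_grad=True)
b = torch.randn(3, 4)
out = torch.empty(2, 4)

try:
    torch.matmul(a, b, out=out)  # out= variants don't support autograd
    raised = False
except RuntimeError:
    raised = True

assert raised  # rejected because `a` requires grad
```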

Collaborator

This is suspicious -- we are not using ExpressionEvaluator to build a DAG for autograd, so inputs/outputs here shouldn't require grad. Where did inputs/outputs get requires_grad? Your test case obviously didn't start with torch.randn(..., requires_grad=True).

cc @jjsjann123

Collaborator Author

@Priya2698 Priya2698 Dec 12, 2024

You might be right: the evaluated tensors themselves may not carry the requires_grad field.
I had ruled out at::matmul_out after going through its code and trying it independently, which raised this condition; that may have been premature.
Let me try a complete example through nvFuser/Thunder to verify.

Collaborator Author

I attempted a complete example in nvFuser and I do get an error.
I need to dig into how the expression evaluator propagates/infers this flag; it will have this information for the fusion inputs.

Collaborator

I'm fine with addressing this in a separate PR. But I'd still try to understand the requires_grad bit sooner rather than later -- it shouldn't have been there, and it blocks the optimization of writing matmul results into a pre-allocated output.

csrc/tensor_metadata.h (outdated, resolved)
tests/python/test_matmul.py (outdated, resolved)
tests/python/test_matmul.py (outdated, resolved)
tests/python/test_matmul.py (outdated, resolved)
@Priya2698
Collaborator Author

!test
