Stride MatmulOp according to set allocation domain #3447

base: main
Conversation
Force-pushed from b4bd2bf to 2dee366.
!test

1 similar comment

!test
csrc/ir/nodes.cpp (outdated)
  auto strides = computeStrides(out(), matmul_sizes);
  matmul_out = at::as_strided(matmul_out, matmul_sizes, strides);
}
inferAndValidateAllocationSizesAndStrides(matmul_out, out(), ee);
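For intuition, here is a rough Python analogue of what a computeStrides-style helper needs to do (the stride-order convention below is an assumption for illustration, not nvfuser's actual implementation): assign contiguous strides in allocation order, then map them back to logical axis positions.

```python
def compute_strides(sizes, stride_order):
    # Hypothetical analogue of computeStrides(): walk the axes from
    # innermost to outermost in allocation order, assigning contiguous
    # strides, then scatter them back to logical positions.
    # Assumed convention: stride_order[i] is the memory rank of logical
    # axis i, with 0 being the innermost (fastest-varying) axis.
    strides = [0] * len(sizes)
    stride = 1
    for axis in sorted(range(len(sizes)), key=lambda i: stride_order[i]):
        strides[axis] = stride
        stride *= sizes[axis]
    return strides

print(compute_strides([2, 3, 4], [2, 1, 0]))  # [12, 4, 1] (row-major)
print(compute_strides([2, 3, 4], [0, 1, 2]))  # [1, 2, 6]  (column-major-like)
```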
I'm not sure about validating output allocation for all MatmulOps.
- We already validate allocation sizes/strides for each segment's inputs and outputs. Given that MatmulOp currently forms its own segment, the existing validation seems sufficient.
- If/when MatmulOp produces an internal tensor, we can't always materialize that tensor as an at::Tensor that matches its allocation domain. For example, the allocation domain can be a split and/or a swizzle of the logical domain. Assuming allocation is a permutation of logical is probably OK for segment inputs/outputs, but can be too limiting for internal tensors. cc @zasdfgbnm
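For readers less familiar with the terminology: "allocation is a permutation of logical" means the memory order is just a reordering of the logical axes, which an at::Tensor can express through strides alone. A minimal PyTorch illustration:

```python
import torch

# Logical shape (2, 3, 4), but allocated with logical axis 0 innermost:
t = torch.empty(4, 3, 2).permute(2, 1, 0)
print(t.shape)    # torch.Size([2, 3, 4]) -- logical domain
print(t.stride()) # (1, 2, 6)             -- allocation order shows up in strides
```

A split or swizzle of the logical domain, by contrast, cannot in general be expressed as a plain size/stride pair, which is the limitation being pointed out above.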
Force-pushed from 7dfe56b to 0d6f934.
Force-pushed from 91b48f5 to deb5351.
!test
@@ -4371,7 +4372,17 @@ std::vector<PolymorphicValue> MatmulOp::evaluate(
     const std::vector<PolymorphicValue>& inputs) const {
   const auto a = inputs.at(0).as<at::Tensor>();
   const auto b = inputs.at(1).as<at::Tensor>();
-  return {at::matmul(a, b)};
+  auto matmul_out = at::matmul(a, b);
Did you give up on at::matmul_out, which could save a copy?
at::matmul_out is not used since it does not allow inputs/outputs that require gradients:
https://github.com/pytorch/pytorch/blob/1f3d8896bc9cea7f46c50ff92b69c6aa139defcb/aten/src/ATen/native/LinearAlgebra.cpp#L2018-L2025
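For reference, a minimal standalone repro of the restriction discussed here (stock PyTorch behavior, independent of nvfuser):

```python
import torch

a = torch.randn(2, 3, requires_grad=True)
b = torch.randn(3, 4)
out = torch.empty(2, 4)

try:
    torch.matmul(a, b, out=out)
except RuntimeError as e:
    # out= variants reject arguments that require grad while grad mode is on
    print(e)
```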
This is suspicious -- we are not using ExpressionEvaluator to build a DAG for autograd, so the inputs/outputs here shouldn't require grads. Where did the inputs/outputs get requires_grad? Your test case obviously didn't start with torch.randn(..., requires_grad=True).
cc @jjsjann123
You might be right; the evaluated tensors themselves may not have the requires_grad flag set.
I had ruled out at::matmul_out after going through its code and trying it independently, which hit this condition -- that may have been premature.
Let me try a complete example through nvfuser/thunder to verify.
I attempted a complete example in nvfuser and I do get an error.
I need to dig into how the expression evaluator propagates/infers this flag. The expression evaluator will have this information for the fusion inputs.
I'm fine with addressing this in a separate PR. But I'd still try to understand the requires_grad bit sooner rather than later -- it shouldn't be there, and it blocks the optimization of writing the matmul into a pre-allocated output.
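One data point for that investigation: the check linked above appears to be gated on grad mode, so -- assuming that reading is correct -- the out= variant should work under torch.no_grad() even when an input has requires_grad set. A sketch under that assumption:

```python
import torch

a = torch.randn(2, 3, requires_grad=True)
b = torch.randn(3, 4)
out = torch.empty(2, 4)

# Assumption: the requires_grad check only fires while grad mode is enabled,
# so disabling grad mode lets matmul write into the pre-allocated output.
with torch.no_grad():
    torch.matmul(a, b, out=out)
print(out.shape)  # torch.Size([2, 4])
```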
Co-authored-by: Jingyue Wu <[email protected]>
!test
Resolves Issue #2427.

If the MatmulOp has a stride order set from the python frontend (fd.ops.add_output / fd.ops.stride_order), it returns a copy of the output with the specified memory layout.

at::matmul_out is not used since it does not allow inputs/outputs that require gradients:
https://github.com/pytorch/pytorch/blob/1f3d8896bc9cea7f46c50ff92b69c6aa139defcb/aten/src/ATen/native/LinearAlgebra.cpp#L2018-L2025
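For context, a sketch of the python-frontend usage this PR targets (argument names follow my reading of the nvfuser frontend and may not match exactly):

```python
from nvfuser import FusionDefinition, DataType

with FusionDefinition() as fd:
    a = fd.define_tensor(shape=[-1, -1], dtype=DataType.Float)
    b = fd.define_tensor(shape=[-1, -1], dtype=DataType.Float)
    out = fd.ops.matmul(a, b)
    # Request a column-major-like layout: logical axis 0 innermost.
    fd.add_output(out, stride_order=[0, 1])
```

With this change, evaluating such a fusion should return an output whose strides honor the requested stride order rather than the default row-major layout.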