
Lowering vectorized pad #3261

Merged 51 commits on Nov 5, 2024
Conversation

@jjsjann123 (Collaborator) commented Oct 23, 2024

Added support for lowering `TernaryOp::where` with a vectorization factor.

i.e.

predicate
  ? loadGlobalToLocal<...>(&dst[0], &src[i_src])
  : dst.set(0.0f) 

Currently this can only be done via manual scheduling; the follow-up PR #3321 on vectorization analysis will apply this automatically.
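The lowered pattern above can be illustrated with a small host-side sketch (a hypothetical stand-in, not the actual generated CUDA: `paddedVectorLoad` mimics `loadGlobalToLocal` on the in-bounds path and `dst.set(0.0f)` on the padded path):

```cpp
#include <array>
#include <cassert>
#include <cstring>
#include <vector>

// Hypothetical sketch of the predicated vectorized load: if the whole
// vector lies inside the source buffer, copy it in one contiguous shot
// (standing in for loadGlobalToLocal); otherwise skip the global load
// and materialize the pad value (standing in for dst.set(0.0f)).
template <int kVecWidth>
void paddedVectorLoad(
    std::array<float, kVecWidth>& dst,
    const std::vector<float>& src,
    long srcIndex,
    float padValue) {
  const bool inBounds =
      srcIndex >= 0 && srcIndex + kVecWidth <= static_cast<long>(src.size());
  if (inBounds) {
    // Vectorized path: one contiguous copy of kVecWidth elements.
    std::memcpy(dst.data(), src.data() + srcIndex, sizeof(float) * kVecWidth);
  } else {
    // Padded path: no load, just fill with the pad value.
    dst.fill(padValue);
  }
}
```

The key point of the lowering is that the predicate selects between an entire vectorized load and a vectorized fill, rather than predicating element by element.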

csrc/codegen.cpp Outdated
@@ -402,6 +402,52 @@ class CudaKernelGenerator : private kir::ConstIrVisitor {
}
}

void generateVectorizedLdSt(Val* in, Val* out, CacheOp cache_op) {
@jjsjann123 (Collaborator, Author) commented:

Mechanical change: this is lifted from `void handle(const LoadStoreOp* ldst) final` so it can be shared with the `TernaryOp` handling.

@jjsjann123 jjsjann123 marked this pull request as ready for review November 4, 2024 16:50
@jjsjann123 (Collaborator, Author) commented:

!test

@jjsjann123 (Collaborator, Author) commented:

!test

@@ -4041,4 +4041,37 @@ TEST_F(ResizeTest, SliceSliceConcatConcat) {
NVF_CHECK(ref.equal(cg_outputs[0]));
}

// manual scheduling that should have vectorized load on padded inputs.
TEST_F(ResizeTest, VectorizePadLowering) {
A collaborator commented:

Should we have a test for vectorizing `where` without using `pad`?

@jjsjann123 (Collaborator, Author) replied:

Good call. I almost forgot that we have `where` directly 🤕

csrc/codegen.cpp Outdated
@@ -1001,6 +1051,50 @@ class CudaKernelGenerator : private kir::ConstIrVisitor {
}

void handle(const TernaryOp* top) final {
// Get vectorization information
A collaborator commented:

Can you add some comments about the expectation? IIUC, only in2 is allowed to be vectorized, but technically speaking, it should be possible to have vectorized loads in both in2 and in3, right? Not sure if it's worthwhile to allow that as well, although the required change seems minimal.

@jjsjann123 (Collaborator, Author) replied:

Yes, we can have in2 / in3 as TensorViews. I'm trying to add that, since @zasdfgbnm mentioned having a `where` test.

@jjsjann123 (Collaborator, Author) commented:

!test

@naoyam (Collaborator) left a review:

LGTM

@jjsjann123 jjsjann123 merged commit 62bd3b5 into main Nov 5, 2024
47 checks passed
@jjsjann123 jjsjann123 deleted the jjsjann123/resize_vec branch November 5, 2024 16:51
@jjsjann123 jjsjann123 restored the jjsjann123/resize_vec branch November 12, 2024 20:08
jjsjann123 added a commit that referenced this pull request Nov 13, 2024
Add **conditional** support for resize in vectorization analysis. This PR allows vectorized loads on `PadOp` directly, without using a cache load, which improves the performance of the generated kernel.

What's in this PR:

1. Add a propagation rule for resize in vectorization analysis. The rule works as follows:
   i. For a supported resize: a) project the resize op onto the frontier and clear `(frontier.begin(), resize_position)`; b) add the projected extent of the new resize op as `gcd(id_from, resize_op->leftExpand(), resize_op->rightExpand())`.
   ii. For an unsupported resize: clear `[frontier.begin(), resize_position]`; no behavior change.

2. Update `TensorView::cacheAfter` to opt in a chosen set of uses to caching while leaving other uses unchanged. This is necessary when an input is consumed by a `PadOp` as well as by other operations that rely on a cached load for vectorization.

Follow up to #3261.
Part of the work on supporting RoPE performance. [Design doc](https://docs.google.com/document/d/1tafRMNIXMmHlIGAiNlaPkYp6mZAzJ2Rh_NtARHbmYNA/edit?disco=AAABYEnV_ZY)

---------

Co-authored-by: Naoya Maruyama <[email protected]>