Sharded SDPAFwdOp #2565

Merged: 5 commits merged into NVIDIA:main on Jul 12, 2024

Conversation

cowanmeg (Collaborator):

Temporarily add support for a sharded forward scaled dot product attention.
Currently, we only support DID parallelization on the logical domain, which requires us to split and parallelize an axis at the logical level (see #2563). This is a hack until we support DID parallelization on the loop domain, after which this PR can be reverted.

Restrictions:

  1. The q, k, v inputs are manually sharded before the SDPAFwdOp is created. We cannot rely on sharding propagation or on sharding after the Fusion is created, because the dimension checks run when the op is created.
  2. Only the head dimension is sharded; all inputs and outputs either have a sharded head dimension or are unsharded.
  3. The DID axis is the outermost axis. During evaluation, if an input has 5 dimensions, the first is assumed to be the DID axis; it is squeezed from the inputs and unsqueezed back onto the outputs (see the sketch after this list).
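
For illustration, here is a minimal ATen sketch of the evaluation-time convention in restriction 3: a rank-5 input is treated as carrying the DID axis outermost, the axis is squeezed off before calling SDPA, and it is added back onto the output. This is not the actual nvFuser implementation; the helper name and the use of at::scaled_dot_product_attention (which returns only the attention output) are assumptions for this example.

    #include <ATen/ATen.h>

    // Hypothetical helper, not the nvFuser code path: q, k, v are
    // [DID, N, H/d, L, E] when the head dimension is sharded across d devices
    // (the local DID extent is 1).
    at::Tensor sharded_sdpa_fwd(at::Tensor q, at::Tensor k, at::Tensor v) {
      const bool handle_device_dim = q.dim() == 5;
      if (handle_device_dim) {
        q = q.squeeze(0);
        k = k.squeeze(0);
        v = v.squeeze(0);
      }
      // Run the regular per-device SDPA on [N, H/d, L, E] tensors.
      at::Tensor output = at::scaled_dot_product_attention(q, k, v);
      if (handle_device_dim) {
        // Add the DID axis back so sharded outputs stay rank-5.
        output = output.unsqueeze(0);
      }
      return output;
    }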

cowanmeg requested review from Priya2698 and wujingyue on July 10, 2024, 16:20
csrc/ir/nodes.cpp (outdated, resolved)

    // Add back the device dim axis for outputs with a head dimension.
    if (handle_device_dim) {
      output = output.unsqueeze(0);
      log_sumexp = log_sumexp.unsqueeze(0);
Collaborator:

Why do we need the DID axis in log_sumexp?

Collaborator Author (cowanmeg):

Good question: we only need it for the backward pass, so we don't need to track the DID axis. I'll remove it!

wujingyue (Collaborator) left a comment:

Nice! I'd wait for Priya's approval.

csrc/root_domain_map.cpp (outdated, resolved)

      }
    }
    // Map D from any input (query/key/value) to output, logsumexp only.

cowanmeg requested a review from Priya2698 on July 11, 2024, 16:19
cowanmeg (Collaborator Author):

!build

cowanmeg (Collaborator Author):

!build

Priya2698 (Collaborator) left a comment:

LGTM.

cowanmeg (Collaborator Author):

Note: the failing tests are unrelated; they are caused by a "'nlohmann/json.hpp' file not found" error.

cowanmeg merged commit 114e21d into NVIDIA:main on Jul 12, 2024
17 of 20 checks passed
jacobhinkle pushed a commit that referenced this pull request Jul 19, 2024
Temporarily add support for a sharded forward scaled dot product
attention.
Currently, we only support DID parallelization on the logical domain, which requires us to split and parallelize an axis at the logical level (see #2563). This is a hack until we support DID parallelization on the loop domain, after which this PR can be reverted.

Restrictions:
1. The q, k, v inputs are manually sharded _before_ the SDPAFwdOp is created. We cannot rely on sharding propagation or on sharding after the Fusion is created, because the dimension checks run when the op is created.
2. Only the head dimension is sharded; all inputs and outputs either have a sharded head dimension or are unsharded.
3. The DID axis is the outermost axis. During evaluation, if an input has 5 dimensions, the first is assumed to be the DID axis; it is squeezed from the inputs and unsqueezed back onto the outputs.
cowanmeg added a commit that referenced this pull request Aug 23, 2024
Adds temporary support for sharded backward scaled dot product attention, until #2563 is completed.

Similar to #2565, with similar restrictions:
1. All necessary sharded inputs are manually sharded before the SDPABwdOp is created. We cannot rely on sharding propagation or on sharding after the Fusion is created, because the dimension checks run when the op is created.
2. Only the head dimension is sharded; all inputs and outputs either have a sharded head dimension or are unsharded.
3. The DID axis is the outermost axis. During evaluation, if an input has 5 dimensions, the first is assumed to be the DID axis; it is squeezed from the inputs and unsqueezed back onto the outputs.