
fusion should reject / error out with ambiguous in-place update semantics. #2839

Closed
jjsjann123 opened this issue Aug 23, 2024 · 10 comments

@jjsjann123 (Collaborator)

Since we do not handle race conditions, nvfuser's in-place update behavior can be affected by the segmentation result. A small example is below: if we force a segment with segment_set, the output from the fusion changes.

import torch
from nvfuser import FusionDefinition, DataType

def nvfuser_fusion_id1(fd: FusionDefinition) -> None:
    T0 = fd.define_tensor(shape=[-1, -1], contiguity=[True, True], dtype=DataType.Float, is_cpu=False, stride_order=[1, 0])
    T1 = fd.define_tensor(shape=[-1, -1, -1], contiguity=[True, True, True], dtype=DataType.Float, is_cpu=False, stride_order=[2, 1, 0])
    T2 = fd.ops.relu(T0)
    T3 = fd.ops.add(T0, T1)  # a second use of T0 that does not feed into T2
    fd.add_output(T2, T0)    # write T2 back into T0's buffer (in-place update)
    fd.add_output(T3)

with FusionDefinition() as fd:
    nvfuser_fusion_id1(fd)

t0 = torch.randn((20,), dtype=torch.float32, device='cuda:0').as_strided((10, 2), (2, 1))
t1 = torch.randn((40,), dtype=torch.float32, device='cuda:0').as_strided((2, 10, 2), (20, 2, 1))

inputs = [t0.clone(), t1.clone()]

o = fd.execute(inputs)[0]

def nvfuser_fusion_id2(fd: FusionDefinition) -> None:
    T0 = fd.define_tensor(shape=[-1, -1], contiguity=[True, True], dtype=DataType.Float, is_cpu=False, stride_order=[1, 0])
    T1 = fd.define_tensor(shape=[-1, -1, -1], contiguity=[True, True, True], dtype=DataType.Float, is_cpu=False, stride_order=[2, 1, 0])
    T2 = fd.ops.relu(T0)
    T0_seg = fd.ops.segment_set(T0)  # force a segment boundary before the add
    T3 = fd.ops.add(T0_seg, T1)
    fd.add_output(T2, T0)            # same in-place update as above
    fd.add_output(T3)

with FusionDefinition() as fd2:
    nvfuser_fusion_id2(fd2)

inputs = [t0.clone(), t1.clone()]

o_ref = fd2.execute(inputs)[0]

print(o - o_ref)  # nonzero: forcing the segment changed the fusion's output

I think this means that any use of the source buffer (T0) that does not lead into the alias target (T2) is ambiguous under nvfuser's definition, and we should error out in these scenarios.

But this means we would reject quite a few benign cases, which might be too intrusive. We can try to update this in the integration's fusion logic instead.
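
For concreteness, here is a minimal sketch of the check described above, written in plain Python over an invented graph representation (not nvfuser internals): walk the definition as a DAG and flag any use of the aliased input whose result never reaches the alias target.

# Hypothetical sketch: in a fusion where output `target` is written back
# into input `src`, every other use of `src` must lie on a path into
# `target`; otherwise its value depends on whether it runs before or
# after the in-place write-back.
def find_ambiguous_uses(consumers, src, target):
    # consumers maps each value to the values computed directly from it
    def reaches(node, goal, seen):
        if node == goal:
            return True
        seen.add(node)
        return any(reaches(nxt, goal, seen)
                   for nxt in consumers.get(node, []) if nxt not in seen)

    return [use for use in consumers.get(src, [])
            if use != target and not reaches(use, target, set())]

# The repro above: T2 = relu(T0) is written back into T0, while
# T3 = add(T0, T1) never feeds T2 -- so T3 is flagged.
consumers = {"T0": ["T2", "T3"], "T1": ["T3"]}
assert find_ambiguous_uses(consumers, "T0", "T2") == ["T3"]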

@jjsjann123 (Collaborator, Author)

Linking issue Lightning-AI/lightning-thunder#731 (comment)

@jjsjann123 jjsjann123 self-assigned this Aug 23, 2024
@jacobhinkle (Collaborator)

Good point. This is kind of a non-broadcast version of #2664. The solution is to have a kind of sync analysis for aliased inputs.

@jjsjann123 (Collaborator, Author)

> Good point. This is kind of a non-broadcast version of #2664. The solution is to have a kind of sync analysis for aliased inputs.

Yeah, we've touched on that conversation a couple of times.

In this instance, I think it's more a question of how we want to interpret our API. I've always viewed nvfuser's fusion definition as building a computational graph, rather than translating a Python program. Under a definition like that, the program itself is just ill-defined, unless we ask the user to explicitly build a control edge to establish the sequence of operations...

At the same time, if we are always parsing a Python program, maybe the sequence is just implicit and my understanding of the protocol is wrong.
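
To see the two readings side by side, here is a plain-PyTorch analogue (a sketch, not nvfuser code): read as a program, the statement order pins down the result; read as a pure dataflow graph, the write-back and the add have no edge between them, so either schedule is legal, which is exactly what segmentation exposed above.

import torch

t0 = torch.randn(10, 2)
t1 = torch.randn(2, 10, 2)
ref = t0.clone()

# Schedule A ("program" order): the add reads t0 before the write-back.
t3_before = t0 + t1
t0.relu_()            # stand-in for the T2 -> T0 in-place write-back

# Schedule B (also legal under the "graph" reading): write-back first.
t0.copy_(ref)         # reset the buffer
t0.relu_()
t3_after = t0 + t1    # now reads the clobbered buffer

print((t3_before - t3_after).abs().max())  # nonzero whenever relu_ changed t0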

@jjsjann123 (Collaborator, Author)

FYI @kiya00. The check we have in Thunder is on the conservative side, and I think our assumption is that the whole fusion runs in a single kernel. Likely the sanity check is not detecting this pattern.

@jacobhinkle (Collaborator)

> I've always viewed nvfuser's fusion definition as building a computational graph, rather than translating a Python program.

I agree, but I thought that with aliasing we were still SSA, and the guarantee was just that we would place the result back into the aliased input after computation. That way the semantics are unambiguous, since we can analyze it as an SSA program, and if the user wants to specify another behavior, like using the modified T0 for something, they would just use the output tensor directly (T2 in this case). That way we don't need to error when an aliased input has other uses; we just need to ensure that all loads of T0 occur before their stores, i.e. before T2 is stored. This feels like something SyncMap can do, maybe with some updating (see also #2850, which is about rewriting SyncMap to use IdModel).
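
A minimal sketch of that load-before-store constraint, assuming a segmented fusion can be summarized as an ordered list of kernels with known buffer reads and writes (the representation and names here are invented for illustration, not SyncMap's actual interface):

# Hypothetical sketch: given segments in execution order, verify every
# segment reading an aliased input buffer runs before the segment that
# stores the aliased output back into it.
def alias_order_ok(segments, aliased_buffer):
    store_seen = False
    for name, reads, writes in segments:  # (name, read set, write set)
        if aliased_buffer in reads and store_seen:
            return False  # a load of the original value after the store
        if aliased_buffer in writes:
            store_seen = True
    return True

# The failing schedule from the repro: the relu segment writes T0's
# buffer back before the add segment loads it.
bad  = [("relu+writeback", {"T0"}, {"T0"}), ("add", {"T0", "T1"}, set())]
good = [("add", {"T0", "T1"}, set()), ("relu+writeback", {"T0"}, {"T0"})]
assert not alias_order_ok(bad, "T0")
assert alias_order_ok(good, "T0")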

@kiya00

kiya00 commented Aug 28, 2024

> Likely the sanity check is not detecting this pattern.

Hi @jjsjann123, is this going to be fixed in nvfuser, or do I need to update the sanity check to detect this?

@jjsjann123 (Collaborator, Author)

> I agree, but I thought that with aliasing we were still SSA, and the guarantee was just that we would place the result back into the aliased input after computation. That way the semantics are unambiguous, since we can analyze it as an SSA program, and if the user wants to specify another behavior, like using the modified T0 for something, they would just use the output tensor directly.

I agree with you, and that's a reasonable protocol. Maybe we should just go with that.

Our code base as-is doesn't abide by that yet.

In the example above, the user program updates input buffer T0 in place by aliasing output tensor T2 to T0's buffer. A later use of T0 is supposed to read the original value; but if it is segmented into another kernel that executes after the in-place update, it inevitably reads the updated buffer and violates the protocol. So we need to do something like copying that buffer for the later use...
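
As a user-level illustration of what that copy has to accomplish (hypothetical call-site code against the repro above, not a proposed nvfuser change):

# Hand the fusion a private copy of t0, so the in-place write-back cannot
# clobber the buffer that later consumers of the original t0 depend on.
t0_private = t0.clone()   # buffer the fusion may update in place
outs = fd.execute([t0_private, t1.clone()])
# t0 still holds the original values; t0_private now holds relu(t0).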

@jjsjann123 (Collaborator, Author)

> > Likely the sanity check is not detecting this pattern.
>
> Hi @jjsjann123, is this going to be fixed in nvfuser, or do I need to update the sanity check to detect this?

I haven't reached a conclusion on that yet. I don't have any actionable items for you at the moment.

I'm hoping nvfuser can pay down some of its technical debt on in-place support by establishing a protocol and supporting more cases. We might need to update Thunder's integration logic, but that's a longer story. I'll keep you updated.

@jjsjann123 (Collaborator, Author)

jjsjann123 commented Aug 28, 2024

cc'ing @Priya2698

@jjsjann123 (Collaborator, Author)

Closing with Priya's recent segmentation pass, #2999.
