Remove truediv operation #2837
Conversation
What exactly are the semantics of our `div`?

Nvfuser's `NVFUSER_DEFINE_BINARY_FLOAT_OP(truediv, Div)` macro expands to a binary op built as `BinaryOpType::op_type, v1, v2, TypePromotion::float_op_config);`. Are you proposing to just remove `truediv`?
I'm not sure what I'm proposing. You're right that in the kernel our `div` is C++-style (truncating) division. I guess for ultimate clarity we should have all these separate nodes/ops: truediv, truncdiv, floordiv, and ceildiv, and nothing just called `div`.
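For reference, here is a plain-Python sketch of how those four flavors differ on the same operands (truncdiv is the one matching C++ integer `/`):

```python
# The four division flavors under discussion, illustrated on a = -7, b = 2.
a, b = -7, 2
print(a / b)       # truediv:  -3.5  (always floating point)
print(int(a / b))  # truncdiv: -3    (rounds toward zero, like C++ integer '/')
print(a // b)      # floordiv: -4    (rounds toward negative infinity)
print(-(-a // b))  # ceildiv:  -3    (rounds toward positive infinity)
```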
I believe …
This is used for translating a C++ Fusion to a Python FusionDefinition.
What's the usage here? Can you add an example to show that?
I'm also not sure if we actually do the right behavior here, i.e., we promote Tensors to floating point before div, but do we do the same for scalars? I share the concern @jacobhinkle has in his comment. But I don't think we have to resolve everything in this PR, hence the ask for an example of the problem we want to patch.
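For instance, a hypothetical probe of the scalar case could look like the sketch below (untested; `define_tensor`, `define_scalar`, and `ops.truediv` are existing frontend entry points, but whether the scalar overload promotes is exactly the open question):

```python
from nvfuser import FusionDefinition, DataType

with FusionDefinition() as fd:
    t0 = fd.define_tensor(shape=[-1], dtype=DataType.Int32)
    s0 = fd.define_scalar(dtype=DataType.Int32)
    # tensor op scalar: does this overload get the same floating-point
    # promotion as the tensor-tensor overload?
    fd.add_output(fd.ops.truediv(t0, s0))
```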
BTW, our python binding binds `__truediv__` to `div`. Might want to change that as well, since I think that's also relevant here?
The division handling in nvfuser is a concerning question. We should open another issue to track that. Does that sound good to you, @jacobhinkle?
csrc/type.cpp
Outdated
```diff
@@ -606,6 +608,7 @@ static const char* binary_op_type_inline_op2string(BinaryOpType t) {
     case BinaryOpType::Add:
       return "+";
     case BinaryOpType::Div:
+    case BinaryOpType::Truediv:
```
but `/` isn't truediv though.
I feel this is getting scarier here. A binary op with BinaryOpType::Truediv doesn't do truediv?!
💯
It is this series of PRs.
@jjsjann123 Why not get rid of `truediv`?
Do we need to support `truediv` at all?
Looks like the thunder prim only has the C++-style `div`.
force-pushed from 51e9833 to 3d618b4
@jjsjann123 I changed this PR to remove the `truediv` operation.
I think I see why you want to have a `BinaryOpType::Truediv`. In #2841, this is just so we have a clean 1-to-1 translation for the truediv part?

I'm not sure how big of a deal it is to have that 1-to-1 translation. We don't have to keep a `truediv` binary op; having it implicitly handled by floating-point promotion followed by a C++-style div should be good enough.
Not having truediv is somewhat of an inconvenience IMHO. thunder doesn't have to use it, but we still have our python API where one can define a fusion definition manually. So I tend to think maybe we want to keep it, but I don't have a strong opinion there either.
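For what it's worth, the manual spelling would look something like this sketch (`ops.cast` and `ops.div` are existing frontend ops; the point is that cast-then-div reproduces truediv):

```python
from nvfuser import FusionDefinition, DataType

def truediv_by_hand(fd: FusionDefinition) -> None:
    T0 = fd.define_tensor(shape=[-1], dtype=DataType.Int32)
    T1 = fd.define_tensor(shape=[-1], dtype=DataType.Int32)
    # Promote to floating point first; on float operands the plain
    # (C++-style) div and truediv coincide.
    T0f = fd.ops.cast(T0, DataType.Float)
    T1f = fd.ops.cast(T1, DataType.Float)
    fd.add_output(fd.ops.div(T0f, T1f))
```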
csrc/ops/arith.h
Outdated
```cpp
NVF_API Val* truediv(Val* v1, Val* v2);
NVF_API TensorView* truediv(TensorView* v1, Val* v2);
NVF_API TensorView* truediv(Val* v1, TensorView* v2);
NVF_API TensorView* truediv(TensorView* v1, TensorView* v2);
```
Naive question: do we really need to/should we remove the C++ `truediv` API? Since these overloads do float promotion, they are truediv in nature, even though they don't use a `BinaryOpType::Truediv`.
```diff
@@ -1443,7 +1443,6 @@ void initNvFuserPythonBindings(PyObject* module) {
   NVFUSER_PYTHON_BINDING_BINARY_OP("add", add)
   NVFUSER_PYTHON_BINDING_BINARY_OP("atan2", atan2)
   NVFUSER_PYTHON_BINDING_BINARY_OP("div", div)
-  NVFUSER_PYTHON_BINDING_BINARY_OP("truediv", truediv)
```
Note, there's also an entry for `__truediv__` that is translated to `div`. We have a comment saying that our div is true div, which I think is a lie.
```python
import torch
from nvfuser import FusionDefinition, DataType

def nvfuser_fusion_id1(fd: FusionDefinition) -> None:
    T0 = fd.define_tensor(shape=[-1, -1], contiguity=[True, True], dtype=DataType.Int32, is_cpu=False, stride_order=[1, 0])
    T1 = fd.define_tensor(shape=[-1, -1, -1], contiguity=[True, True, True], dtype=DataType.Int32, is_cpu=False, stride_order=[2, 1, 0])
    T2 = T0 / T1  # goes through __truediv__, which is bound to div
    # T2 = fd.ops.truediv(T0, T1)
    fd.add_output(T2)

with FusionDefinition() as fd:
    nvfuser_fusion_id1(fd)

t0 = torch.ones((20,), dtype=torch.int32, device='cuda:0').as_strided((10, 2), (2, 1))
t1 = torch.ones((40,), dtype=torch.int32, device='cuda:0').as_strided((2, 10, 2), (20, 2, 1))
t1 += 1
inputs = [t0.clone(), t1.clone()]
o = fd.execute(inputs)[0]
print(o)
```
BTW, this isn't necessarily caused by your PR. I'm glad you are exposing these issues. ❤️
Two things that I do have a stronger opinion on:
- In this PR, the BinaryOpType::Truediv isn't doing true div, so that doesn't feel right.
- Not in your code, but our python binding has

```cpp
// In PyTorch, __div__ (//) and __truediv__ (/) are different.
// When applied to integer-dtype arguments, they do as expected, returning
// integer and float outputs, respectively. When applied to two floating-type
// arguments, they return the floor of division for // and plain division for
// /. When applied to mixed types, the types are promoted, so the
// floating-point behavior is returned.
// Our div operator matches the __truediv__ behavior, so we do not implement
// __div__.
NVFUSER_PYTHON_BINDING_BINARY_OP_SPECIAL("__truediv__", "div", div)
```
This is just totally wrong and we should fix it.
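For the record, here is what Python itself does: `//` dispatches to `__floordiv__` (there is no `__div__` in Python 3), and `/` dispatches to `__truediv__`:

```python
# Floor division: // -> __floordiv__ (not __div__); floors toward -inf,
# for float operands as well as int ones.
print(7 // 2, (7).__floordiv__(2))  # 3 3
print(-7 // 2, 7.0 // 2.0)          # -4 3.0
# True division: / -> __truediv__; returns a float even for int operands.
print(7 / 2, (7).__truediv__(2))    # 3.5 3.5
```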
force-pushed from c1aca34 to 119c323
This PR updates the comments for the `__truediv__` operator defined in the Python bindings. The current comment does not reflect what the code actually does. Reference: #2837 (review)
This PR removes the `truediv` operation, which mapped to `torch.true_divide` and `torch.div(a, b, rounding_mode=None)`.

Why?

- It is hard to distinguish `truediv` from `div` in the Fusion IR.
- `prim.div` maps to the `div` operator. The other types of `div` are implemented through this prim.
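As a rough illustration of that layering (names here are illustrative, not thunder's actual internals), the other division flavors can be derived from a single C-style division primitive:

```python
def prim_div(a, b):
    # C-style division: truncates toward zero for integer operands.
    q = a / b
    return int(q) if isinstance(a, int) and isinstance(b, int) else q

def true_divide(a, b):
    # Promote to floating point, then use the primitive.
    return prim_div(float(a), float(b))

def floor_divide(a, b):
    # Start from truncating division, then correct toward -inf when the
    # signs differ and the division is inexact (integer case).
    q = prim_div(a, b)
    if isinstance(q, int) and a % b != 0 and (a < 0) != (b < 0):
        q -= 1
    return q

print(prim_div(-7, 2), true_divide(-7, 2), floor_divide(-7, 2))  # -3 -3.5 -4
```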