Squeezed IterDomain ?S536{1} must concretize to IterType::Broadcast but found ?S536{1}. #2359
@jjsjann123 tagging myself. I think it's the reshape that's not specifying the output IterDomain properly.
Also a note to myself: how would
Never mind. https://github.com/Lightning-AI/lightning-thunder/blob/126940750c8e498a89376e6c787985448c79808a/thunder/executors/nvfuserex_impl.py#L299 indeed kicked in. The first dimension of T12 is 1 in the above example.
I don't see any issue with this. The
A smaller repro.
cc'ing @jacobhinkle: it looks like this is indeed a concretization bug.
I'm back at the keyboard. Looks like this is just a check that we should avoid, at least in this case. Commenting out the call to
Thank you!
This test is actually tougher than the big repro on #2359. It necessitates either removing the check altogether, or the path I took, which is to concretize Resized-to-broadcast extents as constant 1 so that we can evaluate `max(0, min(i0, 1))` as 1 without calling `simplifyExpr`. A less invasive solution would be to remove the extent check in `SqueezeOp::checkConcretization`. We could also just remove `Expr::checkConcretization` and `checkConcretizedUses`; they are currently only used for `SqueezeOp` and are probably not adding much value anyway.
The included test is the small one provided by @jjsjann123 in #2359, and it's actually tougher than the original repro. It necessitates either removing the check that concretized squeezed extents are constant 1, or concretizing Resized-to-broadcast extents as constant 1 so that we can evaluate `max(0, min(i0, 1))` as `oneVal()` without calling `simplifyExpr`. I went with removing the check, which means in this example we have a broadcast dimension with a dynamic shape like `max(0, min(i0, 1))`. Since we're concretizing to Broadcast, we know that dimension is not zero; if it were, then we'd concretize to `Iteration` and `SqueezeOp::checkConcretization` would fail the IterType check. Still, I don't love that the expression cannot be simplified, so it appears in the kernel (`i9` and `i10`):

```c++
__global__ void nvfuser_pointwise_f0_c1_r0_g1(Tensor<float, 3, 3> T0, Tensor<float, 2, 2> T4) {
  nvfuser_index_t i0;
  i0 = ((nvfuser_index_t)threadIdx.x) + (128LL * ((nvfuser_index_t)blockIdx.x));
  Tensor<float, 3, 3> s1;
  s1.data = T0.data;
  s1.logical_size = T0.logical_size;
  s1.alloc_stride = T0.alloc_stride;
  Array<nvfuser_index_t, 3, 1> a2;
  a2 = s1.logical_size;
  nvfuser_index_t i3;
  i3 = a2[2LL];
  nvfuser_index_t i4;
  i4 = max(0LL, (min(4LL, i3)));
  nvfuser_index_t i5;
  i5 = min(i3, 4LL);
  nvfuser_index_t i6;
  i6 = max(0LL, i5);
  Array<nvfuser_index_t, 3, 1> a7;
  a7 = s1.logical_size;
  nvfuser_index_t i8;
  i8 = a7[0LL];
  nvfuser_index_t i9;
  i9 = min(i8, 1LL);
  nvfuser_index_t i10;
  i10 = max(0LL, i9);
  Array<nvfuser_index_t, 3, 1> a11;
  a11 = s1.logical_size;
  nvfuser_index_t i12;
  i12 = a11[1LL];
  nvfuser_index_t i13;
  i13 = (max(0LL, (min(2LL, i12)))) * i4;
  nvfuser_index_t i14;
  i14 = i0 % i13;
  nvfuser_index_t i15;
  i15 = min(i12, 2LL);
  nvfuser_index_t i16;
  i16 = max(0LL, i15);
  if ((i0 < i13)) {
    float T1[1LL];
    T1[0LL] = 0LL;
    T1[0LL] = T0[((((i3 * i12) * (i0 / i13)) + (i3 * (i14 / i4))) + (i14 % i4))];
    float T5[1LL];
    T5[0LL] = T1[0LL];
    T4[i0] = T5[0LL];
  }
}
```

If you look closely, though, `i10` is not used, so it will be DCEd anyway. Still, it might be nice to concretize broadcast extents to 1 just to clean up these expressions if they appear downstream. I tried that hastily but ran into some issues, so I'll leave it for another PR.

Fixes #2359
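The simplification argument for the `max(0, min(i0, 1))` extent can be checked in a few lines. This is a standalone sketch, not nvFuser code; `i0` stands in for the runtime input extent. For any `i0 >= 1` the expression collapses to 1, and `i0 == 0` is exactly the case that would concretize to Iteration instead of Broadcast:

```python
def broadcast_extent(i0: int) -> int:
    """The resize-to-broadcast extent expression that appears in the kernel."""
    return max(0, min(i0, 1))

# Once the dimension concretizes to Broadcast we know i0 >= 1,
# so the expression always evaluates to 1:
assert all(broadcast_extent(i0) == 1 for i0 in range(1, 100))

# The only value that breaks the simplification is i0 == 0,
# which would have concretized to Iteration, not Broadcast:
assert broadcast_extent(0) == 0
```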
This happened when I ran the transformer block with batch_size=1. It can be reproduced by

```
pytest thunder/benchmarks/targets.py -k test_nanogpt_block_grad[thunder] -s
```

I'm unsure whether it's a Thunder bug or an nvFuser bug. I suspect `define_tensor` needs to say `shape=[1,...]` when the batch size is one?
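The suspicion above can be illustrated with a toy model. The `concretize_itertype` helper below is invented for this sketch (it is not the nvFuser API): a dimension whose extent is declared symbolically but happens to be 1 at runtime is ambiguous until concretization, whereas declaring the extent as the constant 1 marks it as a broadcast dimension up front.

```python
# Toy model of IterType concretization for dynamic shapes. All names here
# are hypothetical; this only sketches the ambiguity, not nvFuser internals.

def concretize_itertype(declared_extent, runtime_extent):
    # declared_extent is None when the fusion definition used a symbolic size.
    if declared_extent == 1:
        # Statically known size-1 dimension: unambiguously Broadcast.
        return "Broadcast"
    if runtime_extent == 1:
        # Symbolic dimension that is 1 at runtime (e.g. batch_size=1):
        # it has to concretize to Broadcast, which is the situation the
        # squeeze check in this issue tripped over.
        return "Broadcast"
    return "Iteration"

print(concretize_itertype(declared_extent=1, runtime_extent=1))     # Broadcast
print(concretize_itertype(declared_extent=None, runtime_extent=1))  # Broadcast
print(concretize_itertype(declared_extent=None, runtime_extent=8))  # Iteration
```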