[DO NOT MERGE] Test codediff in CI #3311
base: main
!build --diff
--- d88dcba8
+++ 5faefc04
@@ -1,6 +1,6 @@
-__global__ void nvfuser_N(Tensor<double, 4, 4> T0, Tensor<double, 2, 2> T8, Tensor<double, 2, 2> T7) {
+__global__ void nvfuser_N(Tensor<double, 4, 4> T0, Tensor<double, 2, 2> T7, Tensor<double, 2, 2> T8) {
alignas(16) extern __shared__ char array[];
void* shared_mem = array;
NVFUSER_DEFINE_MAGIC_ZERO;
Tensor<double, 4, 4> s0;
s0.data = T0.data;
@@ -93,11 +93,11 @@
double T1[1];
T1[0]
= T9[0]
/ d28;
if (b18) {
- T8[i15]
+ T7[i15]
= T1[0];
}
double T10[1];
broadcast::blockBroadcast<false, true, false, true>(T10[0], T1[0], static_cast<double*>(shared_mem), true);
double T2[1];
@@ -125,9 +125,9 @@
double T4[1];
T4[0]
= T2[0]
* d26;
if (b18) {
- T7[i15]
+ T8[i15]
= T4[0];
}
}
This doesn't look serde-related. It might be a sign of non-determinism, though. I'm investigating...
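For reference, the diff above appears to swap only the names T7 and T8, consistently in the signature and in both writes. Below is a minimal sketch of how one could check that two kernel dumps are identical up to a one-to-one renaming of `T<n>` identifiers. The function name `equal_up_to_tensor_renaming` and the regex are assumptions made for illustration; nothing here is part of nvFuser or the codediff tooling.

```python
import re

# Assumption: tensor identifiers in the generated kernels look like T0, T7, T10, ...
TENSOR_NAME = re.compile(r"\bT\d+\b")


def equal_up_to_tensor_renaming(a: str, b: str) -> bool:
    """Return True if a and b differ only by a one-to-one renaming of T<n> names."""
    # Everything between tensor names must match exactly. Equal split lists
    # also imply that both texts contain the same number of tensor names.
    if TENSOR_NAME.split(a) != TENSOR_NAME.split(b):
        return False
    forward, backward = {}, {}
    for x, y in zip(TENSOR_NAME.findall(a), TENSOR_NAME.findall(b)):
        # Each name must map to exactly one name, in both directions.
        if forward.setdefault(x, y) != y or backward.setdefault(y, x) != x:
            return False
    return True


# Hypothetical, heavily shortened kernels modelled on the diff above.
old_kernel = (
    "__global__ void k(Tensor<double, 0, 0> T8, Tensor<double, 0, 0> T7) {\n"
    "  T8[0] = T1[0];\n"
    "  T7[0] = T4[0];\n"
    "}\n"
)
new_kernel = (
    "__global__ void k(Tensor<double, 0, 0> T7, Tensor<double, 0, 0> T8) {\n"
    "  T7[0] = T1[0];\n"
    "  T8[0] = T4[0];\n"
    "}\n"
)
# Both writes going to T7 is a real change, not a renaming.
changed_kernel = new_kernel.replace("T8[0] = T4[0]", "T7[0] = T4[0]")
```

If a check like this passes, the two kernels compute the same thing and the difference is limited to how the output tensors get numbered, which would be consistent with a naming or ordering non-determinism rather than a codegen change.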
Actually, I can reproduce this locally on main. Using the script from #3312, I see the following:
rm -rf /tmp/nvfuser_kernel_db
tools/check_determinism.sh -- pytest -vs tests/python/test_ops.py::test_correctness_var_mean_float64
# This fails with a message like
# 10845c10845
# < __global__ void nvfuser_inner_persistent_f7_c1_r0_g2(Tensor<double, 1, 1> T0, Tensor<double, 0, 0> T8, Tensor<double, 0, 0> T7) {
# ---
# > __global__ void nvfuser_inner_persistent_f7_c1_r0_g2(Tensor<double, 1, 1> T0, Tensor<double, 0, 0> T7, Tensor<double, 0, 0> T8) {
# 10897c10897
# < T8[0]
# ---
# > T7[0]
# 10923c10923
# < T7[0]
# ---
# > T8[0]
# Diff of __tmp_kernel_inner_persistent_f7_c1_r0_g2.cu from rep 1 to rep 5 (above) is non-zero
rm -rf /tmp/nvfuser_kernel_db
export DEBUG_SERDE=disable
tools/check_determinism.sh -- pytest -vs tests/python/test_ops.py::test_correctness_var_mean_float64
# This succeeds
So it looks like there may be non-determinism in deserialization, and the CI diff jobs are still not properly disabling serde. I don't understand how that can be, since we have this line: Fuser/tools/codediff/compare_codegen.sh Line 192 in f394b4e
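The comparison behind a determinism check like the one above can be sketched as follows: dump the generated kernels from each repetition into its own directory, then flag any file whose contents differ from the first repetition. This is only an illustration under assumed names (`digest_tree`, `nondeterministic_files`, one directory per rep); it is not what tools/check_determinism.sh actually does.

```python
import hashlib
from pathlib import Path


def digest_tree(root):
    """Map each file's path relative to root to the SHA-256 of its bytes."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def nondeterministic_files(rep_dirs):
    """Return sorted relative paths that differ from, or are missing relative to, the first rep."""
    baseline = digest_tree(rep_dirs[0])
    differing = set()
    for rep in rep_dirs[1:]:
        current = digest_tree(rep)
        # A file counts as differing if its hash changed or it exists in only one rep.
        for name in baseline.keys() | current.keys():
            if baseline.get(name) != current.get(name):
                differing.add(name)
    return sorted(differing)
```

An empty result means every repetition produced byte-identical files. A non-empty result, such as the `__tmp_kernel_inner_persistent_f7_c1_r0_g2.cu` case reported above, points at the files worth diffing by hand.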
cc @rdspring1 @xwang233
!build !test --diff-bench --diff
!test --diff-bench --diff
No description provided.