@wujingyue - I added the MLP test with aten matmul. Note that the tolerance is bumped up a bit to pass validation.
Validation error in output 0 (linear1) on line 583 in file /tests/cpp/test_multidevice_matmul.cpp.
Detected abs error of: 0.122498
absolute tolerance was set to 0.005
and relative tolerance set to 5e-05
Validation error in output 2 (linear2) on line 583 in file tests/cpp/test_multidevice_matmul.cpp.
Detected abs error of: 4.08847
absolute tolerance was set to 2
and relative tolerance set to 0.02
To reproduce the error, check out wjy/error (see 7eb2f43 for the change) and run _bn && mpirun -np 2 bin/test_multidevice --gtest_filter=DistributedMatmulTest.MLP_Layer*.
You'll see that use_aten_matmul==true leads to the following error, while use_aten_matmul==false passes within an absolute tolerance of 5e-3.
Validation error in output 0 on line 583 in file /opt/pytorch/nvfuser/tests/cpp/test_multidevice_matmul.cpp.
Detected abs error of: 0.122498
absolute tolerance was set to 0.005
and relative tolerance set to 5e-05
Note that "Detected abs error of: 0.122498" is not the max absolute error; the max is at least 4. This motivates a side feature request: print the max absolute error instead of the first (?) one detected.
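To make the feature request concrete, here is a minimal sketch of a comparison that records both the first out-of-tolerance element and the largest absolute error across the whole output. Everything in it is hypothetical: `ErrorReport` and `compareOutputs` are illustrative names rather than nvFuser APIs, the outputs are flattened into `std::vector<double>`, and the pass condition is assumed to be the allclose-style `|actual - expected| <= atol + rtol * |expected|`, which may not match what the test validator actually does.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical summary of one output comparison; not an nvFuser type.
struct ErrorReport {
  bool has_failure = false;            // true if any element exceeds tolerance
  std::size_t first_failing_index = 0; // first element over tolerance
  double first_abs_error = 0.0;        // abs error at that first element
  std::size_t max_index = 0;           // element with the largest abs error
  double max_abs_error = 0.0;          // largest abs error over all elements
};

// Assumption: allclose-style check, |actual - expected| <= atol + rtol * |expected|.
ErrorReport compareOutputs(
    const std::vector<double>& actual,
    const std::vector<double>& expected,
    double atol,
    double rtol) {
  assert(actual.size() == expected.size());
  ErrorReport report;
  for (std::size_t i = 0; i < actual.size(); ++i) {
    const double abs_error = std::fabs(actual[i] - expected[i]);
    // Track the worst element regardless of whether it passes tolerance.
    if (abs_error > report.max_abs_error) {
      report.max_abs_error = abs_error;
      report.max_index = i;
    }
    // Remember only the first element that violates the tolerance.
    if (!report.has_failure &&
        abs_error > atol + rtol * std::fabs(expected[i])) {
      report.has_failure = true;
      report.first_failing_index = i;
      report.first_abs_error = abs_error;
    }
  }
  return report;
}
```

With a report like this, a failure message could print `max_abs_error` (and its index) alongside, or instead of, the first detected error, so a log reading 0.122498 would not hide an error of 4 or more elsewhere in the same output.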
The comment quoted at the top of this issue was originally posted by @cowanmeg in #2360 (comment).