
square on complex64 has different result on xpu and cpu. #781

Open
daisyden opened this issue Aug 20, 2024 · 0 comments
daisyden commented Aug 20, 2024

🐛 Describe the bug

Both the CPU and XPU results look questionable at first glance. Mathematically, (a + bj)² = (a² − b²) + 2ab·j, so for a = -501, b = -1.0000e+20 the imaginary part is 2·(-501)·(-1.0000e+20) = +1.0020e+23, which is what the CPU returns; the XPU instead returns -inf for the imaginary part. The real part, a² − b² ≈ -1e40, exceeds the float32 range, so -inf there is an expected overflow on both devices.

>>> a=torch.tensor([-501.-1.0000e+20j])
>>> torch.square(a)
tensor([-inf+1.0020e+23j])
>>> b=torch.tensor([-501.-1.0000e+20j], device='xpu')
>>> torch.square(b)
tensor([-inf-infj], device='xpu:0')

torch.square is implemented with std::pow, and the small test case below reproduces the issue. I will submit a bug to the compiler team.

#include <sycl/sycl.hpp>

#include <complex>

// Repro kernel: squares a complex64 value via std::pow on host and device.
void square_kernel(sycl::queue &q) {
  const size_t local_size = 1;
  const size_t global_size = 1;

  auto e = q.submit([&](sycl::handler &h) {
    sycl::stream output(1024, 256, h);
    // Input from the report: -501. - 1.0000e+20j
    constexpr float re = -501.f;
    constexpr float im = -1.0000e+20f;
    std::complex<float> in(re, im);
    std::complex<float> result =
        std::pow(in, static_cast<std::complex<float>>(2));
    sycl::ext::oneapi::experimental::printf(
        "on host: input: imag %.16f, real %.16f -> result: imag %.16f, real "
        "%.16f\n",
        in.imag(), in.real(), result.imag(), result.real());

    h.parallel_for(sycl::nd_range<1>{global_size, local_size},
                   [=](sycl::nd_item<1> it) {
                     std::complex<float> result =
                         std::pow(in, static_cast<std::complex<float>>(2));
                     sycl::ext::oneapi::experimental::printf(
                         "on device: input: imag %.16f, real %.16f -> "
                         "result: imag %.16f, real %.16f\n",
                         in.imag(), in.real(), result.imag(), result.real());
                   });
  });
  e.wait();
}

int main(int argc, char *argv[]) {
  auto devs = sycl::device::get_devices(sycl::info::device_type::gpu);
  // Note: index 1 picks the second enumerated GPU, as in the original repro.
  auto dev = devs[1];

  sycl::queue q(dev, sycl::property::queue::enable_profiling());

  square_kernel(q);

  return 0;
}


(daisy_pytorch4) gta@DUT1025PVC:~/daisyden/square$ ./a.out
on host: input: imag -100000002004087734272.0000000000000000, real -501.0000000000000000 -> result: imag 99307371180871265550336.0000000000000000, real -inf
on device: input: imag -100000002004087734272.0000000000000000, real -501.0000000000000000 -> result: imag -inf, real -inf


### Versions

latest
chuanqi129 added this to the PT2.6 milestone Aug 27, 2024
daisyden self-assigned this Sep 12, 2024
riverliuintel modified the milestones: PT2.6, PT2.8 Nov 29, 2024