
square on complex64 has different result on xpu and cpu. #781

Open
daisyden opened this issue Aug 20, 2024 · 0 comments
daisyden commented Aug 20, 2024

🐛 Describe the bug

Both the CPU and XPU results look questionable at first glance. Mathematically, (a + bj)² = (a² − b²) + 2ab·j, so for a = -501, b = -1.0000e+20 the imaginary part is 2·(-501)·(-1.0000e+20) = +1.0020e+23, which is what the CPU returns; the XPU instead returns -inf for the imaginary part. The real part, a² − b² ≈ -1e40, exceeds the float32 range, so -inf there is an expected overflow on both devices.

>>> a=torch.tensor([-501.-1.0000e+20j])
>>> torch.square(a)
tensor([-inf+1.0020e+23j])
>>> b=torch.tensor([-501.-1.0000e+20j], device='xpu')
>>> torch.square(b)
tensor([-inf-infj], device='xpu:0')

torch.square is implemented with std::pow, and the small test case below reproduces the issue. I will submit a bug to the compiler team.

#include <sycl/sycl.hpp>

#include <complex>

// Repro kernel: squares a complex64 value via std::pow on host and device.
void square_kernel(sycl::queue &q) {
  const size_t local_size = 1;
  const size_t global_size = 1;

  auto e = q.submit([&](sycl::handler &h) {
    sycl::stream output(1024, 256, h);
    // Input from the report: -501. - 1.0000e+20j
    constexpr float re = -501.f;
    constexpr float im = -1.0000e+20f;
    std::complex<float> in(re, im);
    std::complex<float> result =
        std::pow(in, static_cast<std::complex<float>>(2));
    sycl::ext::oneapi::experimental::printf(
        "on host: input: imag %.16f, real %.16f -> result: imag %.16f, real "
        "%.16f\n",
        in.imag(), in.real(), result.imag(), result.real());

    h.parallel_for(sycl::nd_range<1>{global_size, local_size},
                   [=](sycl::nd_item<1> it) {
                     std::complex<float> result =
                         std::pow(in, static_cast<std::complex<float>>(2));
                     sycl::ext::oneapi::experimental::printf(
                         "on device: input: imag %.16f, real %.16f -> "
                         "result: imag %.16f, real %.16f\n",
                         in.imag(), in.real(), result.imag(), result.real());
                   });
  });
  e.wait();
}

int main(int argc, char *argv[]) {
  auto devs = sycl::device::get_devices(sycl::info::device_type::gpu);
  // Note: index 1 picks the second enumerated GPU, as in the original repro.
  auto dev = devs[1];

  sycl::queue q(dev, sycl::property::queue::enable_profiling());

  square_kernel(q);

  return 0;
}


(daisy_pytorch4) gta@DUT1025PVC:~/daisyden/square$ ./a.out
on host: input: imag -100000002004087734272.0000000000000000, real -501.0000000000000000 -> result: imag 99307371180871265550336.0000000000000000, real -inf
on device: input: imag -100000002004087734272.0000000000000000, real -501.0000000000000000 -> result: imag -inf, real -inf


### Versions

latest
chuanqi129 added this to the PT2.6 milestone Aug 27, 2024
daisyden self-assigned this Sep 12, 2024
riverliuintel modified the milestones: PT2.6, PT2.8 Nov 29, 2024