Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some question about implementing SVE(and possibly NEON?) intrinsics #1225

Open
MeteorEmber opened this issue Sep 23, 2024 · 1 comment
Open

Comments

@MeteorEmber
Copy link

Hi, I am currently implementing some of the SVE intrinsics in SIMDe (primarily RISCV and emulated version).
However, there are some problems regarding integer divided by zero and floating point NaN/INF division and comparison.

On AArch64, interger divided by zero simply resulting in zero without invoking any hardware exception, which is different than RISCV and X86.

Integer divided by zero resulting in an all-bits-set value on RISCV, and on X86, SIGFPE raised and crash the program.
So my naive implementation would be something like this, the idea is to check if the divsor is zero to decide we should do the division or not.

See the following example:

SIMDE_FUNCTION_ATTRIBUTES
simde_svint32_t
simde_svdiv_s32_x(simde_svbool_t pg, simde_svint32_t op1, simde_svint32_t op2) {
  #if defined(SIMDE_ARM_SVE_NATIVE)
    return svdiv_s32_x(pg, op1, op2);
  #else
    simde_svint32_t r;
    HEDLEY_STATIC_CAST(void, pg);

    #if defined(SIMDE_RISCV_V_NATIVE) && defined(SIMDE_RISCV_V_VLA)
      uint32_t    vl = simde_svcntw();
      vbool32_t mask = b8_to_b32(simde_sve_to_rvv_mask_32(pg));
      r = __riscv_vdiv_vv_i32m1_m(mask, op1, op2, vl);
      r = __riscv_vmerge_vxm_i32m1(r, 0, __riscv_vmseq_vx_i32m1_b32_m(mask, op2, 0, vl), vl);
    #else
      SIMDE_VECTORIZE
      for (int i = 0 ; i < HEDLEY_STATIC_CAST(int, sizeof(r.values) / sizeof(r.values[0])) ; i++) {
        r.values[i] = (op2.values[i] == INT32_C(0)) ? INT32_C(0) : op1.values[i] / op2.values[i];
      }
    #endif

    return r;
  #endif
}

There are similar problems in floating point comparision (max, min) related functions when NaN is involved.

My qeustion is, do we need to consider such situations when implement SIMDe? Or we just leave it to simpliest implementation without checking the divisor? Because in current NEON implementation in SIMDe, I didn't see anything try to address such problems. Thanks for any advise in advance.

@nemequ
Copy link
Member

nemequ commented Sep 26, 2024

Great question!

SIMDe has a series of macros in simde-common.h for dealing with problems like these. Unfortunately, it means some ifdefs and alternate implementations, which can definitely be annoying to implement :(

For the floating-point min/max operations, the right one would be SIMDE_FAST_NANS (which will be defined by default if the -ffinite-math-only flag is passed).

The division by zero thing is more interesting. This can be very tricky with floating point operations; it's actually quite a bit worse than just different behavior on different platforms; on some platforms you can use CPU flags to control whether or not division by zero generates an exception or not. Some platforms share a register for FP env flags between SIMD and non-SIMD code, some don't.

Luckily the issue here is integer division by zero. In C integer division by zero is undefined behavior. I'd be interested to know how you checked what x86 did; the only integer division functions (_mm_div_epi32, etc.) are in SVML and don't actually correspond to individual instructions in hardware, but if you just did a / on a couple of vectors (or scalars) then the operation would be undefined.

The question here is whether division by 0 is defined in the RISC-V ISA, or if it's undefined and the implementation you're seeing happens to yield ~0. I'm honestly not sure where exactly the formal specification for the instructions are, but https://github.com/riscv-software-src/riscv-isa-sim/blob/master/riscv/insns/vdiv_vv.h seems to point it being defined (which, honestly, is what I would expect).

Given that, I think the right solution here would probably be to add a SIMDE_FAST_IDIV0 macro (or something similar). If that is not defined do the check, if it is defined then don't do the check.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants