Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ULP requirements for fp16 divide #1278

Open
lakshmih opened this issue Nov 5, 2024 · 8 comments
Open

ULP requirements for fp16 divide #1278

lakshmih opened this issue Nov 5, 2024 · 8 comments

Comments

@lakshmih
Copy link
Contributor

lakshmih commented Nov 5, 2024

Referring to OpenCL C spec on ULP requirements

ULP requirements for single precision divide (x/y) and reciprocal(1.0/x) are ≤ 2.5 ulp

However, for half precision these are defined as needing to be 'correctly rounded'

We would like to propose that these be defined with specific (lower) ULP, following the pattern for other built-ins.
We would further like to set a precision requirement of <= 1 ulp for both of these cases for fp16.

Double precision ULP for these cases also suffer from the same discrepancy (specific ULP for float, correctly rounded for double) so these should be reviewed as well.

@svenvh
Copy link
Member

svenvh commented Nov 6, 2024

As an additional consideration, for fp32 the specification defines the CL_FP_CORRECTLY_ROUNDED_DIVIDE_SQRT value for the CL_DEVICE_SINGLE_FP_CONFIG query:

CL_FP_CORRECTLY_ROUNDED_DIVIDE_SQRT - divide and sqrt are correctly rounded as defined by the IEEE754 specification.

In case implementations want to keep advertising a correctly rounded divide for half/double, we could consider extending the CL_DEVICE_HALF_FP_CONFIG and CL_DEVICE_DOUBLE_FP_CONFIG with CL_FP_CORRECTLY_ROUNDED_DIVIDE_SQRT. That may also require revisiting the sqrt ULP requirements.

@bashbaug
Copy link
Contributor

Discussed in the November 5th teleconference. The decisions we need to make are:

  1. Should we reduce the precision requirements for an fp16 divide? (Leaning towards “yes”.)
  2. If we reduce the precision requirements for an fp16 divide, what should we reduce it to? <=1ULP? <=0.5ULP? Something else?
  3. Do we need a build option like “-cl-fp32-correctly-rounded-divide-sqrt” to get back to a more precise fp16 divide?
  4. Do we want to relax the precision requirements for any other fp16 operations, or is this just an issue with the fp16 divide? Consider 1/x in particular.
  5. Do we want to relax the precision requirements for any fp64 operations?

@kpet
Copy link
Contributor

kpet commented Nov 12, 2024

See related CTS issue: KhronosGroup/OpenCL-CTS#1996

@lakshmih
Copy link
Contributor Author

Agreed on WG call 11/12 to:

@bashbaug
Copy link
Contributor

With our current GPU implementation, we have a 0.5 ULP error with some inputs:

ERROR: divide: 0.500000 ulp error at {0x1.ef4p-4 (0x2fbd), -0x1.cp+13 (0xf300)}
Expected: -0x1.1cp-17  (half 0x808e) 
Actual: -0x1.1ap-17 (half 0x808d) at index: 13227

If I relax the ULP requirements to 0.5 ULP then the test passes:

  1:          divide fp16     ................passed        0.50 @ {-0x1.0e4p-6, 0x1.7p+13}

Tested on an Arc A750 with the command line: test_bruteforce -f -r divide

Our current CPU implementation passes without relaxing the ULP requirements.

@lakshmih
Copy link
Contributor Author

Qualcomm would like to request 1 ULP for divide.

@paulfradgley
Copy link
Contributor

paulfradgley commented Dec 17, 2024

0.5 or 1 ULP are fine for IMG.

lakshmih added a commit to lakshmih/OpenCL-Docs that referenced this issue Dec 20, 2024
…rocal

Update ULP requirements for these builtins to 1.0 as per discussion
on KhronosGroup#1278
@lakshmih
Copy link
Contributor Author

lakshmih commented Jan 7, 2025

As agreed on 1/7/2025, steps towards resolution on this:

  • Qualcomm to update spec PR to restrict to divide and push test PR to update ULP to 1.0 for divide. Once vendors agree that the CTS update is good, review and merge both.
  • Re-enable reciprocal in CTS. Qualcomm can push this as well (patch to re-enable the test). Run and check results and depending, make changes to spec(s) and CTS to update to 1.0. If there are other changes needed to re-enable the test, we might have to figure out how to address those.
  • Wait for mobica to fix sqrt CTS and proceed similarly with spec/test updates based on results on implementations.

bashbaug pushed a commit to KhronosGroup/OpenCL-CTS that referenced this issue Jan 21, 2025
bashbaug added a commit that referenced this issue Jan 21, 2025
…rocal (#1293)

* OpenCL C: Update ULP requirements for half-precision divide and reciprocal

Update ULP requirements for these builtins to 1.0 as per discussion
on #1278

* Restrict update to divide for now

* Update SPIR-V environment spec to set fp-16 divide ULP to 1.0

* relax reciprocal ULP requirement also

---------

Co-authored-by: Ben Ashbaugh <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Needs WG discussion
Development

No branches or pull requests

5 participants