WIP: Add e8m0 datatype #2

balancap · 2024-09-12T09:10:54Z

Adding the OCP MX scale format `E8M0`, which has the following properties:

* Unsigned format;
* 8 exponent bits;
* Exponent range from -127 to 127;
* No zero and no infinity;
* Single NaN value (0xFF).

The `ml_dtypes` `float8_base` C++ class is extended to support floating-point formats that are unsigned and have no zero (i.e. additional `kIsSigned` and `kHasZero` Traits properties). Based on these traits, `float8_e8m0_fnu` has been implemented using the existing functionality (convert, unary/binary ops, ...). The float8 Python unit tests have been extended to cover unsigned floating-point formats.
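For reviewers unfamiliar with the format, here is a minimal pure-Python sketch of the bit layout described above. It is not the C++ implementation in this PR. The helper names `e8m0_decode` and `e8m0_encode` are hypothetical, and the encoder's choice to map every non-representable input (zero, negatives, infinities, non-powers of two, out-of-range exponents) to the NaN code is an assumption made for illustration; the actual conversion may round or saturate instead.

```python
import math

E8M0_BIAS = 127   # exponent bias: stored byte 127 represents 2**0
E8M0_NAN = 0xFF   # the single NaN encoding


def e8m0_decode(bits: int) -> float:
    """Decode one E8M0 byte into a Python float (hypothetical helper)."""
    if not 0 <= bits <= 0xFF:
        raise ValueError("E8M0 values are a single byte")
    if bits == E8M0_NAN:
        return math.nan
    # No sign bit and no mantissa bits: the value is always a power of two.
    return math.ldexp(1.0, bits - E8M0_BIAS)


def e8m0_encode(value: float) -> int:
    """Encode an exact power of two as an E8M0 byte (hypothetical helper).

    Assumption: anything E8M0 cannot represent exactly is mapped to the
    NaN code rather than rounded or saturated.
    """
    if not (value > 0.0 and math.isfinite(value)):
        return E8M0_NAN  # NaN, zero, negatives and infinities
    mantissa, exponent = math.frexp(value)  # value == mantissa * 2**exponent
    if mantissa != 0.5:
        return E8M0_NAN  # not an exact power of two
    unbiased = exponent - 1  # frexp normalises the mantissa to [0.5, 1)
    if not -127 <= unbiased <= 127:
        return E8M0_NAN  # outside the E8M0 exponent range
    return unbiased + E8M0_BIAS
```

Under these assumptions, byte `0x00` decodes to 2^-127, byte `0x7F` to 1.0, and byte `0xFE` to 2^127, which matches the exponent range listed above.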

balancap force-pushed the add-e8m0-datatype branch 2 times, most recently from 2953430 to b53f481 Compare September 12, 2024 15:54

balancap force-pushed the add-e8m0-datatype branch from b53f481 to b6d3659 Compare September 12, 2024 16:41

balancap merged commit d581b6f into main Sep 13, 2024
10 checks passed
