Use Grisu2 algorithm in String::num_scientific to fix serializing #98750
base: master
Is it worth modifying json to native and json from native?
@fire What do you mean?
I was curious why you renamed rtos_fixed to serialize_real.
@fire I can undo the name change if it's not desired, but I think this is a clearer name.
I have no opinion on the name change. It's not that important.
I'm definitely in favor of using the same code for float serialization/print, and the implementation looks good. `sprintf` is too implementation-dependent and unreliable.
I was curious why you renamed rtos_fixed to serialize_real.
It's an internal method, so it doesn't matter. But I like `serialize_real` more.
Should we try to remove
@arkology This PR only affects
This also fixes #99103
We should merge #100414 before this PR.
Supersedes PR #96676, PR #86951, and fixes #78204, fixes #99103, fixes #99763.
This PR replaces the algorithm in `String::num_scientific` with Grisu2 to serialize numbers with more precision. The implementation was copied from simdjson (https://github.com/simdjson/simdjson/blob/master/src/to_chars.cpp) and adjusted slightly to match the existing behavior of `String::num_scientific`.
What: Grisu2 is an algorithm for serializing floats in scientific notation with enough precision to ensure they can be read back exactly, while using the minimum number of digits in nearly all cases, which keeps the output compact and human-readable. It uses integer operations and a table of precomputed powers of ten, so it is extremely fast.
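To make the goal concrete (not the algorithm itself), here is a minimal brute-force sketch. It is not Grisu2 and not the code in this PR: it simply tries increasing `printf` precisions until the output parses back to the same 64-bit float. The helper name `shortest_roundtrip` is hypothetical, and the sketch assumes a finite input and a correctly rounding `snprintf`/`strtod`.

```cpp
#include <cstdio>
#include <cstdlib>
#include <string>

// Hypothetical helper (not part of Godot or simdjson): find the smallest
// "%e" precision whose output parses back to exactly the same double.
// Assumes `value` is finite.
std::string shortest_roundtrip(double value) {
    char buf[64];
    for (int digits = 1; digits <= 17; ++digits) {
        // "%.*e" prints (digits - 1) digits after the decimal point,
        // i.e. `digits` significant digits in total.
        std::snprintf(buf, sizeof(buf), "%.*e", digits - 1, value);
        if (std::strtod(buf, nullptr) == value) {
            return std::string(buf);
        }
    }
    // 17 significant digits are always enough for an IEEE 754 double,
    // so for finite input the loop above has already returned.
    return std::string(buf);
}
```

Grisu2 reaches a round-trip-safe result directly with integer arithmetic instead of repeatedly formatting and parsing, which is why it is much faster than a loop like this.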
Why: We need to serialize with more precision to ensure that a serialized number can be deserialized into the same number. For example, for the number `123456789`, the closest 32-bit float is `123456792`. In master this is serialized as `1.23457e8`, which becomes `123457000`, over 200 off from the closest 32-bit float. With this PR, when the value is a 32-bit float, it is serialized with 9 digits as `123456790`, which can be read back as exactly `123456792`. 32-bit floats have 6 reliable digits, but up to 9 are needed to serialize to decimal in order to read back with full precision.
For an example with 64-bit floats, I have `1.234567898765432123456789e30` included in the test cases. The closest 64-bit float is `1.23456789876543218850569440461e30` (it differs at the 8, which used to be a 2). This gets serialized as `1.2345678987654322e+30`, which is deserialized to exactly `1.23456789876543218850569440461e30`. 64-bit floats have 15 reliable digits, but up to 17 are needed to serialize to decimal in order to read back with full precision.
Note that the code in the Variant writer for Vector2/Vector3/etc. has been adjusted to work with both 32-bit and 64-bit floats, so it will correctly serialize the numbers for builds with either precision level.
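The digit counts in the 32-bit and 64-bit examples can be checked with a small standalone sketch. It uses the C standard library rather than `String::num_scientific`, and the helper names `float_roundtrips` and `double_roundtrips` are hypothetical. It assumes `snprintf`, `strtof`, and `strtod` round correctly, as they do on common platforms.

```cpp
#include <cstdio>
#include <cstdlib>

// Hypothetical helper: does a 32-bit float survive being printed with
// `digits` significant digits and parsed back?
bool float_roundtrips(float value, int digits) {
    char buf[64];
    std::snprintf(buf, sizeof(buf), "%.*g", digits, static_cast<double>(value));
    return std::strtof(buf, nullptr) == value;
}

// Hypothetical helper: the same check for a 64-bit float.
bool double_roundtrips(double value, int digits) {
    char buf[64];
    std::snprintf(buf, sizeof(buf), "%.*g", digits, value);
    return std::strtod(buf, nullptr) == value;
}
```

With these helpers, `float_roundtrips(123456789.0f, 6)` is false while `float_roundtrips(123456789.0f, 9)` is true, and `double_roundtrips(1.234567898765432123456789e30, 15)` is false while the same call with 17 digits is true.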
Note that the docs have special code that always uses the 32-bit version, since we don't need high precision in the docs.
Note that I kept the existing behavior where `num_scientific` does not have a trailing `.0`, but the code I grabbed from simdjson included that, so I removed it. It would be easy to add that back in. However, I also separately re-added the trailing `.0` for the documentation to ensure the docs are generated with `.0` like before.
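As a rough illustration of what re-adding a trailing `.0` could look like, here is a minimal sketch. It is not the code from this PR: the helper name `with_trailing_zero` is hypothetical, and the rule it applies (only strings that look like plain integers get the suffix) is an assumption.

```cpp
#include <string>

// Hypothetical helper, not Godot's actual code: append ".0" to a number
// string that looks like a plain integer.
std::string with_trailing_zero(const std::string &num) {
    // Assumption: anything containing a decimal point, an exponent, or
    // non-numeric text (such as "inf" or "nan") is returned unchanged.
    if (num.empty() || num.find_first_not_of("-0123456789") != std::string::npos) {
        return num;
    }
    return num + ".0";
}
```

Under this assumption, `with_trailing_zero("123456790")` returns `"123456790.0"`, while `"1.5"` and `"1e+30"` are left as they are.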