Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Grisu2 algorithm in String::num_scientific to fix serializing #98750

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

aaronfranke
Copy link
Member

@aaronfranke aaronfranke commented Nov 2, 2024

We should merge #100414 before this PR.

Supersedes PR #96676, PR #86951, and fixes #78204, fixes #99103, fixes #99763.

This PR replaces the algorithm in String::num_scientific with Grisu2 to serialize numbers with more precision. The implementation was copied from simdjson here: https://github.com/simdjson/simdjson/blob/master/src/to_chars.cpp and adjusted slightly to match the existing behavior of String::num_scientific.

What: Grisu2 is an algorithm for serializing floats in scientific notation, with enough precision to ensure they can be read back exactly, while also having the minimum amount of digits, ensuring compactness and human readability. It uses integer operations and a table of pre-computed powers of ten, so it is extremely fast.

Why: We need to serialize with more precision to ensure that a serialized number can be deserialized into the same number. For example, for the number 123456789, the closest 32-bit float is 123456792. In master this is serialized as 1.23457e8, which becomes 123457000, over 200 off from the closest 32-bit float. With this PR, if a 32-bit float, it will be serialized with 9 digits as 123456790, which can be read back as exactly 123456792. 32-bit floats have 6 reliable digits, but up to 9 are needed to serialize to decimal in order to read back with full precision.

For an example with 64-bit floats, I have 1.234567898765432123456789e30 included in the test cases. The closest 64-bit float is 1.23456789876543218850569440461e30 (differs at the 8 which used to be a 2). This gets serialized as 1.2345678987654322e+30 which is deserialized to exactly 1.23456789876543218850569440461e30. 64-bit floats have 14 reliable digits, but up to 17 are needed to serialize to decimal in order to read back with full precision.

Note that the code in Variant writer for Vector2/Vector3/etc has been adjusted to work with both 32-bit and 64-bit floats, so it will correctly serialize the numbers for builds with either precision level.

Note that the docs have special code that always use the 32-bit version, since we don't need high precision in the docs.

Note that I kept the existing behavior where num_scientific does not have a trailing .0, but the code I grabbed from simdjson included that, so I removed it. It would be easy to add that back in. However I also separately re-added the trailing .0 for the documentation to ensure the docs are generated with .0 like before.

@fire
Copy link
Member

fire commented Nov 2, 2024

Is it worth modifying json to native and json from native?

@aaronfranke
Copy link
Member Author

@fire What do you mean?

@fire
Copy link
Member

fire commented Nov 4, 2024

I was curious why you renamed rtos_fixed to serialize_real. Replacing methods create a lot of patch churn.

@aaronfranke
Copy link
Member Author

@fire I can undo the name change if it's not desired, but I think this is a clearer name.

@fire
Copy link
Member

fire commented Nov 5, 2024

I have no opinion on the name change. It's not that important.

Copy link
Member

@bruvzg bruvzg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm definitely in favor of using the same code for float serialization/print, and the implementation looks good. sprintf is too implementation dependent and unreliable.

I was curious why you renamed rtos_fixed to serialize_real.

It's internal method, so doesn't matter. But I like serialize_real more.

core/string/ustring.h Outdated Show resolved Hide resolved
thirdparty/README.md Show resolved Hide resolved
thirdparty/grisu2/godot.patch Outdated Show resolved Hide resolved
@aaronfranke aaronfranke force-pushed the grisu branch 2 times, most recently from dc17bc5 to 850a082 Compare November 6, 2024 12:10
@clayjohn
Copy link
Member

clayjohn commented Nov 6, 2024

I'm definitely in favor of using the same code for float serialization/print, and the implementation looks good. sprintf is too implementation dependent and unreliable.

Should we try to remove sprintf from other places? Notable we still use it in String::num and it causes similar problems there

@arkology
Copy link
Contributor

arkology commented Nov 7, 2024

Does PR solve this issue?
UPD: And maybe this?

@aaronfranke
Copy link
Member Author

@arkology This PR only affects String::num_scientific, it does not change places that are currently using non-scientific numbers. However, now that this function is better, it opens the opportunity to use this in more places in future PRs.

@nikitalita
Copy link
Contributor

This also fixes #99103

@akien-mga akien-mga changed the title Use Grisu2 algorithm in String::num_scientific to fix serializing Use Grisu2 algorithm in String::num_scientific to fix serializing Nov 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants