Use Grisu2 algorithm in `String::num_scientific` to fix serializing #98750

aaronfranke · 2024-11-02T04:11:06Z

We should merge #100414 before this PR.

Supersedes PR #96676, PR #86951, and fixes #78204, fixes #99103, fixes #99763.

This PR replaces the algorithm in String::num_scientific with Grisu2 to serialize numbers with more precision. The implementation was copied from simdjson here: https://github.com/simdjson/simdjson/blob/master/src/to_chars.cpp and adjusted slightly to match the existing behavior of String::num_scientific.

What: Grisu2 is an algorithm for serializing floats in scientific notation with enough precision to ensure they can be read back exactly, while using the minimum number of digits in the vast majority of cases, which keeps the output compact and human-readable. (Grisu2 always round-trips, but for a small fraction of inputs its output is not the shortest possible.) It uses integer operations and a table of pre-computed powers of ten, so it is extremely fast.
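Grisu-style algorithms start by rewriting the double as an integer significand times a power of two, then computing the midpoints to its two neighbouring doubles: every decimal strictly inside that interval reads back as the same double. A minimal sketch of that first step only (the cached-power multiplication and digit generation are omitted). `DiyFp`, `decompose`, and `boundaries` are illustrative names, not the simdjson code, and IEEE-754 binary64 `double` is assumed:

```cpp
#include <cstdint>
#include <cstring>

// A "do-it-yourself" floating-point value: value = f * 2^e with an integer
// significand, so later steps can use plain integer arithmetic.
struct DiyFp {
    std::uint64_t f;
    int e;
};

// Decompose a finite, positive double into an exact DiyFp.
DiyFp decompose(double value) {
    std::uint64_t bits;
    std::memcpy(&bits, &value, sizeof(bits));
    const std::uint64_t fraction = bits & ((std::uint64_t{1} << 52) - 1);
    const int biased_exponent = static_cast<int>(bits >> 52) & 0x7FF;
    if (biased_exponent == 0) {
        // Subnormal: no implicit leading bit, fixed exponent.
        return {fraction, 1 - 1075};
    }
    // Normal: restore the implicit leading bit; 1075 = bias 1023 + 52 fraction bits.
    return {fraction | (std::uint64_t{1} << 52), biased_exponent - 1075};
}

// Midpoints between value and its neighbours, scaled so they stay integral.
// Any decimal strictly between m_minus and m_plus parses back to value.
void boundaries(double value, DiyFp &m_minus, DiyFp &m_plus) {
    const DiyFp v = decompose(value);
    // At an exact power of two (except the smallest normal) the gap below is
    // half the gap above, so the lower midpoint is closer.
    const bool lower_gap_is_smaller = (v.f == (std::uint64_t{1} << 52)) && v.e > -1074;
    m_plus = {2 * v.f + 1, v.e - 1};
    m_minus = lower_gap_is_smaller ? DiyFp{4 * v.f - 1, v.e - 2}
                                   : DiyFp{2 * v.f - 1, v.e - 1};
}
```

For 1.0 this yields f = 2^52, e = -52, and the interval (2^54 - 1)·2^-54 to (2^53 + 1)·2^-53; Grisu2 then scales that interval by a cached power of ten and emits digits until it is uniquely identified.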

Why: We need to serialize with more precision so that a serialized number can be deserialized into the same number. For example, for the number 123456789, the closest 32-bit float is 123456792. In master this is serialized as 1.23457e8, which becomes 123457000, over 200 off from the closest 32-bit float. With this PR, a 32-bit float holding this value is serialized as 123456790 (eight significant digits), which reads back as exactly 123456792. 32-bit floats have 6 reliable decimal digits, but up to 9 are needed to serialize to decimal in order to read back with full precision.
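The 123456789 example can be reproduced with plain `printf`-style formatting and `strtof`. A small sketch, assuming IEEE-754 `float`; `format_float` and `float_round_trips` are hypothetical helpers, not Godot code:

```cpp
#include <cstdio>
#include <cstdlib>
#include <limits>
#include <string>

// Format a float with a fixed number of significant digits using printf's %g
// rules (the float is promoted to double for the variadic call).
std::string format_float(float value, int significant_digits) {
    char buf[32];
    std::snprintf(buf, sizeof(buf), "%.*g", significant_digits, static_cast<double>(value));
    return std::string(buf);
}

// True if the formatted text parses back to exactly the same float.
bool float_round_trips(float value, int significant_digits) {
    return std::strtof(format_float(value, significant_digits).c_str(), nullptr) == value;
}

// The two digit counts mentioned above, as exposed by <limits>.
static_assert(std::numeric_limits<float>::digits10 == 6, "reliable decimal digits of float");
static_assert(std::numeric_limits<float>::max_digits10 == 9, "digits needed to round-trip any float");
```

With `f = 123456789.0f` (stored as 123456792), six digits give "1.23457e+08", which parses back to 123457000; nine digits give "123456792", which round-trips. For this particular value eight digits (1.2345679e+08, i.e. 123456790) already suffice, which is why a shortest-output algorithm can beat the worst-case bound of 9.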

For an example with 64-bit floats, I included 1.234567898765432123456789e30 in the test cases. The closest 64-bit float is 1.23456789876543218850569440461e30 (the first digit that differs is the 8, which was a 2 in the input). This gets serialized as 1.2345678987654322e+30, which is deserialized to exactly 1.23456789876543218850569440461e30. 64-bit floats have 15 reliable decimal digits, but up to 17 are needed to serialize to decimal in order to read back with full precision.
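The 6/9 and 15/17 digit counts follow from the significand width $p$ in bits (24 for 32-bit floats, 53 for 64-bit floats), using the standard formulas behind C's `FLT_DIG`/`DBL_DIG` and C++'s `digits10`/`max_digits10`:

```latex
\text{reliable digits} = \left\lfloor (p-1)\log_{10} 2 \right\rfloor, \qquad
\text{round-trip digits} = \left\lceil p\log_{10} 2 \right\rceil + 1

% 32-bit float, p = 24:
\lfloor 23 \cdot 0.30103 \rfloor = \lfloor 6.92 \rfloor = 6, \qquad
\lceil 24 \cdot 0.30103 \rceil + 1 = \lceil 7.22 \rceil + 1 = 9

% 64-bit float, p = 53:
\lfloor 52 \cdot 0.30103 \rfloor = \lfloor 15.65 \rfloor = 15, \qquad
\lceil 53 \cdot 0.30103 \rceil + 1 = \lceil 15.95 \rceil + 1 = 17
```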

Note that the code in Variant writer for Vector2/Vector3/etc has been adjusted to work with both 32-bit and 64-bit floats, so it will correctly serialize the numbers for builds with either precision level.
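One way to picture serializing correctly at either precision is to read the text back at the same precision it was written from. A brute-force sketch, assuming IEEE-754 `float`/`double`; `shortest_text` and `parse_as` are hypothetical names, and this is a slow stand-in rather than the Grisu2 code the PR uses:

```cpp
#include <cstdio>
#include <cstdlib>
#include <limits>
#include <string>

// Parse helpers chosen by overload, so the text is read back at the same
// precision it was written from.
inline float parse_as(const char *text, float) { return std::strtof(text, nullptr); }
inline double parse_as(const char *text, double) { return std::strtod(text, nullptr); }

// Hypothetical helper (not Godot's API): the shortest %g text that parses back
// to exactly the same T. Tries 1..max_digits10 significant digits; assumes
// finite values.
template <typename T>
std::string shortest_text(T value) {
    char buf[40];
    for (int precision = 1; precision <= std::numeric_limits<T>::max_digits10; ++precision) {
        std::snprintf(buf, sizeof(buf), "%.*g", precision, static_cast<double>(value));
        if (parse_as(buf, T()) == value) {
            break;
        }
    }
    return std::string(buf);
}
```

Here `shortest_text(0.1f)` gives "0.1", while widening the same value to `double` first gives "0.10000000149011612", which illustrates why the precision used for writing should match the precision of the stored value.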

Note that the docs have special code that always uses the 32-bit version, since we don't need high precision in the docs.

Note that I kept the existing behavior where num_scientific does not append a trailing .0. The code I took from simdjson does append it, so I removed that part; it would be easy to add back in. However, I separately re-added the trailing .0 for the documentation to ensure the docs are still generated with .0 as before.
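Re-adding the trailing .0 amounts to a small post-processing step on the serialized text. A hypothetical sketch; the name `append_trailing_zero` and the rule of touching only plain integers (leaving exponent forms, `inf`, and `nan` alone) are assumptions, not the PR's code:

```cpp
#include <string>

// Hypothetical helper: append ".0" only when the text is a plain integer
// (an optional sign followed by one or more digits).
std::string append_trailing_zero(const std::string &text) {
    std::string::size_type i = 0;
    if (!text.empty() && (text[0] == '-' || text[0] == '+')) {
        i = 1;
    }
    if (i == text.size()) {
        return text; // empty string or a lone sign: leave unchanged
    }
    for (; i < text.size(); ++i) {
        if (text[i] < '0' || text[i] > '9') {
            return text; // contains '.', an exponent, "inf", "nan", ...
        }
    }
    return text + ".0";
}
```

So "1" becomes "1.0" and "-42" becomes "-42.0", while "1.5" and "1e+30" pass through unchanged.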

fire · 2024-11-02T04:17:43Z

Is it worth modifying json to native and json from native?

aaronfranke · 2024-11-02T05:17:49Z

@fire What do you mean?

fire · 2024-11-04T22:35:57Z

I was curious why you renamed rtos_fixed to serialize_real. Replacing methods creates a lot of patch churn.

aaronfranke · 2024-11-05T00:37:02Z

@fire I can undo the name change if it's not desired, but I think this is a clearer name.

fire · 2024-11-05T01:25:02Z

I have no opinion on the name change. It's not that important.

bruvzg · 2024-11-06

I'm definitely in favor of using the same code for float serialization/print, and the implementation looks good. sprintf is too implementation dependent and unreliable.

> I was curious why you renamed rtos_fixed to serialize_real.

It's an internal method, so it doesn't matter. But I like serialize_real more.


clayjohn · 2024-11-06T22:23:09Z

> I'm definitely in favor of using the same code for float serialization/print, and the implementation looks good. sprintf is too implementation dependent and unreliable.

Should we try to remove sprintf from other places? Notably, we still use it in String::num, and it causes similar problems there.

arkology · 2024-11-07T06:06:16Z

Does this PR solve this issue?
UPD: And maybe this?

aaronfranke · 2024-11-07T06:28:30Z

@arkology This PR only affects String::num_scientific, it does not change places that are currently using non-scientific numbers. However, now that this function is better, it opens the opportunity to use this in more places in future PRs.

nikitalita · 2024-11-12T14:10:23Z

This also fixes #99103

aaronfranke requested review from a team as code owners November 2, 2024 04:11

aaronfranke added this to the 4.4 milestone Nov 2, 2024

aaronfranke added bug topic:core labels Nov 2, 2024

This was referenced Nov 2, 2024

var_to_str rounds floats, losing massive precision in the process #78204 (Open)

Add digits argument to String::num_scientific and fix serializing #96676 (Closed)

aaronfranke mentioned this pull request Nov 2, 2024

Prevent String::num_scientific from giving different precision levels depending on compiler #86951 (Closed)

bruvzg reviewed Nov 6, 2024


core/string/ustring.h (outdated, resolved)

thirdparty/README.md (resolved)

thirdparty/grisu2/godot.patch (outdated, resolved)

aaronfranke force-pushed the grisu branch 2 times, most recently from dc17bc5 to 850a082 November 6, 2024 12:10

clayjohn mentioned this pull request Nov 12, 2024

Loss of float precision when using save as on a text resource or scene #99103 (Open)

akien-mga changed the title from "Use Grisu2 algorithm in String::num_scientific to fix serializing" to "Use Grisu2 algorithm in `String::num_scientific` to fix serializing" Nov 13, 2024

aaronfranke force-pushed the grisu branch from 850a082 to 03153ce November 14, 2024 09:57

This was referenced Dec 5, 2024

Fix String.num_scientific() #100043 (Closed)

String.num_scientific() does not work for floats not big or loses significant digits #99763 (Open)

Behavior of String::num causes parse error for large float fields in Godot inspector #93768 (Open)

aaronfranke force-pushed the grisu branch from 03153ce to 1d8967f December 5, 2024 15:14

aaronfranke mentioned this pull request Dec 14, 2024

Capitalize INF, -INF, and NAN in serializing and allow in Range #100414 (Open)

aaronfranke force-pushed the grisu branch 2 times, most recently from 9ba5d2e to af0f254 December 18, 2024 17:14

Commit c514c64: Use Grisu2 algorithm in String::num_scientific to fix serializing

aaronfranke force-pushed the grisu branch from af0f254 to c514c64 December 18, 2024 17:35
