Floating point roundtrip and loss of precision #22
Comments
Please let me know whether the former flag belongs in 'jsonm' (or a higher layer such as 'ezjsonm'), and what the preferred option would be for preserving float accuracy.
So you really prefer not to check for NaN or infinity on the floats you encode yourself with jsonm, and would rather have yojson happily let you input non-validated strings that may contain null bytes? Programmers' priorities will never cease to amaze me.
The problem with this is that most programs will never see the exception anyway until they fail at some point in production, which is akin to discovering that you are generating ill-formed JSON. I'm not sure which is better, but altogether it looks rather equivalent to me. (And of course doing the check by default entails doing it twice, since the client still needs to do it to decide what to do with these values.)
We only discovered we were generating invalid JSON after about 10 years, when we looked at why one of our product's client applications was carrying a patch to the C# JSON parser: it was so that it could parse JSON produced by jsonm. Instead of raising a bug with us, that team worked around it by patching their parser. However, rather than duplicating those safety checks in all applications, it would be useful to have a common layer to do that (whether jsonm, ezjsonm, or something else); otherwise each application has to rediscover this bug (perhaps after many years, like we did).
That is why I'd like to move back to using jsonm.
Regarding the points you raised in this issue,
Thanks, that looks very promising. I'll try this and report back on the jsont bug tracker if I find anything missing.
I would suggest a flag that enforces standard-compliant JSON on output and raises an exception when the output would not be compliant.
Users that care about performance can keep that flag off, but users that care about correctness can turn that flag on to catch bugs in the application. (and even users who care about performance may want to turn that flag on during tests).
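To make the proposal concrete, here is a minimal sketch of what such an opt-in check could look like for floats. The names `json_float`, `strict` and `Non_finite_float` are hypothetical and are not part of jsonm, ezjsonm or yojson; the sketch only uses the OCaml standard library.

```ocaml
(* Hypothetical helper, not an existing jsonm/yojson API: format a float
   as a JSON number, optionally rejecting values JSON cannot represent. *)
exception Non_finite_float of float

let json_float ?(strict = false) (x : float) : string =
  match classify_float x with
  | FP_nan | FP_infinite ->
      (* NaN and infinities have no representation in standard JSON. *)
      if strict then raise (Non_finite_float x)
      else Printf.sprintf "%.17g" x (* non-strict: output is not valid JSON *)
  | FP_normal | FP_subnormal | FP_zero ->
      Printf.sprintf "%.17g" x
```

With `~strict:true` a NaN or infinity fails at the point of encoding, so a test suite can catch it; with the flag off the encoder behaves as before and pays only for the classification.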
Also, the floating point output format should be changed from the default 'string_of_float' to at least "%.17g" (if accuracy and performance are desired), or a dynamic choice between %.15g, %.16g and %.17g (the shortest that roundtrips), or another algorithm that ensures the full accuracy of floating point data is preserved. See also owlbarn/owl#640 ("Owl_dataframe shouldn't use 'string_of_float'"); this is quite a common trap that serialization code falls into.
The following snippet reproduces the issue:
The JSON produced is not spec conformant and cannot be parsed by anything other than jsonm:
There is also a second problem here: outputting a float loses precision and doesn't retain the full range of an IEEE-754 double. That is the fault of OCaml's default 'string_of_float', but any serialization code should use a well-defined precision instead of relying on the default. "%.17g" should output enough digits to fully preserve floats (although they may look a bit "ugly", with more digits than required).
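The precision loss can be checked with the standard library alone. The sketch below is not the original reproduction from this issue; `roundtrips` is a hypothetical helper that tests whether parsing a formatted float gives back exactly the same value.

```ocaml
(* Hypothetical helper: does formatting and re-parsing give back x exactly? *)
let roundtrips (fmt : float -> string) (x : float) : bool =
  float_of_string (fmt x) = x

let fmt17 : float -> string = Printf.sprintf "%.17g"

let () =
  List.iter
    (fun x ->
      (* Compare the default conversion with an explicit 17-digit format. *)
      Printf.printf "string_of_float: %s (round-trips: %b)\n"
        (string_of_float x) (roundtrips string_of_float x);
      Printf.printf "%%.17g: %s (round-trips: %b)\n"
        (fmt17 x) (roundtrips fmt17 x))
    [ epsilon_float; 0.1 +. 0.2 ]
```

For both values the default conversion does not round-trip, while the 17-digit format does.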
Elsewhere a dynamic choice between "%.15g" and "%.17g" is made to fully preserve the original float (the hexadecimal encoding of floats wouldn't be valid JSON); that might be an alternative.
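A rough sketch of that dynamic choice, assuming finite inputs: try increasing precisions and keep the first string that parses back to the same float. `shortest_roundtrip` is a hypothetical name, not a function of jsonm or yojson, and NaN or infinities would still need to be handled separately before calling it.

```ocaml
(* Hypothetical helper: pick the shortest of %.15g, %.16g, %.17g that
   parses back to exactly the same float. Intended for finite values. *)
let shortest_roundtrip (x : float) : string =
  let s15 = Printf.sprintf "%.15g" x in
  if float_of_string s15 = x then s15
  else
    let s16 = Printf.sprintf "%.16g" x in
    if float_of_string s16 = x then s16
    else Printf.sprintf "%.17g" x (* 17 digits always suffice for a double *)

let () =
  List.iter
    (fun x -> print_endline (shortest_roundtrip x))
    [ 0.1; 0.1 +. 0.2; epsilon_float ]
```

This costs up to three format/parse attempts per float, which is the performance trade-off against always using "%.17g".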
Yojson by default would do the same for NaN:
But it has a flag to force producing standard conforming JSON (and the application can then do the necessary encoding of NaN/Infinite beforehand):
It also fully preserves all the digits in 'eps'.
See also #12 (comment)
With these changes I may be able to switch back to using jsonm, which I'd still prefer due to its better input validation (for now we had to switch to yojson to ensure spec-compliant output).