Accuracy/correctness/efficiency compromise for quantity representations and their conversions #599
Thanks for posting this topic! There's a lot to dig into here, and it feels like it'll take significant time and effort to pick a solution. For the case of quantity conversions specifically, I think this Au doc page might be particularly helpful. One interesting example is the case of applying a (non-trivial) rational conversion factor (i.e., with numerator and denominator both different from 1) to a floating point value. We did find that applying the floating point representation of that rational magnitude, rather than separately multiplying by the numerator and dividing by the denominator, gave a small reduction in accuracy (basically, a neighbouring floating point value). But we argued that this reduction in accuracy was worth it: users of floating point types have no right to expect more accuracy than this, and we can go from 2 instructions to 1 for the unit conversion.
While thinking about how to "best" solve both #580 and #598, I realise that the correct answer will depend on the general choices we make in the trade-off between correctness, accuracy, and assembly size.
Representation of physical quantities
In general, a quantities library like this provides a representation of physical quantities, so that the user can describe physical relations in their code and let the library handle turning that into an implementation. Thus, ideally, the semantics of every operation offered by our representations are fully specified by the implied physical relation.
However, there is a compromise to be made in how far the behaviour of that representation may stray from the implied physical relationship in order to be performant on the hardware. The primary way we let the library user influence that compromise is by giving them the option to choose how to represent quantities. By that, I mean all aspects of how a physical quantity is turned into a bit pattern: for `quantity`, that is both the representation type and the unit, and for `quantity_point`, it also includes the point origin. As C++ programmers, we expect our users to have a reasonable understanding of the behaviour and semantics of the numeric types they put into the representation type slot. From that, they can immediately deduce the range and precision of their chosen representation in terms of the physical quantity.

However, that still leaves open the question of the semantics of operations / physical relations (with regard to accuracy/correctness/efficiency). As library authors, I believe we're obliged to clearly specify those semantics, especially where multiple choices are reasonably possible.
Guarantees
One such area where compromises need to be made is value conversions.
IMHO, the basic goals that we should be aiming for are the following:
Obviously, we need to choose under what conditions we want to sacrifice correctness on the full domain for efficiency on the reduced domain, especially where it may be hard for a user to guess what that reduced domain is. In general, I believe we should always strive to reduce surprises (it's a main motivation for this library) and thus try to guarantee correctness on the full reasonable domain. What do you think?
`double`. We may also consider introducing customisation points to let the user influence this compromise.
Compromises
The following sections analyse a few specific representation conversions regarding potential efficiency gains for reduced domains.
Floating-point representations
The intuitive behaviour a C++ developer may expect from a quantity using a floating-point representation type is "almost infinite range" and "constant relative accuracy". If it were "infinite range", that would imply that the unit affects neither accuracy nor range. The result is that conversions between any two `quantity` specialisations (i.e. different representations) have clear, intuitive semantics regarding accuracy/correctness, and luckily, the implementation of that is always efficient. While somewhat less intuitive, I believe even for `quantity_point`s (where the point origin may change) there are only minor efficiency gains to be had by sacrificing accuracy. The example here would be converting from a `double` representation type to a `float` representation type while simultaneously changing the point origin; it may be more efficient to first convert to `float` and then move the origin, but more accurate to do it the other way around.

Integer and fixed-point representations for `quantity` (no point origin)

These are a lot more complicated, because range and efficiency are anti-correlated, and the range is typically limited. Furthermore, the representation accuracy depends on the chosen unit, and thus even `quantity` conversions become non-trivial. For integers specifically, you need an intermediate representation that is at least as wide as the sum of the widths of the input and output representations. There may be no standard type available with that width, at which point the fully correct implementation becomes four times more resource-intensive than an implementation that is fully correct only on an unspecified and potentially extremely small subdomain, or an implementation that guarantees a sub-domain of e.g. half the maximum width and an accuracy matching that half-width.