Accuracy/correctness/efficiency compromise for quantity representations and their conversions #599
Thanks for posting this topic! There's a lot to dig into here, and it feels like it'll take significant time and effort to pick a solution. For the case of quantity conversions specifically, I think this Au doc page might be particularly helpful. One interesting example is the case of applying a (non-trivial) rational conversion factor (i.e., with numerator and denominator both different from 1) to a floating point value. We did find that applying the floating point representation of that rational magnitude, rather than separately multiplying by the numerator and dividing by the denominator, gave a small reduction in accuracy (basically, a neighbouring floating point value). But we argued that this reduction in accuracy was worth it: users of floating point types have no right to expect more accuracy than this, and we can go from 2 instructions to 1 for the unit conversion.
While thinking about how to "best" solve both #580 and #598, I realise that the correct answer will depend on the general choices we make in the trade-off between correctness, accuracy, and assembly size.
Representation of physical quantities
In general, a quantities library like this provides a representation of physical quantities, so that the user can describe physical relations in their code and let the library handle turning that into an implementation. Thus, ideally, the semantics of every operation offered by our representations are fully specified by the implied physical relation.
However, there is a compromise to be made in how far the behaviour of that representation may stray from the implied physical relationship in order to be performant on the hardware. The primary way we let the library user influence that compromise is by giving them the option to choose how to represent quantities. By that, I mean all aspects of how a physical quantity is turned into a bit pattern: for `quantity`, that is both the representation type and the unit, and for `quantity_point`, it also includes the point origin. As C++ programmers, we expect our users to have a reasonable understanding of the behaviour and semantics of the numeric types they put into the representation type slot. From that, they can immediately deduce the range and precision of their chosen representation in terms of the physical quantity.

However, that still leaves open the question of the semantics of operations / physical relations (with regard to accuracy/correctness/efficiency). As library authors, I believe we're obliged to clearly specify those semantics, especially where multiple choices are reasonably possible.
Guarantees
One such area where compromises need to be made is value conversions.
IMHO, the basic goals that we should be aiming for are the following:
Obviously, we need to choose under what conditions we want to sacrifice correctness on the full domain for efficiency on the reduced domain, especially where it may be hard for a user to guess what that reduced domain is. In general, I believe we should always strive to reduce surprises (it's a main motivation for this library) and thus try to guarantee correctness on the full reasonable domain. What do you think?
`double`. We may also consider introducing customisation points to let the user influence this compromise.
Compromises
The following sections analyse a few specific representation conversions regarding potential efficiency gains for reduced domains.
Floating-point representations
The intuitive behaviour a C++ developer may expect from a quantity using a floating-point representation type is "almost infinite range" and "constant relative accuracy". If it were "infinite range", that would imply that the unit affects neither accuracy nor range. The result is that conversions between any two `quantity` specialisations (i.e. different representations) have clear, intuitive semantics regarding accuracy/correctness, and luckily, the implementation of that is always efficient. While somewhat less intuitive, I believe even for `quantity_point`s (where the point origin may change) there are only minor efficiency gains to be had by sacrificing accuracy. The example here would be converting from a `double` representation type to a `float` representation type while simultaneously changing the point origin; it may be more efficient to first convert to `float` and then move the origin, but more accurate to do it the other way around.

Integer and fixed-point representations for `quantity` (no point origin)

These are a lot more complicated, because range and efficiency are anti-correlated, and the range is typically limited. Furthermore, the representation accuracy depends on the chosen unit, and thus even `quantity` conversions become non-trivial. For integers specifically, you need an intermediate representation that is at least as wide as the sum of the widths of the input and output representations. There may be no standard type available with that width, at which point the fully correct implementation becomes four times more resource-intensive than an implementation that is fully correct only on an unspecified and potentially extremely small subdomain, or an implementation that guarantees a sub-domain of e.g. half the maximum width and an accuracy matching that half-width.