Single precision variants #29

Open · dylanede opened this issue Jul 19, 2022 · 6 comments

@dylanede

What would it take to produce versions of the functions for single precision floating point? Presumably in most cases it means truncating the series and thus the tables of coefficients at lower indexes, but I'm not sure where to begin with adjusting the test cases and finding the new cutoff points. On another note, what guarantees do the existing functions have with respect to precision over their entire input range?

@Expander
Owner

Hi @dylanede,

That should be straightforward if one does what you're suggesting, namely truncating the approximations accordingly. (I guess you're asking for single-precision versions for performance reasons, right?)

To answer your questions:

  • According to the tests, the double-precision implementations have a relative precision of 2*std::pow(10.0, -std::numeric_limits<double>::digits10) = 2e-15 over the whole tested range. This is achieved by mapping the functions to some small interval and using an approximation with sufficient precision there. In this process there should be no loss of precision.
  • For single precision one could aim for 2*std::pow(10.0f, -std::numeric_limits<float>::digits10) = 2e-6.
  • The tables with the single-precision coefficients should be stored in the code with 9 significant decimal digits.
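
These target tolerances can be checked directly, e.g. (an illustrative snippet, not library code):

```cpp
#include <cmath>
#include <cstdio>
#include <limits>

int main()
{
   // target relative precisions quoted above
   const double tol_double = 2*std::pow(10.0, -std::numeric_limits<double>::digits10);
   const float  tol_float  = 2*std::pow(10.0f, -std::numeric_limits<float>::digits10);
   std::printf("double target: %g\n", tol_double); // prints 2e-15
   std::printf("float  target: %g\n", tol_float);  // prints 2e-06
}
```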

To begin, one could start with Li2(float):

  • One could adapt the TEST_CASE("test_real_fixed_values") accordingly.
  • Add Li2(float) to src/cpp/Li2.{hpp,cpp}.
  • To implement Li2(float) one could copy and adapt Li2(double) accordingly:
    • Convert all double-precision numbers to single precision, e.g. 0.5 -> 0.5f.
    • Determine the coefficients for the numerator and denominator using the Mathematica script test/math/dilogMinimaxApprox.m. Here one should reduce the number of coefficients in the numerator and denominator (otherwise there would be no performance gain) until one reaches the desired maximum relative error; see the sketch after this list.
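
For orientation, here is a minimal structural sketch of what such a Li2(float) could look like, assuming it mirrors the map-then-approximate structure of Li2(double). The coefficient arrays CP and CQ are placeholders, not real minimax values (they would have to come out of dilogMinimaxApprox.m), and most argument mappings are omitted:

```cpp
#include <cmath>

// Hypothetical sketch of Li2(float), assuming it mirrors the structure of
// the existing Li2(double): map x into [0, 1/2], evaluate a rational
// minimax approximation of Li2(y)/y there, and undo the mapping.
// CP and CQ are PLACEHOLDERS, not real minimax coefficients; the actual
// values must be generated with test/math/dilogMinimaxApprox.m.
inline float li2f_sketch(float x) noexcept
{
   const float PI = 3.14159265f;
   const float CP[] = { 1.0f, 0.0f, 0.0f };       // numerator (placeholder)
   const float CQ[] = { 1.0f, 0.0f, 0.0f, 0.0f }; // denominator (placeholder)

   float y = 0.0f, r = 0.0f, s = 1.0f;

   if (0.0f <= x && x < 0.5f) {       // already in the approximation interval
      y = x;
   } else if (0.5f <= x && x < 1.0f) {
      // reflection: Li2(x) = pi^2/6 - ln(x) ln(1-x) - Li2(1-x)
      y = 1.0f - x;
      r = PI*PI/6.0f - std::log(x)*std::log1p(-x);
      s = -1.0f;
   } else {
      // x < 0 and x >= 1 need further identities, as in Li2(double);
      // omitted in this sketch
      return 0.0f;
   }

   const float p = CP[0] + y*(CP[1] + y*CP[2]);
   const float q = CQ[0] + y*(CQ[1] + y*(CQ[2] + y*CQ[3]));

   return r + s*y*p/q;
}
```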

I could help with that, if you like. (So far I haven't seen an application for this, but it seems you have one.)

@dylanede
Author

dylanede commented Jul 20, 2022

Thanks for the detailed reply!

I have determined using that script that for the target of 2e-6 relative error, the numbers of coefficients for the numerator and denominator are two and three respectively (giving a relative error of approximately 8e-9). I'll see what I can do with the test cases.

By the way, the main functions I am interested in are the complex versions of Li2, Li3 and Li4, so if/when I submit a PR, it will likely be for those plus their dependencies.

@Expander
Owner

Many thanks in advance for preparing a PR!

> I have determined using that script that for the target of 2e-6 relative error, the numbers of coefficients for the numerator and denominator are two and three respectively (giving a relative error of approximately 8e-9).

That's pretty good in the sense that the number of coefficients is not very big. (I guess reducing the number of coefficients further would give a relative error of more than ~1e-6, right?)

> By the way, the main functions I am interested in are the complex versions of Li2, Li3 and Li4, so if/when I submit a PR, it will likely be for those plus their dependencies.

For the complex versions I've used the helper functions in test/math/polylog.m, which contains all the different expansions in analytic form. But I guess you won't need them, because for the complex versions I've used a simple Taylor expansion that you could chop off at a sufficient order.
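
For illustration, a chopped-off Taylor series could look like this (a sketch using the naive series in z, not necessarily the expansion variable the library uses; the truncation order N = 15 is an untested guess for single precision at |z| <= 1/2):

```cpp
#include <complex>

// Truncated Taylor series Li2(z) = sum_{k>=1} z^k / k^2, usable for small
// |z| only; general arguments must first be mapped into such a region.
// N = 15 is an assumed truncation order for single precision at
// |z| <= 1/2, not a verified bound.
std::complex<float> li2_series(const std::complex<float>& z, int N = 15)
{
   std::complex<float> sum(0.0f, 0.0f);
   std::complex<float> zn(1.0f, 0.0f); // holds z^k
   for (int k = 1; k <= N; ++k) {
      zn *= z;
      sum += zn/static_cast<float>(k*k);
   }
   return sum;
}
```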

Just out of curiosity, if you don't mind: What exactly is your application? Since you care about performance so much, I guess you're evaluating these polylogarithms millions of times, right?

@dylanede
Author

> That's pretty good in the sense that the number of coefficients is not very big. (I guess reducing the number of coefficients further would give a relative error of more than ~1e-6, right?)

Yes, though I did notice that for a given input number of coefficients, the number that comes out is one more than that for both the numerator and denominator, with the first coefficient of each being 1. So really it's three and four. The next step down in coefficient count produced a relative error of 4.4e-6.
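
For illustration, the resulting rational function would be evaluated with a Horner scheme along these lines (a sketch; p and q hold the non-trivial coefficients produced by dilogMinimaxApprox.m):

```cpp
// Horner evaluation of the (3,4)-coefficient rational approximation
// described above: the leading numerator and denominator coefficients
// are fixed to 1, so only 2 + 3 values remain to be stored.
inline float eval_rational(float y, const float p[2], const float q[3])
{
   const float num = 1.0f + y*(p[0] + y*p[1]);
   const float den = 1.0f + y*(q[0] + y*(q[1] + y*q[2]));
   return num/den;
}
```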

> Just out of curiosity, if you don't mind: What exactly is your application? Since you care about performance so much, I guess you're evaluating these polylogarithms millions of times, right?

Sure. The application relies heavily on parallel Monte-Carlo sampling of distributions proportional to the Green's functions for Laplace's equation on rectangular regions. These functions turn out to be expressible in terms of logarithms of the Jacobi theta functions, and the integrals needed to perform the sampling of them end up as quickly converging (within two or three terms) series involving complex polylogarithms.

I'm currently using the double precision versions simply adjusted to use floats instead, since double precision currently isn't necessary, but I thought I'd enquire about how much effort would be necessary to optimise the functions for floats, in case I find myself needing more performance. So I can't guarantee that the PR will be soon.

@Expander
Owner

> Yes, though I did notice that for a given input number of coefficients, the number that comes out is one more than that for both the numerator and denominator, with the first coefficient of each being 1. So really it's three and four. The next step down in coefficient count produced a relative error of 4.4e-6.

Some time ago I found this old paper [Morris], which gives several rational function approximations of different precisions. For double precision (index 0011, 16.2 decimal digits of precision) it uses 5 terms in the numerator and 6 in the denominator, one of which is equal to 1.

For single precision, where one may be satisfied with 6.7 decimal digits of precision, Morris uses 2 numerator and 2 denominator terms (index 0004), one of which is set to 1. If I understand correctly, this matches your observation.

> Sure. The application relies heavily on parallel Monte-Carlo sampling of distributions proportional to the Green's functions for Laplace's equation on rectangular regions. These functions turn out to be expressible in terms of logarithms of the Jacobi theta functions, and the integrals needed to perform the sampling of them end up as quickly converging (within two or three terms) series involving complex polylogarithms.
>
> I'm currently using the double precision versions simply adjusted to use floats instead, since double precision currently isn't necessary, but I thought I'd enquire about how much effort would be necessary to optimise the functions for floats, in case I find myself needing more performance. So I can't guarantee that the PR will be soon.

Very interesting, thanks!

No worries regarding the time scale. :)

@Expander
Owner

Expander commented Sep 3, 2023

I've drafted a single-precision version of the real dilogarithm li2(float) in the branch feature-li2-float. On my laptop it is faster by a factor of ~1.4. I'll see if I can optimize it further.
