Some Analysis
To make sense of this data and put it to use, we need to classify the behavior of the compilers and hardware that the tests run on. To do this, we group test results into equivalence classes. Here, an equivalence class is simply a group of test results that have the same score.
Here are the components of the tests:
- The actual test suite, which consists of 8 different tests:
  - DoHariGSBasic -- an implementation of the Gram-Schmidt algorithm
  - DoHariGSImproved -- a less naive implementation of Gram-Schmidt
  - DoMatrixMultSanity -- a simple arbitrary matrix multiplied by an identity matrix
  - DoOrthoPerturbTest -- beginning with two known orthogonal vectors, this test incrementally 'fuzzes' one element of vector 'A' at a time by one ULP and checks for orthogonality (the score is the dot product of the fuzzed 'A' and the pristine 'B'). There are two variants of this test: the initial vectors are either randomly initialized with whole powers of 2 (i.e. 1, 2, 4, ...), or initialized with the lambda '[&indexer]{return 0.2 / pow((T)10.0, indexer++);}', which is called each time a vector element's value is needed (i.e. 0.2/1, 0.2/10, 0.2/100, etc.). This gives a nice sequence of reals that are not exactly representable in binary (base 2).
  - DoSimpleRotate90 -- this test rotates [1 1 1] by PI/2 radians and checks the L1 and LInf distances from the expected result
  - DoSkewSymCPRotationTest -- takes two random 3D vectors and attempts to align them using a 'skew symmetric cross product' rotation; the scores are the L1 and LInf distances between the resulting product and the expected vector
  - RotateAndUnrotate -- this test takes a random 3D vector, rotates it by PI radians, rotates it again by -PI radians, and computes scores from the L1 and LInf distances to the starting vector
  - RotateFullCircle -- this test iteratively rotates a random 3D vector by 2*PI/n radians, n times, and scores by the L1 and LInf distances from the starting vector
An obvious starting point for identifying discrepancies in floating point results in the test set is to count the equivalence classes (or the number of different scores per test). We can start by looking at this list, and from there we'll choose a test to dig into further.
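A minimal C++ sketch of that counting step, assuming each result has been reduced to a (test name, score) pair with the score kept as a hex string like the 'score0' column. 'TestResult' and 'countEquivalenceClasses' are hypothetical names, not part of the test suite; the function computes the same thing as a 'count(distinct score0) ... group by name' query.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical record: one row of test output.
struct TestResult {
  std::string name;   // test name, e.g. "DoOrthoPerturbTest"
  std::string score;  // score as a hex string, e.g. "3fce8018000000000000"
};

// Two results fall in the same equivalence class when they belong to the
// same test and have bit-identical scores, so the number of classes per
// test is the number of distinct scores recorded for that test.
std::map<std::string, std::size_t>
countEquivalenceClasses(const std::vector<TestResult>& results) {
  std::map<std::string, std::set<std::string>> scoresByTest;
  for (const auto& r : results) {
    scoresByTest[r.name].insert(r.score);
  }
  std::map<std::string, std::size_t> counts;
  for (const auto& entry : scoresByTest) {
    counts[entry.first] = entry.second.size();
  }
  return counts;
}
```

Comparing the hex strings rather than decimal renderings keeps two scores in the same class only when they are bit-for-bit identical.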
We decided to explore the differences between compilers first, so we'll begin by looking at the variability on a single host. For this, we'll choose the host 'kingspeak1':
qfp=# select count(distinct score0), name from tests where host =
'kingspeak1' and run = (select max(index) from runs) group by
name order by name;
count | name
-------+-------------------------
5 | DoHariGSBasic
3 | DoHariGSImproved
1 | DoMatrixMultSanity
16 | DoOrthoPerturbTest
3 | DoSimpleRotate90
21 | DoSkewSymCPRotationTest
8 | RotateAndUnrotate
20 | RotateFullCircle
(8 rows)
We're seeing the most variability in 'RotateFullCircle'. (Before I corrected an error in my reduction routine, the most variable test was 'DoOrthoPerturbTest'. The routine was truncating floating point vectors to integers; it's a long story, but it comes down to 'std::inner_product' inferring its accumulator type from its last parameter, the initial value.) However, we're going to stick with 'DoOrthoPerturbTest' because it doesn't have the complication of relying on transcendental functions, so it should be fairly simple to see where the difference occurs in the compiler-generated code.
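The 'std::inner_product' pitfall is easy to reproduce. The accumulator has the type of the initial value, so passing an integer literal such as '0' makes every partial sum get converted back to 'int'. A minimal sketch; the two function names are illustrative only:

```cpp
#include <numeric>
#include <vector>

// The accumulator type is deduced from the init argument: with 0 it is int,
// so each partial sum (init + a[i]*b[i]) is converted back to int (truncated).
double dotWithIntInit(const std::vector<double>& a, const std::vector<double>& b) {
  return std::inner_product(a.begin(), a.end(), b.begin(), 0);
}

// With a floating point init value the accumulation stays in double.
double dotWithDoubleInit(const std::vector<double>& a, const std::vector<double>& b) {
  return std::inner_product(a.begin(), a.end(), b.begin(), 0.0);
}
```

For a = {0.5, 0.5, 0.5} and b = {1, 1, 1}, the int-initialized version returns 0 while the double-initialized version returns 1.5. The 'operator^' code passes '(T)0.0', which keeps the accumulator in the vector's element type.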
At this point, we need to introduce the concept of 'characteristic operations'. These are the critical parts of the calculations performed in each test, where differences in the code generated by the compiler/assembler and executed by the processor, math co-processor, or vector unit are going to produce different results.
Really, the two key characteristic operations for this test suite are the scoring functions:
- L1Distance
    T
    L1Distance(Vector<T> const &rhs) const {
      T distance = 0;
      for(int x = 0; x < size(); ++x){
        distance += fabs(data[x] - rhs.data[x]);
      }
      return distance;
    }
- Dot Product (^ operator) (and the helper function, 'reduce')
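    T
    operator^(Vector<T> const &rhs) const {
      T sum = 0;
      if( sortType == bi){
        // 'bi': std::inner_product; the (T)0.0 initial value makes T the accumulator type
        sum = std::inner_product(data.begin(), data.end(), rhs.data.begin(), (T)0.0);
      }else{
        // otherwise: form the element-wise products, then sum them through reduce
        vector<T> prods(data.size());
        for(int i = 0; i < size(); ++i){
          prods[i] = data[i] * rhs.data[i];
        }
        reduce(prods, [&sum](T e){sum += e;});
      }
      return sum;
    }
    template<typename F, class C>
    void
    reduce(C &cont, F const &fun) const {
      T retVal;
      // lt: sort by ascending magnitude (smallest values accumulated first)
      // gt: sort by descending magnitude (largest values accumulated first)
      // any other sortType: accumulate in the original order
      if(sortType == lt || sortType == gt){
        if(sortType == lt)
          std::sort(cont.begin(), cont.end(),
                    [](T a, T b){return fabs(a) < fabs(b);});
        else
          std::sort(cont.begin(), cont.end(),
                    [](T a, T b){return fabs(a) > fabs(b);});
      }
      for_each(cont.begin(), cont.end(), fun);
    }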
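To see why the reduction order matters at all, here is a minimal standalone sketch of the sort-then-accumulate pattern used by 'reduce', specialized to 'double'. 'SortOrder' and 'orderedSum' are hypothetical names; 'lt' and 'gt' use the same comparators as 'reduce', and 'none' sums the terms in their given order. It assumes IEEE double arithmetic evaluated in double precision (e.g. SSE2 on x86-64); with x87 extended-precision intermediates the results can differ.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Hypothetical names; lt and gt use the same comparators as reduce().
enum class SortOrder { none, lt, gt };

// Sort the terms by magnitude (if requested), then add them left to right,
// mirroring the sort-then-for_each pattern in reduce().
double orderedSum(std::vector<double> terms, SortOrder order) {
  if (order == SortOrder::lt) {
    // ascending magnitude: smallest terms are added first
    std::sort(terms.begin(), terms.end(),
              [](double a, double b) { return std::fabs(a) < std::fabs(b); });
  } else if (order == SortOrder::gt) {
    // descending magnitude: largest terms are added first
    std::sort(terms.begin(), terms.end(),
              [](double a, double b) { return std::fabs(a) > std::fabs(b); });
  }
  double sum = 0.0;
  std::for_each(terms.begin(), terms.end(), [&sum](double e) { sum += e; });
  return sum;
}
```

For example, with the terms {1, 2^53, -(2^53 + 2)}, whose exact sum is -1, 'gt' returns -1 while 'none' and 'lt' return -2: when 1 is added to 2^53 first, it is lost to rounding, because doubles near 2^53 are spaced 2 apart.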
Now we'll explore the test set . . .
[the heart of the test]
T backup;
for(int r = 0; r < dim; ++r){
  T &p = a[r];
  backup = p;
  for(int i = 0; i < iters; ++i){
    p = FPHelpers::perturbFP(backup, i * ulp_inc);
    bool isOrth = a.isOrtho(b);
    if(isOrth){
      orthoCount[r]++;
      if(i != 0) score += fabs(p - backup);
      //score should be perturbed amount
    }else{
      if(i == 0) score += fabs(a ^ b);
      //if falsely not detecting ortho, should be the dot prod
    }
    info_stream << "a[" << r << "] = " << a[r] << " perp: "
                << isOrth << " dot prod: " << FPWrap<T>(a ^ b) << endl;
  }
  info_stream << "next dimension . . . " << endl;
  p = backup;
}
This snippet is the main high-level calculation for this test. The characteristic operation here is the dot product, which is encapsulated by 'Vector::isOrtho' and also appears directly here as its operator, '^'.
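The perturbation step can be sketched with 'std::nextafter', which moves a value to the adjacent representable floating point number. This is an assumption about what 'FPHelpers::perturbFP' does (its implementation isn't shown here), and 'perturbByUlps' is a hypothetical name:

```cpp
#include <cmath>
#include <cstdlib>
#include <limits>

// Hypothetical stand-in for FPHelpers::perturbFP: step 'ulps' representable
// values away from x (upward for positive ulps, downward for negative).
template <typename T>
T perturbByUlps(T x, int ulps) {
  const T direction = ulps >= 0 ? std::numeric_limits<T>::infinity()
                                : -std::numeric_limits<T>::infinity();
  for (int k = 0; k < std::abs(ulps); ++k) {
    x = std::nextafter(x, direction);  // one ULP per step
  }
  return x;
}
```

Stepping one representable value at a time handles the change in ULP size across powers of 2 automatically: one ULP above 1.0 is epsilon, while one ULP below 1.0 is epsilon/2.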
BTW, 'score0' is the hexadecimal representation of the floating point value of the first test score, which is either the L1 distance from the correct answer or the dot product (where the test is checking orthogonality). There is also a 'score0d' column, which is score0 converted to decimal (as performed by std::cout). Either way, a perfect score is 0.
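The 20-hex-digit 'score0' values are consistent with the 80-bit x87 extended format: 4 hex digits of sign and 15-bit exponent (bias 16383), followed by a 64-bit significand with an explicit integer bit. Under that assumption, a hypothetical 'decodeScore' helper recovers the 'score0d' values in the tables; for example, '3fce8018000000000000' decodes to 32792 x 2^-64, about 1.77766 x 10^-15.

```cpp
#include <cmath>
#include <stdexcept>
#include <string>

// Hypothetical decoder, assuming score0 is an x87 80-bit extended value
// written as 20 hex digits: 16 bits of sign+exponent, then the 64-bit
// significand (including its explicit integer bit).
long double decodeScore(const std::string& hex) {
  if (hex.size() != 20) throw std::invalid_argument("expected 20 hex digits");
  const unsigned long signExp = std::stoul(hex.substr(0, 4), nullptr, 16);
  const unsigned long long significand = std::stoull(hex.substr(4, 16), nullptr, 16);
  const bool negative = (signExp & 0x8000u) != 0;
  const int biasedExp = static_cast<int>(signExp & 0x7fffu);
  // Infinities and NaNs (biasedExp == 0x7fff) are not handled in this sketch.
  // Denormals (biasedExp == 0) use the minimum exponent, 1 - 16383.
  // The extra -63 places the binary point after the explicit integer bit.
  const int exp2 = (biasedExp == 0 ? 1 : biasedExp) - 16383 - 63;
  const long double magnitude =
      std::ldexp(static_cast<long double>(significand), exp2);
  return negative ? -magnitude : magnitude;
}
```

Decoding '4003845a4c0000000000' the same way gives 8673868 x 2^-19 = 16.54409027099609375, matching the last row of the score table.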
Also, a note on the analysis of the compiler output: the variability in this test should primarily be caused by the dot product that is used after each vector 'fuzzing' operation. We'll focus on that when analyzing the disassembly of this test.
qfp=# select count(score0) as c, score0, trunc(score0d, 25) as d
from tests where name = 'DoOrthoPerturbTest' and host =
'kingspeak1' and run = 25 group
by score0, d order by d;
c | score0 | d
----+----------------------+------------------------------
60 | 3fce8018000000000000 | 0.0000000000000017776578820
30 | 3fd98018000000000000 | 0.0000000000036406433423508
89 | 3fdb845a4c0000000000 | 0.0000000000150467624471239
31 | 3fdb846b604000000000 | 0.0000000000150543470918418
60 | 3fdb8476db4000000000 | 0.0000000000150594454441377
30 | 3fe3dfa6ba0000000000 | 0.0000000065091088075064362
30 | 3fe5a464468000000000 | 0.0000000191377127478631336
30 | 3fe5b64a768000000000 | 0.0000000212214503747532035
30 | 3fe5cede868000000000 | 0.0000000240827491282402661
60 | 3fe6845a4c0000000000 | 0.0000000308157694917099433
30 | 3fe68476db4000000000 | 0.0000000308417442695940735
59 | 3ff68018000000000000 | 0.0019545555114746093750000
1 | 4001a9949d0000000000 | 5.2993912696838378906250000
31 | 40028045e68000000000 | 8.0170655250549316406250000
89 | 4003845a4c0000000000 | 16.5440902709960937500000000
60 | 40038476db4000000000 | 16.5580353736877441406250000
(16 rows)
So we have 16 different scores. The goal is to characterize the different results obtained with the different compilers and flags. It is interesting that accuracy drops off sharply at the c=1 row: an absolute error of ~5.3 versus the best score of ~1.778 x 10^-15.
Let's try to characterize these results, first by examining which parameters (precision, sort, compiler, and compiler flags) produce each one. We'll go through the list from most accurate to least accurate.
qfp=# select compiler, switches, sort, precision from tests where
score0='3fce8018000000000000' and host = 'kingspeak1' and name
= 'DoOrthoPerturbTest' and run = (select max(index) from runs)
order by compiler, switches;
compiler | switches | sort | precision
-------------------------+-----------------------------+------+-----------
"g++" | | gt | e
"g++" | | gt | e
"g++" | -fassociative-math | gt | e
"g++" | -fcx-fortran-rules | gt | e
"g++" | -fcx-limited-range | gt | e
"g++" | -fexcess-precision=fast | gt | e
"g++" | -ffast-math | gt | e
"g++" | -ffinite-math-only | gt | e
"g++" | -ffloat-store | gt | e
"g++" | -ffp-contract=on | gt | e
"g++" | -fmerge-all-constants | gt | e
"g++" | -fno-trapping-math | gt | e
"g++" | -freciprocal-math | gt | e
"g++" | -frounding-math | gt | e
"g++" | -fsignaling-nans | gt | e
"g++" | -funsafe-math-optimizations | gt | e
"g++" | -mavx | gt | e
"g++" | -mfpmath=sse -mtune=native | gt | e
"g++" | -O0 | gt | e
"g++" | -O1 | gt | e
"g++" | -O2 | gt | e
"g++" | -O3 | gt | e
"icpc -mlong-double-80" | | gt | e
"icpc -mlong-double-80" | | gt | e
"icpc -mlong-double-80" | -fassociative-math | gt | e
"icpc -mlong-double-80" | -fcx-fortran-rules | gt | e
"icpc -mlong-double-80" | -fcx-limited-range | gt | e
"icpc -mlong-double-80" | -fexcess-precision=fast | gt | e
"icpc -mlong-double-80" | -fexcess-precision=standard | gt | e
"icpc -mlong-double-80" | -ffast-math | gt | e
"icpc -mlong-double-80" | -ffinite-math-only | gt | e
"icpc -mlong-double-80" | -ffloat-store | gt | e
"icpc -mlong-double-80" | -ffp-contract=on | gt | e
"icpc -mlong-double-80" | -fma | gt | e
"icpc -mlong-double-80" | -fmerge-all-constants | gt | e
"icpc -mlong-double-80" | -fno-trapping-math | gt | e
"icpc -mlong-double-80" | -fp-model=extended | gt | e
"icpc -mlong-double-80" | -fp-model=precise | gt | e
"icpc -mlong-double-80" | -fp-model=strict | gt | e
"icpc -mlong-double-80" | -fp-port | gt | e
"icpc -mlong-double-80" | -fp-trap=common | gt | e
"icpc -mlong-double-80" | -freciprocal-math | gt | e
"icpc -mlong-double-80" | -frounding-math | gt | e
"icpc -mlong-double-80" | -fsignaling-nans | gt | e
"icpc -mlong-double-80" | -ftz | gt | e
"icpc -mlong-double-80" | -funsafe-math-optimizations | gt | e
"icpc -mlong-double-80" | -mavx | gt | e
"icpc -mlong-double-80" | -mfpmath=sse -mtune=native | gt | e
"icpc -mlong-double-80" | -mp1 | gt | e
"icpc -mlong-double-80" | -no-fma | gt | e
"icpc -mlong-double-80" | -no-ftz | gt | e
"icpc -mlong-double-80" | -no-prec-div | gt | e
"icpc -mlong-double-80" | -O0 | gt | e
"icpc -mlong-double-80" | -O1 | gt | e
"icpc -mlong-double-80" | -O2 | gt | e
"icpc -mlong-double-80" | -O3 | gt | e
"icpc -mlong-double-80" | -prec-div | gt | e
(57 rows)
So it's no surprise that the most accurate equivalence class is entirely extended precision (long double, the most accurate type here) with the 'gt' reduction sort. Note that, given the comparator used in 'reduce', 'gt' sorts the products by decreasing magnitude, so the largest values are accumulated first. The order of accumulation matters because of loss of significance (the loss of information when adding or subtracting values of very different magnitudes), and it affects the test of orthogonality.
It turns out that the flags had no effect on this test with 'gt' sort and 'e' precision, as confirmed by this query:
qfp=# select count(*) from tests where sort='gt' and
precision = 'e' and host = 'kingspeak1' and name
= 'DoOrthoPerturbTest' and run = (select max(index)
from runs);
count
-------
57
This shows that the compiler and switches didn't affect the results for this equivalence class.
- Let's take a quick look at the characteristic operation for this test, which is the reduce function called by the dot product (^ operator):
//fun is a lambda passed into
//reduce, in this case is:
//long double sum; [&sum](long double e){sum += e;}
std::sort(cont.begin(), cont.end(),
[](T a, T b){return fabs(a) > fabs(b);});
for_each(cont.begin(), cont.end(), fun);
** Intel
402e20: 55 push %rbp
402e21: 48 89 e5 mov %rsp,%rbp
402e24: 48 83 ec 40 sub $0x40,%rsp
402e28: 48 89 7d d0 mov %rdi,-0x30(%rbp)
402e2c: db 6d 10 fldt 0x10(%rbp)
402e2f: db 3c 24 fstpt (%rsp)
402e32: e8 71 dc 02 00 callq 430aa8 <std::fabs(long double)>
402e37: db 7d e0 fstpt -0x20(%rbp)
402e3a: 48 83 c4 10 add $0x10,%rsp
402e3e: 48 83 c4 f0 add $0xfffffffffffffff0,%rsp
402e42: db 6d 20 fldt 0x20(%rbp)
402e45: db 3c 24 fstpt (%rsp)
402e48: e8 5b dc 02 00 callq 430aa8 <std::fabs(long double)>
402e4d: db 7d f0 fstpt -0x10(%rbp)
402e50: 48 83 c4 10 add $0x10,%rsp
402e54: db 6d e0 fldt -0x20(%rbp)
402e57: db 6d f0 fldt -0x10(%rbp)
402e5a: b8 01 00 00 00 mov $0x1,%eax
402e5f: ba 00 00 00 00 mov $0x0,%edx
402e64: d9 c9 fxch %st(1)
402e66: df f1 fcomip %st(1),%st
402e68: dd d8 fstp %st(0)
402e6a: 0f 47 d0 cmova %eax,%edx
402e6d: 89 d0 mov %edx,%eax
402e6f: c9 leaveq
402e70: c3 retq
The Intel compiler is using x87 FPU instructions (fldt, fstpt, fcomip, operating on the %st register stack).
** GCC (O0)
41e294: 55 push %rbp
41e295: 48 89 e5 mov %rsp,%rbp
41e298: 48 83 ec 30 sub $0x30,%rsp
41e29c: 48 89 7d f8 mov %rdi,-0x8(%rbp)
41e2a0: 48 8b 45 10 mov 0x10(%rbp),%rax
41e2a4: 8b 55 18 mov 0x18(%rbp),%edx
41e2a7: 48 89 04 24 mov %rax,(%rsp)
41e2ab: 89 54 24 08 mov %edx,0x8(%rsp)
41e2af: e8 e4 70 fe ff callq 405398 <std::fabs(long double)>
41e2b4: db 7d e0 fstpt -0x20(%rbp)
41e2b7: 48 8b 45 20 mov 0x20(%rbp),%rax
41e2bb: 8b 55 28 mov 0x28(%rbp),%edx
41e2be: 48 89 04 24 mov %rax,(%rsp)
41e2c2: 89 54 24 08 mov %edx,0x8(%rsp)
41e2c6: e8 cd 70 fe ff callq 405398 <std::fabs(long double)>
41e2cb: db 6d e0 fldt -0x20(%rbp)
41e2ce: d9 c9 fxch %st(1)
41e2d0: df e9 fucomip %st(1),%st
41e2d2: dd d8 fstp %st(0)
41e2d4: 0f 97 c0 seta %al
41e2d7: c9 leaveq
41e2d8: c3 retq
The takeaway here is that GCC is also using x87 FPU instructions (note the fldt/fstpt loads and stores and the fucomip comparison on the %st registers).
qfp=# select compiler, switches, sort, precision from tests where score0='3fd98018000000000000'
and host = 'kingspeak1' and name = 'DoOrthoPerturbTest' and run = 19 order by compiler,
switches;
compiler | switches | sort | precision
-------------------------+-----------------------------+------+-----------
"g++" | | gt | d
"g++" | | gt | d
"g++" | -fassociative-math | gt | d
"g++" | -fcx-fortran-rules | gt | d
"g++" | -fcx-limited-range | gt | d
"g++" | -fexcess-precision=fast | gt | d
"g++" | -ffast-math | gt | d
"g++" | -ffinite-math-only | gt | d
"g++" | -ffloat-store | gt | d
"g++" | -ffp-contract=on | gt | d
"g++" | -fmerge-all-constants | gt | d
"g++" | -fno-trapping-math | gt | d
"g++" | -freciprocal-math | gt | d
"g++" | -frounding-math | gt | d
"g++" | -fsignaling-nans | gt | d
"g++" | -funsafe-math-optimizations | gt | d
"g++" | -mavx | gt | d
"g++" | -mfpmath=sse -mtune=native | gt | d
"g++" | -O0 | gt | d
"g++" | -O1 | gt | d
"g++" | -O2 | gt | d
"g++" | -O3 | gt | d
"icpc -mlong-double-80" | -fp-model=extended | gt | d
"icpc -mlong-double-80" | -fp-model=precise | gt | d
"icpc -mlong-double-80" | -fp-model=strict | gt | d
"icpc -mlong-double-80" | -frounding-math | gt | d
"icpc -mlong-double-80" | -mavx | gt | d
"icpc -mlong-double-80" | -O0 | gt | d
"icpc -mlong-double-80" | -O1 | gt | d
(29 rows)
qfp=# select compiler, switches, sort, precision from tests where score0='3fdb845a4c0000000000' and host = 'kingspeak1' and name = 'DoOrthoPerturbTest' and run = 19 order by compiler, switches;
compiler | switches | sort | precision
-------------------------+-----------------------------+------+-----------
"g++" | | bi | e
"g++" | | us | e
"g++" | | bi | e
"g++" | | us | e
"g++" | -fassociative-math | us | e
"g++" | -fassociative-math | bi | e
"g++" | -fcx-fortran-rules | bi | e
"g++" | -fcx-fortran-rules | us | e
"g++" | -fcx-limited-range | us | e
"g++" | -fcx-limited-range | bi | e
"g++" | -fexcess-precision=fast | us | e
"g++" | -fexcess-precision=fast | bi | e
"g++" | -ffast-math | bi | e
"g++" | -ffast-math | us | e
"g++" | -ffinite-math-only | bi | e
"g++" | -ffinite-math-only | us | e
"g++" | -ffloat-store | bi | e
"g++" | -ffloat-store | us | e
"g++" | -ffp-contract=on | us | e
"g++" | -ffp-contract=on | bi | e
"g++" | -fmerge-all-constants | bi | e
"g++" | -fmerge-all-constants | us | e
"g++" | -fno-trapping-math | bi | e
"g++" | -fno-trapping-math | us | e
"g++" | -freciprocal-math | us | e
"g++" | -freciprocal-math | bi | e
"g++" | -frounding-math | us | e
"g++" | -frounding-math | bi | e
"g++" | -fsignaling-nans | bi | e
"g++" | -fsignaling-nans | us | e
"g++" | -funsafe-math-optimizations | bi | e
"g++" | -funsafe-math-optimizations | us | e
"g++" | -mavx | us | e
"g++" | -mavx | bi | e
"g++" | -mfpmath=sse -mtune=native | us | e
"g++" | -mfpmath=sse -mtune=native | bi | e
"g++" | -O0 | us | e
"g++" | -O0 | bi | e
"g++" | -O1 | bi | e
"g++" | -O1 | us | e
"g++" | -O2 | bi | e
"g++" | -O2 | us | e
"g++" | -O3 | us | e
"g++" | -O3 | bi | e
"icpc -mlong-double-80" | | us | e
"icpc -mlong-double-80" | | us | e
"icpc -mlong-double-80" | -fassociative-math | us | e
"icpc -mlong-double-80" | -fcx-fortran-rules | us | e
"icpc -mlong-double-80" | -fcx-limited-range | us | e
"icpc -mlong-double-80" | -fexcess-precision=fast | us | e
"icpc -mlong-double-80" | -fexcess-precision=standard | us | e
"icpc -mlong-double-80" | -ffast-math | us | e
"icpc -mlong-double-80" | -ffinite-math-only | us | e
"icpc -mlong-double-80" | -ffloat-store | us | e
"icpc -mlong-double-80" | -ffp-contract=on | us | e
"icpc -mlong-double-80" | -fma | us | e
"icpc -mlong-double-80" | -fmerge-all-constants | us | e
"icpc -mlong-double-80" | -fno-trapping-math | us | e
"icpc -mlong-double-80" | -fp-model=extended | bi | e
"icpc -mlong-double-80" | -fp-model=extended | us | e
"icpc -mlong-double-80" | -fp-model=precise | bi | e
"icpc -mlong-double-80" | -fp-model=precise | us | e
"icpc -mlong-double-80" | -fp-model=strict | us | e
"icpc -mlong-double-80" | -fp-model=strict | bi | e
"icpc -mlong-double-80" | -fp-port | us | e
"icpc -mlong-double-80" | -fp-trap=common | us | e
"icpc -mlong-double-80" | -freciprocal-math | us | e
"icpc -mlong-double-80" | -frounding-math | us | e
"icpc -mlong-double-80" | -frounding-math | bi | e
"icpc -mlong-double-80" | -fsignaling-nans | us | e
"icpc -mlong-double-80" | -ftz | us | e
"icpc -mlong-double-80" | -funsafe-math-optimizations | us | e
"icpc -mlong-double-80" | -mavx | us | e
"icpc -mlong-double-80" | -mfpmath=sse -mtune=native | us | e
"icpc -mlong-double-80" | -mp1 | us | e
"icpc -mlong-double-80" | -no-fma | us | e
"icpc -mlong-double-80" | -no-ftz | us | e
"icpc -mlong-double-80" | -no-prec-div | us | e
"icpc -mlong-double-80" | -O0 | bi | e
"icpc -mlong-double-80" | -O0 | us | e
"icpc -mlong-double-80" | -O1 | us | e
"icpc -mlong-double-80" | -O1 | bi | e
"icpc -mlong-double-80" | -O2 | us | e
"icpc -mlong-double-80" | -O3 | us | e
"icpc -mlong-double-80" | -prec-div | us | e
(85 rows)
qfp=# select trunc(score0d, 25) as s, score0, compiler, switches, sort, precision
from tests where sort='gt' and precision='d' and host = 'kingspeak1' and name
= 'DoOrthoPerturbTest' and run = 25 order by s, compiler, switches;
s | score0 | compiler | switches | sort | precision
-----------------------------+----------------------+-----------------------+-----------------------------+------+-----------
0.0000000000036406433423508 | 3fd98018000000000000 | g++ | | gt | d
0.0000000000036406433423508 | 3fd98018000000000000 | g++ | -fassociative-math | gt | d
0.0000000000036406433423508 | 3fd98018000000000000 | g++ | -fcx-fortran-rules | gt | d
0.0000000000036406433423508 | 3fd98018000000000000 | g++ | -fcx-limited-range | gt | d
0.0000000000036406433423508 | 3fd98018000000000000 | g++ | -fexcess-precision=fast | gt | d
0.0000000000036406433423508 | 3fd98018000000000000 | g++ | -ffast-math | gt | d
0.0000000000036406433423508 | 3fd98018000000000000 | g++ | -ffinite-math-only | gt | d
0.0000000000036406433423508 | 3fd98018000000000000 | g++ | -ffloat-store | gt | d
0.0000000000036406433423508 | 3fd98018000000000000 | g++ | -ffp-contract=on | gt | d
0.0000000000036406433423508 | 3fd98018000000000000 | g++ | -fmerge-all-constants | gt | d
0.0000000000036406433423508 | 3fd98018000000000000 | g++ | -fno-trapping-math | gt | d
0.0000000000036406433423508 | 3fd98018000000000000 | g++ | -freciprocal-math | gt | d
0.0000000000036406433423508 | 3fd98018000000000000 | g++ | -frounding-math | gt | d
0.0000000000036406433423508 | 3fd98018000000000000 | g++ | -fsignaling-nans | gt | d
0.0000000000036406433423508 | 3fd98018000000000000 | g++ | -funsafe-math-optimizations | gt | d
0.0000000000036406433423508 | 3fd98018000000000000 | g++ | -mavx | gt | d
0.0000000000036406433423508 | 3fd98018000000000000 | g++ | -mfpmath=sse -mtune=native | gt | d
0.0000000000036406433423508 | 3fd98018000000000000 | g++ | -O0 | gt | d
0.0000000000036406433423508 | 3fd98018000000000000 | g++ | -O1 | gt | d
0.0000000000036406433423508 | 3fd98018000000000000 | g++ | -O2 | gt | d
0.0000000000036406433423508 | 3fd98018000000000000 | g++ | -O3 | gt | d
0.0000000000036406433423508 | 3fd98018000000000000 | icpc -mlong-double-80 | -fp-model=double | gt | d
0.0000000000036406433423508 | 3fd98018000000000000 | icpc -mlong-double-80 | -fp-model=extended | gt | d
0.0000000000036406433423508 | 3fd98018000000000000 | icpc -mlong-double-80 | -fp-model=precise | gt | d
0.0000000000036406433423508 | 3fd98018000000000000 | icpc -mlong-double-80 | -fp-model=source | gt | d
0.0000000000036406433423508 | 3fd98018000000000000 | icpc -mlong-double-80 | -fp-model=strict | gt | d
0.0000000000036406433423508 | 3fd98018000000000000 | icpc -mlong-double-80 | -frounding-math | gt | d
0.0000000000036406433423508 | 3fd98018000000000000 | icpc -mlong-double-80 | -mavx | gt | d
0.0000000000036406433423508 | 3fd98018000000000000 | icpc -mlong-double-80 | -O0 | gt | d
0.0000000000036406433423508 | 3fd98018000000000000 | icpc -mlong-double-80 | -O1 | gt | d
0.0000000212214503747532035 | 3fe5b64a768000000000 | icpc -mlong-double-80 | | gt | d
0.0000000212214503747532035 | 3fe5b64a768000000000 | icpc -mlong-double-80 | -fassociative-math | gt | d
0.0000000212214503747532035 | 3fe5b64a768000000000 | icpc -mlong-double-80 | -fcx-fortran-rules | gt | d
0.0000000212214503747532035 | 3fe5b64a768000000000 | icpc -mlong-double-80 | -fcx-limited-range | gt | d
0.0000000212214503747532035 | 3fe5b64a768000000000 | icpc -mlong-double-80 | -fexcess-precision=fast | gt | d
0.0000000212214503747532035 | 3fe5b64a768000000000 | icpc -mlong-double-80 | -fexcess-precision=standard | gt | d
0.0000000212214503747532035 | 3fe5b64a768000000000 | icpc -mlong-double-80 | -ffast-math | gt | d
0.0000000212214503747532035 | 3fe5b64a768000000000 | icpc -mlong-double-80 | -ffinite-math-only | gt | d
0.0000000212214503747532035 | 3fe5b64a768000000000 | icpc -mlong-double-80 | -ffloat-store | gt | d
0.0000000212214503747532035 | 3fe5b64a768000000000 | icpc -mlong-double-80 | -ffp-contract=on | gt | d
0.0000000212214503747532035 | 3fe5b64a768000000000 | icpc -mlong-double-80 | -fma | gt | d
0.0000000212214503747532035 | 3fe5b64a768000000000 | icpc -mlong-double-80 | -fmerge-all-constants | gt | d
0.0000000212214503747532035 | 3fe5b64a768000000000 | icpc -mlong-double-80 | -fno-trapping-math | gt | d
0.0000000212214503747532035 | 3fe5b64a768000000000 | icpc -mlong-double-80 | -fp-model fast=1 | gt | d
0.0000000212214503747532035 | 3fe5b64a768000000000 | icpc -mlong-double-80 | -fp-model fast=2 | gt | d
0.0000000212214503747532035 | 3fe5b64a768000000000 | icpc -mlong-double-80 | -fp-port | gt | d
0.0000000212214503747532035 | 3fe5b64a768000000000 | icpc -mlong-double-80 | -fp-trap=common | gt | d
0.0000000212214503747532035 | 3fe5b64a768000000000 | icpc -mlong-double-80 | -freciprocal-math | gt | d
0.0000000212214503747532035 | 3fe5b64a768000000000 | icpc -mlong-double-80 | -fsignaling-nans | gt | d
0.0000000212214503747532035 | 3fe5b64a768000000000 | icpc -mlong-double-80 | -fsingle-precision-constant | gt | d
0.0000000212214503747532035 | 3fe5b64a768000000000 | icpc -mlong-double-80 | -ftz | gt | d
0.0000000212214503747532035 | 3fe5b64a768000000000 | icpc -mlong-double-80 | -funsafe-math-optimizations | gt | d
0.0000000212214503747532035 | 3fe5b64a768000000000 | icpc -mlong-double-80 | -mfpmath=sse -mtune=native | gt | d
0.0000000212214503747532035 | 3fe5b64a768000000000 | icpc -mlong-double-80 | -mp1 | gt | d
0.0000000212214503747532035 | 3fe5b64a768000000000 | icpc -mlong-double-80 | -no-fma | gt | d
0.0000000212214503747532035 | 3fe5b64a768000000000 | icpc -mlong-double-80 | -no-ftz | gt | d
0.0000000212214503747532035 | 3fe5b64a768000000000 | icpc -mlong-double-80 | -no-prec-div | gt | d
0.0000000212214503747532035 | 3fe5b64a768000000000 | icpc -mlong-double-80 | -O2 | gt | d
0.0000000212214503747532035 | 3fe5b64a768000000000 | icpc -mlong-double-80 | -O3 | gt | d
0.0000000212214503747532035 | 3fe5b64a768000000000 | icpc -mlong-double-80 | -prec-div | gt | d
(60 rows)
This is an interesting example: two different compilers on the same host, using the same reduction sort and the same precision, give different results, even though both compilers (Intel and gcc) are using their default options with no flags.
We'll begin exploring this by examining the detailed (disassembled) output of the compiled tests, which should help us identify where the two compilers generate different sequences of machine instructions that produce the differences.
We'll explore why these two configurations differ:
0.0000000000036406433423508 | 3fd98018000000000000 | g++ | | gt | d
0.0000000212214503747532035 | 3fe5b64a768000000000 | icpc -mlong-double-80 | | gt | d
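One common way two compilers can produce different results from identical source is the precision in which intermediate values are held, for instance kept in 80-bit x87 registers versus rounded to 64-bit doubles at each step. Whether that is what separates these g++ and icpc results is what the disassembly should reveal; this sketch only illustrates the general mechanism. 'cancellationTerms', 'accumulateIn', and 'longDoubleIsWider' are hypothetical helpers.

```cpp
#include <cmath>
#include <limits>
#include <vector>

// Hypothetical example terms: the exact sum is -1, but 1 + 2^53 is not
// representable in double (doubles near 2^53 are spaced 2 apart).
std::vector<double> cancellationTerms() {
  const double big = std::ldexp(1.0, 53);
  return {1.0, big, -(big + 2.0)};
}

// Sum double-valued terms, holding the running total in type Acc.
// Acc = double rounds every partial sum to 53 significand bits (assuming
// double arithmetic is evaluated in double precision, e.g. SSE2);
// Acc = long double rounds to long double precision, which is 64 bits
// with the x87 80-bit format.
template <typename Acc>
Acc accumulateIn(const std::vector<double>& terms) {
  Acc sum = 0;
  for (double t : terms) {
    sum += static_cast<Acc>(t);
  }
  return sum;
}

// True when long double carries more significand bits than double.
constexpr bool longDoubleIsWider() {
  return std::numeric_limits<long double>::digits >
         std::numeric_limits<double>::digits;
}
```

With these terms, 'accumulateIn<double>' returns -2, while 'accumulateIn<long double>' returns the exact -1 on platforms where long double is wider than double (such as x86-64 Linux with the x87 80-bit format). The source is the same; only the width of the intermediate differs.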