
Some Analysis

Geof23 edited this page Nov 5, 2015 · 23 revisions

Digging into the results

To make sense of this data and put it to use, we need to classify the behavior of the compilers and hardware the tests run on. To do this, we group test results into equivalence classes. Here, an equivalence class is simply a group of test results that share the same score.
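As a minimal sketch of what "grouping by score" means in code (the queries below do the same thing in SQL), assume each result is a small record. The struct 'TestResult' and the function 'groupByScore' are hypothetical names used only for illustration; they are not part of the test suite.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical record for one test result. The field names loosely mirror
// the columns queried on this page and are assumptions for illustration.
struct TestResult {
  std::string compiler;
  std::string switches;
  std::string score0;  // score as a hex string, e.g. "3fce8018000000000000"
};

// Group results into equivalence classes keyed by score: every result
// with the same score0 lands in the same bucket.
std::map<std::string, std::vector<TestResult>>
groupByScore(std::vector<TestResult> const &results) {
  std::map<std::string, std::vector<TestResult>> classes;
  for (auto const &r : results) {
    classes[r.score0].push_back(r);
  }
  return classes;
}
```

The number of keys in the returned map is the number of equivalence classes, and the size of each bucket is the number of compiler / flag combinations that produced that score.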

About the organization of the tests

Here are the components of the tests:

  • The actual test suite, which consists of 8 different tests:

    • DoHariGSBasic -- an implementation of the Gram-Schmidt algorithm
    • DoHariGSImproved -- a less naive implementation of Gram-Schmidt
    • DoMatrixMultSanity -- a simple arbitrary matrix multiplied with an identity
    • DoOrthoPerturbTest -- beginning with two known orthogonal vectors, this test incrementally 'fuzzes' one element of vector 'A' at a time, by one ULP, and checks for orthogonality (the score is the dot product of the fuzzed 'A' and pristine 'B'). There are two variants of this test: the initial vectors are either randomly initialized with whole powers of 2 (i.e. 1, 2, 4, ...), or initialized with this lambda: '[&indexer]{return 0.2 / pow((T)10.0, indexer++);}', which is called each time the value for a vector element is needed (i.e. 0.2/1, 0.2/10, 0.2/100, etc). This gives a nice sequence of reals that are not exactly representable in binary (as sums of powers of 2)
    • DoSimpleRotate90 -- this test rotates [1 1 1] PI/2 radians and checks the L1 and LInf distances from expected
    • DoSkewSymCPRotationTest -- takes two random 3d vectors and attempts to align them using 'skew symmetric cross product' rotation -- the scores are the L1 and LInf distances from the product and the expected
    • RotateAndUnrotate -- this test takes a random 3d vector, rotates it PI radians, rotates it again -PI radians, and generates scores from the L1 and LInf distances from the starting vector
    • RotateFullCircle -- this test iteratively rotates a random 3d vector by 2*Pi/n, n rotations, and scores by the L1 and LInf distances from the starting vector
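To make the second DoOrthoPerturbTest initialization concrete, here is a minimal sketch that fills a vector with the lambda quoted above. The helper name 'makeDecayingVector' and the use of std::generate are assumptions for illustration; the test suite may drive the lambda differently.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Fill a vector with 0.2 / 10^k for k = 0, 1, 2, ...
// makeDecayingVector is a hypothetical helper, not part of the test suite.
template <typename T>
std::vector<T> makeDecayingVector(std::size_t dim) {
  std::vector<T> v(dim);
  int indexer = 0;
  // The lambda is called once per element; indexer advances on each call.
  std::generate(v.begin(), v.end(), [&indexer] {
    return 0.2 / std::pow((T)10.0, indexer++);
  });
  return v;
}
```

Since 0.2 has no finite binary expansion, every element produced this way is already rounded when it is stored, unlike the powers-of-2 variant, where each initial value is exact.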

Identifying equivalence classes

An obvious starting point for identifying discrepancies in floating point results in the test set is to count the equivalence classes (or the number of different scores per test). We can start by looking at this list, and from there we'll choose a test to dig into further.

We made the decision that the first thing we'd explore would be the differences between compilers, so we'll begin by looking at the variability on a single host. For this, we'll choose the host 'kingspeak1':

qfp=# select count(distinct score0), name from tests where host = 
      'kingspeak1' and run = (select max(index) from runs) group by 
       name order by name;

count |          name
-------+-------------------------
    5 | DoHariGSBasic
    3 | DoHariGSImproved
    1 | DoMatrixMultSanity
   16 | DoOrthoPerturbTest
    3 | DoSimpleRotate90
   21 | DoSkewSymCPRotationTest
    8 | RotateAndUnrotate
   20 | RotateFullCircle
 (8 rows)

The rotation tests show the most variability: 'DoSkewSymCPRotationTest' has 21 distinct scores and 'RotateFullCircle' has 20. (Before I corrected an error in my reduction routine, the most variable test was 'DoOrthoPerturbTest'. The routine was truncating floating point values to integers: ''std::inner_product'' deduces its accumulator type from the initial-value argument, not from the element type of the vectors.) However, we're going to stick with 'DoOrthoPerturbTest' because it doesn't have the complication of relying on transcendental functions, so it should be fairly simple to see where the difference is occurring in the compiler-generated code.
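The std::inner_product pitfall is easy to reproduce. The following is a minimal sketch of this kind of bug, not the original reduction routine; the function names 'dotTruncating' and 'dotCorrect' are made up for illustration.

```cpp
#include <numeric>
#include <vector>

// The accumulator type of std::inner_product is the type of the init
// argument. With an int init, each partial sum is converted back to int,
// so the fractional part is discarded at every step.
double dotTruncating(std::vector<double> const &a,
                     std::vector<double> const &b) {
  return std::inner_product(a.begin(), a.end(), b.begin(), 0);
}

// With a double init, the accumulation is carried out in double.
double dotCorrect(std::vector<double> const &a,
                  std::vector<double> const &b) {
  return std::inner_product(a.begin(), a.end(), b.begin(), 0.0);
}
```

For a = {0.5, 0.5} and b = {1.0, 1.0}, the truncating version returns 0 while the correct version returns 1. This is why the dot product shown further down passes '(T)0.0' as the initial value.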

At this point, we need to introduce the concept of 'characteristic operations'. These are the critical parts of the calculations performed in each test, where differences in the code generated by the compiler / assembler and executed by the processor / math co-processor / vector unit are going to produce different results.

Really, the two key characteristic operations for this test suite are the scoring functions:

  • L1Distance
T
L1Distance(Vector<T> const &rhs) const {
  T distance = 0;
  for(int x = 0; x < size(); ++x){
    distance += fabs(data[x] - rhs.data[x]);
  }
  return distance;
}
  • Dot Product (^ operator) (and the helper function, 'reduce')
T
operator^(Vector<T> const &rhs) const {
  T sum = 0;
  if( sortType == bi){
    sum = std::inner_product(data.begin(), data.end(), rhs.data.begin(), (T)0.0);
  }else{
    vector<T> prods(data.size());
    for(int i = 0; i < size(); ++i){ 
      prods[i] = data[i] * rhs.data[i]; 
    }
    reduce(prods, [&sum](T e){sum += e;});
  }
  return sum;
}

template<typename F, class C>
void
reduce(C &cont, F const &fun) const {
  T retVal;
  if(sortType == lt || sortType == gt){
    if(sortType == lt)
      std::sort(cont.begin(), cont.end(),
                [](T a, T b){return fabs(a) < fabs(b);});
    else
      std::sort(cont.begin(), cont.end(),
                [](T a, T b){return fabs(a) > fabs(b);});
  }
  for_each(cont.begin(), cont.end(), fun);
}
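The excerpts above are member functions and don't compile on their own, so here is a self-contained sketch of the same idea: form the element-wise products, optionally sort them by magnitude, then accumulate in that order. The enum 'SortType' and the function 'sortedDot' are hypothetical names, and reading 'us' as "unsorted" is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// us: accumulate in element order (assumed meaning of 'us')
// lt: accumulate in order of increasing magnitude
// gt: accumulate in order of decreasing magnitude
enum class SortType { us, lt, gt };

template <typename T>
T sortedDot(std::vector<T> const &a, std::vector<T> const &b,
            SortType order) {
  assert(a.size() == b.size());
  std::vector<T> prods(a.size());
  for (std::size_t i = 0; i < a.size(); ++i) {
    prods[i] = a[i] * b[i];
  }
  if (order == SortType::lt) {
    std::sort(prods.begin(), prods.end(),
              [](T x, T y) { return std::fabs(x) < std::fabs(y); });
  } else if (order == SortType::gt) {
    std::sort(prods.begin(), prods.end(),
              [](T x, T y) { return std::fabs(x) > std::fabs(y); });
  }
  T sum = 0;
  for (T p : prods) {
    sum += p;  // the order of these additions decides the rounding
  }
  return sum;
}
```

As a usage example in double precision, take a = {2^50, 1, -2^50} and b = {2^50, 1, 2^50}, whose exact dot product is 1. With 'gt' the two large products cancel first and the result is 1; with 'lt' or 'us' the 1 is absorbed into 2^100 before the cancellation and the result is 0.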

Now we'll explore the test set . . .

DoOrthoPerturbTest

[the heart of the test]

    T backup;
    for(int r = 0; r < dim; ++r){
    T &p = a[r];
      backup = p;
      for(int i = 0; i < iters; ++i){
	p = FPHelpers::perturbFP(backup, i * ulp_inc);
	bool isOrth = a.isOrtho(b);
	if(isOrth){
	  orthoCount[r]++;
	  if(i != 0) score += fabs(p - backup);
	  //score should be perturbed amount
	}else{
	  if(i == 0) score += fabs(a ^ b);
	  //if falsely not detecting ortho, should be the dot prod
	}
	info_stream << "a[" << r << "] = " << a[r] << " perp: "
	<< isOrth << " dot prod: " << FPWrap<T>(a ^ b) << endl;
      }
      info_stream << "next dimension . . . " << endl;
      p = backup;
    }

This snippet is the main high-level calculation for this test. The characteristic operation here is the dot product, which is encapsulated by 'Vector::isOrtho' and also appears here directly through its operator, '^'.
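The source of 'FPHelpers::perturbFP' isn't shown on this page. As a rough sketch of what "perturbing by a number of ULPs" can look like, the hypothetical function below steps to neighboring representable values with std::nextafter. Treat it as an assumption about the behavior, not as the suite's implementation.

```cpp
#include <cmath>
#include <limits>

// Hypothetical stand-in for FPHelpers::perturbFP: move x by `ulps`
// representable values, upward for positive counts and downward for
// negative counts.
template <typename T>
T perturbByUlps(T x, int ulps) {
  T const target = ulps >= 0 ? std::numeric_limits<T>::infinity()
                             : -std::numeric_limits<T>::infinity();
  int const steps = ulps >= 0 ? ulps : -ulps;
  for (int i = 0; i < steps; ++i) {
    x = std::nextafter(x, target);  // next representable value toward target
  }
  return x;
}
```

With this definition, a perturbation of 0 returns the input unchanged, which matches the 'i == 0' case in the loop above, where the unperturbed vectors are expected to be orthogonal.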

A note on the score columns: 'score0' is the hexadecimal representation of the first test score as a floating point value, which is either the L1 distance from the correct answer or the dot product (where the test is checking orthogonality). There is also 'score0d', which is score0 converted to decimal (as performed by std::cout). Either way, a perfect score is 0.
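The score0 strings in the tables below are 20 hex digits (80 bits), which is consistent with the x87 extended format: 1 sign bit, 15 exponent bits with a bias of 16383, and a 64-bit significand with an explicit integer bit. Assuming that layout, the hypothetical helper 'decodeScore' below converts a score0 string back to a number. This is a sketch for checking values by hand, not code from the test suite.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <string>

// Decode a 20-hex-digit string laid out as [sign+exponent:16][significand:64].
// Assumes x87 extended format. Infinity/NaN (exponent 0x7fff) and denormals
// (exponent 0) are not handled in this sketch.
long double decodeScore(std::string const &hex) {
  assert(hex.size() == 20);
  unsigned long const signExp = std::stoul(hex.substr(0, 4), nullptr, 16);
  std::uint64_t const significand = std::stoull(hex.substr(4), nullptr, 16);
  bool const negative = (signExp & 0x8000u) != 0;
  int const exponent = static_cast<int>(signExp & 0x7fffu);
  // value = significand * 2^(exponent - bias - 63); the significand is an
  // integer whose top bit represents 2^0.
  long double const magnitude =
      std::ldexp(static_cast<long double>(significand),
                 exponent - 16383 - 63);
  return negative ? -magnitude : magnitude;
}
```

For example, '3fce8018000000000000' has exponent 0x3fce - 16383 = -49 and significand 0x8018 / 0x8000 = 1.000732421875, which gives about 1.7777 x 10^-15 and agrees with the score0d column for that row.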

Also, a note on the analysis of the compiler output: the variability in this test should primarily be caused by the dot product that is used after each vector 'fuzzing' operation. We'll focus on that when analyzing the disassembly of this test.

qfp=#  select count(score0) as c, score0, trunc(score0d, 25) as d
       from tests where name = 'DoOrthoPerturbTest' and host =
       'kingspeak1' and run = 25 group
       by score0, d order by d;
       
c  |        score0        |              d
----+----------------------+------------------------------
60 | 3fce8018000000000000 |  0.0000000000000017776578820
30 | 3fd98018000000000000 |  0.0000000000036406433423508
89 | 3fdb845a4c0000000000 |  0.0000000000150467624471239
31 | 3fdb846b604000000000 |  0.0000000000150543470918418
60 | 3fdb8476db4000000000 |  0.0000000000150594454441377
30 | 3fe3dfa6ba0000000000 |  0.0000000065091088075064362
30 | 3fe5a464468000000000 |  0.0000000191377127478631336
30 | 3fe5b64a768000000000 |  0.0000000212214503747532035
30 | 3fe5cede868000000000 |  0.0000000240827491282402661
60 | 3fe6845a4c0000000000 |  0.0000000308157694917099433
30 | 3fe68476db4000000000 |  0.0000000308417442695940735
59 | 3ff68018000000000000 |  0.0019545555114746093750000
 1 | 4001a9949d0000000000 |  5.2993912696838378906250000
31 | 40028045e68000000000 |  8.0170655250549316406250000
89 | 4003845a4c0000000000 | 16.5440902709960937500000000
60 | 40038476db4000000000 | 16.5580353736877441406250000
(16 rows)

So we have 16 different scores. The goal is to characterize the different results obtained with the different compilers and flags. It is interesting that accuracy drops off sharply at the row with c=1 (an absolute error of ~5.3, vs. the best score of ~1.778 x 10^-15); the four worst classes all have scores above 5.

Let's go through and attempt to characterize these results by first examining which parameters are set for each one (precision, sort, compiler and compiler flag). We'll go through the list, from most accurate to least accurate.

3fce8018000000000000
qfp=# select compiler, switches, sort, precision from tests where
      score0='3fce8018000000000000' and host = 'kingspeak1' and name
      = 'DoOrthoPerturbTest' and run = (select max(index) from runs)
      order by compiler, switches;

        compiler         |          switches           | sort | precision 
-------------------------+-----------------------------+------+-----------
 "g++"                   |                             | gt   | e
 "g++"                   |                             | gt   | e
 "g++"                   | -fassociative-math          | gt   | e
 "g++"                   | -fcx-fortran-rules          | gt   | e
 "g++"                   | -fcx-limited-range          | gt   | e
 "g++"                   | -fexcess-precision=fast     | gt   | e
 "g++"                   | -ffast-math                 | gt   | e
 "g++"                   | -ffinite-math-only          | gt   | e
 "g++"                   | -ffloat-store               | gt   | e
 "g++"                   | -ffp-contract=on            | gt   | e
 "g++"                   | -fmerge-all-constants       | gt   | e
 "g++"                   | -fno-trapping-math          | gt   | e
 "g++"                   | -freciprocal-math           | gt   | e
 "g++"                   | -frounding-math             | gt   | e
 "g++"                   | -fsignaling-nans            | gt   | e
 "g++"                   | -funsafe-math-optimizations | gt   | e
 "g++"                   | -mavx                       | gt   | e
 "g++"                   | -mfpmath=sse -mtune=native  | gt   | e
 "g++"                   | -O0                         | gt   | e
 "g++"                   | -O1                         | gt   | e
 "g++"                   | -O2                         | gt   | e
 "g++"                   | -O3                         | gt   | e
 "icpc -mlong-double-80" |                             | gt   | e
 "icpc -mlong-double-80" |                             | gt   | e
 "icpc -mlong-double-80" | -fassociative-math          | gt   | e
 "icpc -mlong-double-80" | -fcx-fortran-rules          | gt   | e
 "icpc -mlong-double-80" | -fcx-limited-range          | gt   | e
 "icpc -mlong-double-80" | -fexcess-precision=fast     | gt   | e
 "icpc -mlong-double-80" | -fexcess-precision=standard | gt   | e
 "icpc -mlong-double-80" | -ffast-math                 | gt   | e
 "icpc -mlong-double-80" | -ffinite-math-only          | gt   | e
 "icpc -mlong-double-80" | -ffloat-store               | gt   | e
 "icpc -mlong-double-80" | -ffp-contract=on            | gt   | e
 "icpc -mlong-double-80" | -fma                        | gt   | e
 "icpc -mlong-double-80" | -fmerge-all-constants       | gt   | e
 "icpc -mlong-double-80" | -fno-trapping-math          | gt   | e
 "icpc -mlong-double-80" | -fp-model=extended          | gt   | e
 "icpc -mlong-double-80" | -fp-model=precise           | gt   | e
 "icpc -mlong-double-80" | -fp-model=strict            | gt   | e
 "icpc -mlong-double-80" | -fp-port                    | gt   | e
 "icpc -mlong-double-80" | -fp-trap=common             | gt   | e
 "icpc -mlong-double-80" | -freciprocal-math           | gt   | e
 "icpc -mlong-double-80" | -frounding-math             | gt   | e
 "icpc -mlong-double-80" | -fsignaling-nans            | gt   | e
 "icpc -mlong-double-80" | -ftz                        | gt   | e
 "icpc -mlong-double-80" | -funsafe-math-optimizations | gt   | e
 "icpc -mlong-double-80" | -mavx                       | gt   | e
 "icpc -mlong-double-80" | -mfpmath=sse -mtune=native  | gt   | e
 "icpc -mlong-double-80" | -mp1                        | gt   | e
 "icpc -mlong-double-80" | -no-fma                     | gt   | e
 "icpc -mlong-double-80" | -no-ftz                     | gt   | e
 "icpc -mlong-double-80" | -no-prec-div                | gt   | e
 "icpc -mlong-double-80" | -O0                         | gt   | e
 "icpc -mlong-double-80" | -O1                         | gt   | e
 "icpc -mlong-double-80" | -O2                         | gt   | e
 "icpc -mlong-double-80" | -O3                         | gt   | e
 "icpc -mlong-double-80" | -prec-div                   | gt   | e
(57 rows)

So it's no surprise that the most accurate equivalence class is all extended precision (long double, the most accurate), and that it uses a sorted reduction, 'gt'. Going by the comparator in 'reduce', 'gt' orders the products by decreasing magnitude, so the largest terms are accumulated first. The accumulation order matters because of 'loss of significance' (the loss of information when adding or subtracting values with different magnitudes), and this affects the test of orthogonality.
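To put a number on "most accurate", the sketch below reports how many significand bits each type carries. The helper name 'significandBits' is made up for illustration. The expectation that long double has a 64-bit significand is an assumption that holds for the x87 extended format used by g++ on x86, and is not guaranteed on other platforms.

```cpp
#include <limits>

// Number of base-2 significand digits for a floating point type
// (24 for IEEE single, 53 for IEEE double, 64 for x87 extended).
template <typename T>
constexpr int significandBits() {
  return std::numeric_limits<T>::digits;
}
```

Calling 'significandBits<long double>()' on the test host is a quick way to confirm that the 'e' precision runs really have more bits to work with than the 'd' runs.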

It turns out that the flags had no effect on this test with 'gt' sort and 'e' precision: counting every result with that sort and precision returns the same 57 rows that make up this equivalence class:

qfp=# select count(*)  from tests where sort='gt' and
      precision = 'e' and host = 'kingspeak1' and name
      = 'DoOrthoPerturbTest' and run = (select max(index)
      from runs);
      
 count 
-------
    57

This shows that the compiler and switches didn't affect the results for this equivalence class.

  • Let's take a quick look at the characteristic operation for this test, which is the reduce function, called by the dot product (^ operator):
//fun is a lambda passed into
//reduce, in this case is:
//long double sum; [&sum](long double e){sum += e;}
std::sort(cont.begin(), cont.end(),
	  [](T a, T b){return fabs(a) > fabs(b);});
for_each(cont.begin(), cont.end(), fun);

** Intel

  402e20:	55                   	push   %rbp
  402e21:	48 89 e5             	mov    %rsp,%rbp
  402e24:	48 83 ec 40          	sub    $0x40,%rsp
  402e28:	48 89 7d d0          	mov    %rdi,-0x30(%rbp)
  402e2c:	db 6d 10             	fldt   0x10(%rbp)
  402e2f:	db 3c 24             	fstpt  (%rsp)
  402e32:	e8 71 dc 02 00       	callq  430aa8 <std::fabs(long double)>
  402e37:	db 7d e0             	fstpt  -0x20(%rbp)
  402e3a:	48 83 c4 10          	add    $0x10,%rsp
  402e3e:	48 83 c4 f0          	add    $0xfffffffffffffff0,%rsp
  402e42:	db 6d 20             	fldt   0x20(%rbp)
  402e45:	db 3c 24             	fstpt  (%rsp)
  402e48:	e8 5b dc 02 00       	callq  430aa8 <std::fabs(long double)>
  402e4d:	db 7d f0             	fstpt  -0x10(%rbp)
  402e50:	48 83 c4 10          	add    $0x10,%rsp
  402e54:	db 6d e0             	fldt   -0x20(%rbp)
  402e57:	db 6d f0             	fldt   -0x10(%rbp)
  402e5a:	b8 01 00 00 00       	mov    $0x1,%eax
  402e5f:	ba 00 00 00 00       	mov    $0x0,%edx
  402e64:	d9 c9                	fxch   %st(1)
  402e66:	df f1                	fcomip %st(1),%st
  402e68:	dd d8                	fstp   %st(0)
  402e6a:	0f 47 d0             	cmova  %eax,%edx
  402e6d:	89 d0                	mov    %edx,%eax
  402e6f:	c9                   	leaveq 
  402e70:	c3                   	retq   

Intel is using x87 FPU instructions here (fldt, fstpt and fcomip, operating on the %st register stack).

** GCC (O0)

  41e294:	55                   	push   %rbp
  41e295:	48 89 e5             	mov    %rsp,%rbp
  41e298:	48 83 ec 30          	sub    $0x30,%rsp
  41e29c:	48 89 7d f8          	mov    %rdi,-0x8(%rbp)
  41e2a0:	48 8b 45 10          	mov    0x10(%rbp),%rax
  41e2a4:	8b 55 18             	mov    0x18(%rbp),%edx
  41e2a7:	48 89 04 24          	mov    %rax,(%rsp)
  41e2ab:	89 54 24 08          	mov    %edx,0x8(%rsp)
  41e2af:	e8 e4 70 fe ff       	callq  405398 <std::fabs(long double)>
  41e2b4:	db 7d e0             	fstpt  -0x20(%rbp)
  41e2b7:	48 8b 45 20          	mov    0x20(%rbp),%rax
  41e2bb:	8b 55 28             	mov    0x28(%rbp),%edx
  41e2be:	48 89 04 24          	mov    %rax,(%rsp)
  41e2c2:	89 54 24 08          	mov    %edx,0x8(%rsp)
  41e2c6:	e8 cd 70 fe ff       	callq  405398 <std::fabs(long double)>
  41e2cb:	db 6d e0             	fldt   -0x20(%rbp)
  41e2ce:	d9 c9                	fxch   %st(1)
  41e2d0:	df e9                	fucomip %st(1),%st
  41e2d2:	dd d8                	fstp   %st(0)
  41e2d4:	0f 97 c0             	seta   %al
  41e2d7:	c9                   	leaveq 
  41e2d8:	c3                   	retq   

The takeaway here is that GCC is also using x87 FPU instructions (fldt, fstpt and fucomip), which operate on the %st register stack.

3fd98018000000000000

qfp=# select compiler, switches, sort, precision from tests where score0='3fd98018000000000000'
      and host = 'kingspeak1' and name = 'DoOrthoPerturbTest' and run = 19 order by compiler,
      switches;
      
        compiler         |          switches           | sort | precision 
-------------------------+-----------------------------+------+-----------
 "g++"                   |                             | gt   | d
 "g++"                   |                             | gt   | d
 "g++"                   | -fassociative-math          | gt   | d
 "g++"                   | -fcx-fortran-rules          | gt   | d
 "g++"                   | -fcx-limited-range          | gt   | d
 "g++"                   | -fexcess-precision=fast     | gt   | d
 "g++"                   | -ffast-math                 | gt   | d
 "g++"                   | -ffinite-math-only          | gt   | d
 "g++"                   | -ffloat-store               | gt   | d
 "g++"                   | -ffp-contract=on            | gt   | d
 "g++"                   | -fmerge-all-constants       | gt   | d
 "g++"                   | -fno-trapping-math          | gt   | d
 "g++"                   | -freciprocal-math           | gt   | d
 "g++"                   | -frounding-math             | gt   | d
 "g++"                   | -fsignaling-nans            | gt   | d
 "g++"                   | -funsafe-math-optimizations | gt   | d
 "g++"                   | -mavx                       | gt   | d
 "g++"                   | -mfpmath=sse -mtune=native  | gt   | d
 "g++"                   | -O0                         | gt   | d
 "g++"                   | -O1                         | gt   | d
 "g++"                   | -O2                         | gt   | d
 "g++"                   | -O3                         | gt   | d
 "icpc -mlong-double-80" | -fp-model=extended          | gt   | d
 "icpc -mlong-double-80" | -fp-model=precise           | gt   | d
 "icpc -mlong-double-80" | -fp-model=strict            | gt   | d
 "icpc -mlong-double-80" | -frounding-math             | gt   | d
 "icpc -mlong-double-80" | -mavx                       | gt   | d
 "icpc -mlong-double-80" | -O0                         | gt   | d
 "icpc -mlong-double-80" | -O1                         | gt   | d
(29 rows)

3fdb845a4c0000000000

qfp=# select compiler, switches, sort, precision from tests where score0='3fdb845a4c0000000000' and host = 'kingspeak1' and name = 'DoOrthoPerturbTest' and run = 19 order by compiler, switches;

       compiler         |          switches           | sort | precision 
-------------------------+-----------------------------+------+-----------
 "g++"                   |                             | bi   | e
 "g++"                   |                             | us   | e
 "g++"                   |                             | bi   | e
 "g++"                   |                             | us   | e
 "g++"                   | -fassociative-math          | us   | e
 "g++"                   | -fassociative-math          | bi   | e
 "g++"                   | -fcx-fortran-rules          | bi   | e
 "g++"                   | -fcx-fortran-rules          | us   | e
 "g++"                   | -fcx-limited-range          | us   | e
 "g++"                   | -fcx-limited-range          | bi   | e
 "g++"                   | -fexcess-precision=fast     | us   | e
 "g++"                   | -fexcess-precision=fast     | bi   | e
 "g++"                   | -ffast-math                 | bi   | e
 "g++"                   | -ffast-math                 | us   | e
 "g++"                   | -ffinite-math-only          | bi   | e
 "g++"                   | -ffinite-math-only          | us   | e
 "g++"                   | -ffloat-store               | bi   | e
 "g++"                   | -ffloat-store               | us   | e
 "g++"                   | -ffp-contract=on            | us   | e
 "g++"                   | -ffp-contract=on            | bi   | e
 "g++"                   | -fmerge-all-constants       | bi   | e
 "g++"                   | -fmerge-all-constants       | us   | e
 "g++"                   | -fno-trapping-math          | bi   | e
 "g++"                   | -fno-trapping-math          | us   | e
 "g++"                   | -freciprocal-math           | us   | e
 "g++"                   | -freciprocal-math           | bi   | e
 "g++"                   | -frounding-math             | us   | e
 "g++"                   | -frounding-math             | bi   | e
 "g++"                   | -fsignaling-nans            | bi   | e
 "g++"                   | -fsignaling-nans            | us   | e
 "g++"                   | -funsafe-math-optimizations | bi   | e
 "g++"                   | -funsafe-math-optimizations | us   | e
 "g++"                   | -mavx                       | us   | e
 "g++"                   | -mavx                       | bi   | e
 "g++"                   | -mfpmath=sse -mtune=native  | us   | e
 "g++"                   | -mfpmath=sse -mtune=native  | bi   | e
 "g++"                   | -O0                         | us   | e
 "g++"                   | -O0                         | bi   | e
 "g++"                   | -O1                         | bi   | e
 "g++"                   | -O1                         | us   | e
 "g++"                   | -O2                         | bi   | e
 "g++"                   | -O2                         | us   | e
 "g++"                   | -O3                         | us   | e
 "g++"                   | -O3                         | bi   | e
 "icpc -mlong-double-80" |                             | us   | e
 "icpc -mlong-double-80" |                             | us   | e
 "icpc -mlong-double-80" | -fassociative-math          | us   | e
 "icpc -mlong-double-80" | -fcx-fortran-rules          | us   | e
 "icpc -mlong-double-80" | -fcx-limited-range          | us   | e
 "icpc -mlong-double-80" | -fexcess-precision=fast     | us   | e
 "icpc -mlong-double-80" | -fexcess-precision=standard | us   | e
 "icpc -mlong-double-80" | -ffast-math                 | us   | e
 "icpc -mlong-double-80" | -ffinite-math-only          | us   | e
 "icpc -mlong-double-80" | -ffloat-store               | us   | e
 "icpc -mlong-double-80" | -ffp-contract=on            | us   | e
 "icpc -mlong-double-80" | -fma                        | us   | e
 "icpc -mlong-double-80" | -fmerge-all-constants       | us   | e
 "icpc -mlong-double-80" | -fno-trapping-math          | us   | e
 "icpc -mlong-double-80" | -fp-model=extended          | bi   | e
 "icpc -mlong-double-80" | -fp-model=extended          | us   | e
 "icpc -mlong-double-80" | -fp-model=precise           | bi   | e
 "icpc -mlong-double-80" | -fp-model=precise           | us   | e
 "icpc -mlong-double-80" | -fp-model=strict            | us   | e
 "icpc -mlong-double-80" | -fp-model=strict            | bi   | e
 "icpc -mlong-double-80" | -fp-port                    | us   | e
 "icpc -mlong-double-80" | -fp-trap=common             | us   | e
 "icpc -mlong-double-80" | -freciprocal-math           | us   | e
 "icpc -mlong-double-80" | -frounding-math             | us   | e
 "icpc -mlong-double-80" | -frounding-math             | bi   | e
 "icpc -mlong-double-80" | -fsignaling-nans            | us   | e
 "icpc -mlong-double-80" | -ftz                        | us   | e
 "icpc -mlong-double-80" | -funsafe-math-optimizations | us   | e
 "icpc -mlong-double-80" | -mavx                       | us   | e
 "icpc -mlong-double-80" | -mfpmath=sse -mtune=native  | us   | e
 "icpc -mlong-double-80" | -mp1                        | us   | e
 "icpc -mlong-double-80" | -no-fma                     | us   | e
 "icpc -mlong-double-80" | -no-ftz                     | us   | e
 "icpc -mlong-double-80" | -no-prec-div                | us   | e
 "icpc -mlong-double-80" | -O0                         | bi   | e
 "icpc -mlong-double-80" | -O0                         | us   | e
 "icpc -mlong-double-80" | -O1                         | us   | e
 "icpc -mlong-double-80" | -O1                         | bi   | e
 "icpc -mlong-double-80" | -O2                         | us   | e
 "icpc -mlong-double-80" | -O3                         | us   | e
 "icpc -mlong-double-80" | -prec-div                   | us   | e
(85 rows)

Looking at the space of gt + d

qfp=# select trunc(score0d, 25) as s, score0, compiler, switches, sort, precision
      from tests where sort='gt' and precision='d' and host = 'kingspeak1' and name
      = 'DoOrthoPerturbTest' and run = 25 order by s, compiler, switches;

              s              |        score0        |       compiler        |          switches           | sort | precision
-----------------------------+----------------------+-----------------------+-----------------------------+------+-----------
0.0000000000036406433423508 | 3fd98018000000000000 | g++                   |                             | gt   | d
0.0000000000036406433423508 | 3fd98018000000000000 | g++                   | -fassociative-math          | gt   | d
0.0000000000036406433423508 | 3fd98018000000000000 | g++                   | -fcx-fortran-rules          | gt   | d
0.0000000000036406433423508 | 3fd98018000000000000 | g++                   | -fcx-limited-range          | gt   | d
0.0000000000036406433423508 | 3fd98018000000000000 | g++                   | -fexcess-precision=fast     | gt   | d
0.0000000000036406433423508 | 3fd98018000000000000 | g++                   | -ffast-math                 | gt   | d
0.0000000000036406433423508 | 3fd98018000000000000 | g++                   | -ffinite-math-only          | gt   | d
0.0000000000036406433423508 | 3fd98018000000000000 | g++                   | -ffloat-store               | gt   | d
0.0000000000036406433423508 | 3fd98018000000000000 | g++                   | -ffp-contract=on            | gt   | d
0.0000000000036406433423508 | 3fd98018000000000000 | g++                   | -fmerge-all-constants       | gt   | d
0.0000000000036406433423508 | 3fd98018000000000000 | g++                   | -fno-trapping-math          | gt   | d
0.0000000000036406433423508 | 3fd98018000000000000 | g++                   | -freciprocal-math           | gt   | d
0.0000000000036406433423508 | 3fd98018000000000000 | g++                   | -frounding-math             | gt   | d
0.0000000000036406433423508 | 3fd98018000000000000 | g++                   | -fsignaling-nans            | gt   | d
0.0000000000036406433423508 | 3fd98018000000000000 | g++                   | -funsafe-math-optimizations | gt   | d
0.0000000000036406433423508 | 3fd98018000000000000 | g++                   | -mavx                       | gt   | d
0.0000000000036406433423508 | 3fd98018000000000000 | g++                   | -mfpmath=sse -mtune=native  | gt   | d
0.0000000000036406433423508 | 3fd98018000000000000 | g++                   | -O0                         | gt   | d
0.0000000000036406433423508 | 3fd98018000000000000 | g++                   | -O1                         | gt   | d
0.0000000000036406433423508 | 3fd98018000000000000 | g++                   | -O2                         | gt   | d
0.0000000000036406433423508 | 3fd98018000000000000 | g++                   | -O3                         | gt   | d
0.0000000000036406433423508 | 3fd98018000000000000 | icpc -mlong-double-80 | -fp-model=double            | gt   | d
0.0000000000036406433423508 | 3fd98018000000000000 | icpc -mlong-double-80 | -fp-model=extended          | gt   | d
0.0000000000036406433423508 | 3fd98018000000000000 | icpc -mlong-double-80 | -fp-model=precise           | gt   | d
0.0000000000036406433423508 | 3fd98018000000000000 | icpc -mlong-double-80 | -fp-model=source            | gt   | d
0.0000000000036406433423508 | 3fd98018000000000000 | icpc -mlong-double-80 | -fp-model=strict            | gt   | d
0.0000000000036406433423508 | 3fd98018000000000000 | icpc -mlong-double-80 | -frounding-math             | gt   | d
0.0000000000036406433423508 | 3fd98018000000000000 | icpc -mlong-double-80 | -mavx                       | gt   | d
0.0000000000036406433423508 | 3fd98018000000000000 | icpc -mlong-double-80 | -O0                         | gt   | d
0.0000000000036406433423508 | 3fd98018000000000000 | icpc -mlong-double-80 | -O1                         | gt   | d
0.0000000212214503747532035 | 3fe5b64a768000000000 | icpc -mlong-double-80 |                             | gt   | d
0.0000000212214503747532035 | 3fe5b64a768000000000 | icpc -mlong-double-80 | -fassociative-math          | gt   | d
0.0000000212214503747532035 | 3fe5b64a768000000000 | icpc -mlong-double-80 | -fcx-fortran-rules          | gt   | d
0.0000000212214503747532035 | 3fe5b64a768000000000 | icpc -mlong-double-80 | -fcx-limited-range          | gt   | d
0.0000000212214503747532035 | 3fe5b64a768000000000 | icpc -mlong-double-80 | -fexcess-precision=fast     | gt   | d
0.0000000212214503747532035 | 3fe5b64a768000000000 | icpc -mlong-double-80 | -fexcess-precision=standard | gt   | d
0.0000000212214503747532035 | 3fe5b64a768000000000 | icpc -mlong-double-80 | -ffast-math                 | gt   | d
0.0000000212214503747532035 | 3fe5b64a768000000000 | icpc -mlong-double-80 | -ffinite-math-only          | gt   | d
0.0000000212214503747532035 | 3fe5b64a768000000000 | icpc -mlong-double-80 | -ffloat-store               | gt   | d
0.0000000212214503747532035 | 3fe5b64a768000000000 | icpc -mlong-double-80 | -ffp-contract=on            | gt   | d
0.0000000212214503747532035 | 3fe5b64a768000000000 | icpc -mlong-double-80 | -fma                        | gt   | d
0.0000000212214503747532035 | 3fe5b64a768000000000 | icpc -mlong-double-80 | -fmerge-all-constants       | gt   | d
0.0000000212214503747532035 | 3fe5b64a768000000000 | icpc -mlong-double-80 | -fno-trapping-math          | gt   | d
0.0000000212214503747532035 | 3fe5b64a768000000000 | icpc -mlong-double-80 | -fp-model fast=1            | gt   | d
0.0000000212214503747532035 | 3fe5b64a768000000000 | icpc -mlong-double-80 | -fp-model fast=2            | gt   | d
0.0000000212214503747532035 | 3fe5b64a768000000000 | icpc -mlong-double-80 | -fp-port                    | gt   | d
0.0000000212214503747532035 | 3fe5b64a768000000000 | icpc -mlong-double-80 | -fp-trap=common             | gt   | d
0.0000000212214503747532035 | 3fe5b64a768000000000 | icpc -mlong-double-80 | -freciprocal-math           | gt   | d
0.0000000212214503747532035 | 3fe5b64a768000000000 | icpc -mlong-double-80 | -fsignaling-nans            | gt   | d
0.0000000212214503747532035 | 3fe5b64a768000000000 | icpc -mlong-double-80 | -fsingle-precision-constant | gt   | d
0.0000000212214503747532035 | 3fe5b64a768000000000 | icpc -mlong-double-80 | -ftz                        | gt   | d
0.0000000212214503747532035 | 3fe5b64a768000000000 | icpc -mlong-double-80 | -funsafe-math-optimizations | gt   | d
0.0000000212214503747532035 | 3fe5b64a768000000000 | icpc -mlong-double-80 | -mfpmath=sse -mtune=native  | gt   | d
0.0000000212214503747532035 | 3fe5b64a768000000000 | icpc -mlong-double-80 | -mp1                        | gt   | d
0.0000000212214503747532035 | 3fe5b64a768000000000 | icpc -mlong-double-80 | -no-fma                     | gt   | d
0.0000000212214503747532035 | 3fe5b64a768000000000 | icpc -mlong-double-80 | -no-ftz                     | gt   | d
0.0000000212214503747532035 | 3fe5b64a768000000000 | icpc -mlong-double-80 | -no-prec-div                | gt   | d
0.0000000212214503747532035 | 3fe5b64a768000000000 | icpc -mlong-double-80 | -O2                         | gt   | d
0.0000000212214503747532035 | 3fe5b64a768000000000 | icpc -mlong-double-80 | -O3                         | gt   | d
0.0000000212214503747532035 | 3fe5b64a768000000000 | icpc -mlong-double-80 | -prec-div                   | gt   | d
(60 rows)


This is an interesting example: two different compilers on the same host, using the same reduction sort and the same precision, give different results (with both compilers, Intel and gcc, using their default options and no flags).

We will begin to explore this by going into the detailed output of the compiled tests, which will help us identify where a different sequence of assembled machine instructions produces the differences.

We'll explore why these two configurations produce different scores:

0.0000000000036406433423508 | 3fd98018000000000000 | g++                   |                             | gt   | d
0.0000000212214503747532035 | 3fe5b64a768000000000 | icpc -mlong-double-80 |                             | gt   | d