binned_dmdmadd function is buggy #3

maddyscientist · 2022-10-11T03:11:06Z

Having finally gotten around to hacking on this code, to port it to CUDA, while I've got an initial single thread version that passes the tests ok, in moving to a multi-thread reduction, I've found that the binned_dmdmadd function, used to sum two instances of ReproducibleFloatingAccumulator is buggy.

Specifically, when instantiated, it doesn't compile because:

The function attempts to modify the other reference, which is const qualified, e.g.

priY[i*incpriY] = priX[i*incpriX];

These lines are erroneous, since they attempt to pass an argument to a function that expects none

const auto X_index = binned_index(priX);
const auto Y_index = binned_index(priY);

The text was updated successfully, but these errors were encountered:

maddyscientist · 2022-10-11T06:21:48Z

I've got this working. The patch is as follows (essentially "X" and "Y" got mixed up):

diff --git a/src/reproducible_floating_accumulator.hpp b/src/reproducible_floating_accumulator.hpp
index d1c89b9..0e712d6 100644
--- a/src/reproducible_floating_accumulator.hpp
+++ b/src/reproducible_floating_accumulator.hpp
@@ -420,10 +420,10 @@ class ReproducibleFloatingAccumulator {
   ///@param incpriY stride within Y's primary vector (use every incpriY'th element)
   ///@param inccarY stride within Y's carry vector (use every inccarY'th element)
   __host__ __device__ void binned_dmdmadd(const ReproducibleFloatingAccumulator &other, const int incpriX, const int inccarX, const int incpriY, const int inccarY) {
-    auto *const priX = pvec();
-    auto *const carX = cvec();
-    auto *const priY = other.pvec();
-    auto *const carY = other.cvec();
+    auto *const priX = other.pvec();
+    auto *const carX = other.cvec();
+    auto *const priY = pvec();
+    auto *const carY = cvec();
 
     if (priX[0] == 0.0)
       return;
@@ -441,8 +441,8 @@ class ReproducibleFloatingAccumulator {
       return;
     }
 
-    const auto X_index = binned_index(priX);
-    const auto Y_index = binned_index(priY);
+    const auto X_index = other.binned_index();
+    const auto Y_index = binned_index();
     const auto shift = Y_index - X_index;
     if(shift > 0){
       const auto *const bins = binned_bins(Y_index);
@@

Apologies if my line numbers are a bit off from yours, as I've made a few local changes.

SAtacker · 2024-06-02T04:22:09Z

Hey @maddyscientist I have tried using the RFA to just add a single floating point number (after initializing it to zero using the provided method) like say 10.0 but the result is 12.
While debugging this, I found that it goes into the binned_dmrenorm (after update and deposit in that order) and somewhere inbetween the bitwise manipulations to set the bits in primary vector we make it 12.

Is this intended? If so could you please point me to an explanation of what is happening with this simple addition operation on an rfa type (I assume that the intended use case is reduction of multiple values in a vector).

P.S. I don't understand the algorithm in detail.

Thank you!

CC @gevtushenko

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

binned_dmdmadd function is buggy #3

binned_dmdmadd function is buggy #3

maddyscientist commented Oct 11, 2022

maddyscientist commented Oct 11, 2022

SAtacker commented Jun 2, 2024 •

edited

Loading

binned_dmdmadd function is buggy #3

binned_dmdmadd function is buggy #3

Comments

maddyscientist commented Oct 11, 2022

maddyscientist commented Oct 11, 2022

SAtacker commented Jun 2, 2024 • edited Loading

SAtacker commented Jun 2, 2024 •

edited

Loading