Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

binned_dmdmadd function is buggy #3

Open
maddyscientist opened this issue Oct 11, 2022 · 2 comments
Open

binned_dmdmadd function is buggy #3

maddyscientist opened this issue Oct 11, 2022 · 2 comments

Comments

@maddyscientist
Copy link

Having finally gotten around to hacking on this code, to port it to CUDA, while I've got an initial single thread version that passes the tests ok, in moving to a multi-thread reduction, I've found that the binned_dmdmadd function, used to sum two instances of ReproducibleFloatingAccumulator is buggy.

Specifically, when instantiated, it doesn't compile because:

  • The function attempts to modify the other reference, which is const qualified, e.g.
priY[i*incpriY] = priX[i*incpriX];
  • These lines are erroneous, since they attempt to pass an argument to a function that expects none
const auto X_index = binned_index(priX);
const auto Y_index = binned_index(priY);
@maddyscientist
Copy link
Author

I've got this working. The patch is as follows (essentially "X" and "Y" got mixed up):

diff --git a/src/reproducible_floating_accumulator.hpp b/src/reproducible_floating_accumulator.hpp
index d1c89b9..0e712d6 100644
--- a/src/reproducible_floating_accumulator.hpp
+++ b/src/reproducible_floating_accumulator.hpp
@@ -420,10 +420,10 @@ class ReproducibleFloatingAccumulator {
   ///@param incpriY stride within Y's primary vector (use every incpriY'th element)
   ///@param inccarY stride within Y's carry vector (use every inccarY'th element)
   __host__ __device__ void binned_dmdmadd(const ReproducibleFloatingAccumulator &other, const int incpriX, const int inccarX, const int incpriY, const int inccarY) {
-    auto *const priX = pvec();
-    auto *const carX = cvec();
-    auto *const priY = other.pvec();
-    auto *const carY = other.cvec();
+    auto *const priX = other.pvec();
+    auto *const carX = other.cvec();
+    auto *const priY = pvec();
+    auto *const carY = cvec();
 
     if (priX[0] == 0.0)
       return;
@@ -441,8 +441,8 @@ class ReproducibleFloatingAccumulator {
       return;
     }
 
-    const auto X_index = binned_index(priX);
-    const auto Y_index = binned_index(priY);
+    const auto X_index = other.binned_index();
+    const auto Y_index = binned_index();
     const auto shift = Y_index - X_index;
     if(shift > 0){
       const auto *const bins = binned_bins(Y_index);
@@

Apologies if my line numbers are a bit off from yours, as I've made a few local changes.

@SAtacker
Copy link

SAtacker commented Jun 2, 2024

Hey @maddyscientist I have tried using the RFA to just add a single floating point number (after initializing it to zero using the provided method) like say 10.0 but the result is 12.
While debugging this, I found that it goes into the binned_dmrenorm (after update and deposit in that order) and somewhere inbetween the bitwise manipulations to set the bits in primary vector we make it 12.

Is this intended? If so could you please point me to an explanation of what is happening with this simple addition operation on an rfa type (I assume that the intended use case is reduction of multiple values in a vector).

P.S. I don't understand the algorithm in detail.

Thank you!

CC @gevtushenko

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants