-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
binned_dmdmadd function is buggy #3
Comments
I've got this working. The patch is as follows (essentially "X" and "Y" got mixed up): diff --git a/src/reproducible_floating_accumulator.hpp b/src/reproducible_floating_accumulator.hpp
index d1c89b9..0e712d6 100644
--- a/src/reproducible_floating_accumulator.hpp
+++ b/src/reproducible_floating_accumulator.hpp
@@ -420,10 +420,10 @@ class ReproducibleFloatingAccumulator {
///@param incpriY stride within Y's primary vector (use every incpriY'th element)
///@param inccarY stride within Y's carry vector (use every inccarY'th element)
__host__ __device__ void binned_dmdmadd(const ReproducibleFloatingAccumulator &other, const int incpriX, const int inccarX, const int incpriY, const int inccarY) {
- auto *const priX = pvec();
- auto *const carX = cvec();
- auto *const priY = other.pvec();
- auto *const carY = other.cvec();
+ auto *const priX = other.pvec();
+ auto *const carX = other.cvec();
+ auto *const priY = pvec();
+ auto *const carY = cvec();
if (priX[0] == 0.0)
return;
@@ -441,8 +441,8 @@ class ReproducibleFloatingAccumulator {
return;
}
- const auto X_index = binned_index(priX);
- const auto Y_index = binned_index(priY);
+ const auto X_index = other.binned_index();
+ const auto Y_index = binned_index();
const auto shift = Y_index - X_index;
if(shift > 0){
const auto *const bins = binned_bins(Y_index);
@@ Apologies if my line numbers are a bit off from yours, as I've made a few local changes. |
Hey @maddyscientist I have tried using the RFA to just add a single floating point number (after initializing it to zero using the provided method) like say 10.0 but the result is 12. Is this intended? If so could you please point me to an explanation of what is happening with this simple addition operation on an rfa type (I assume that the intended use case is reduction of multiple values in a vector). P.S. I don't understand the algorithm in detail. Thank you! CC @gevtushenko |
Having finally gotten around to hacking on this code, to port it to CUDA, while I've got an initial single thread version that passes the tests ok, in moving to a multi-thread reduction, I've found that the
binned_dmdmadd
function, used to sum two instances ofReproducibleFloatingAccumulator
is buggy.Specifically, when instantiated, it doesn't compile because:
other
reference, which isconst
qualified, e.g.The text was updated successfully, but these errors were encountered: