Adapt EigenDecomposition implementation #2038
base: main
Conversation
Hi! A good start to the PR. That should fix your merge conflict.
Force-pushed from 4ac147c to 177921a
It should be fixed now.
Codecov Report. Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #2038 +/- ##
=======================================
Coverage ? 68.80%
Complexity ? 40731
=======================================
Files ? 1440
Lines ? 161743
Branches ? 31461
=======================================
Hits ? 111282
Misses ? 41388
Partials ? 9073
☔ View full report in Codecov by Sentry.
The implementation looks suspiciously close to the Commons Math implementation, with minor modifications. There are some key elements that make it faster, but I think we can refine it a bit.
// double xNormSqr = householderVectors.slice(k, k, k + 1, m - 1).sumSq();
// double xNormSqr = LibMatrixMult.dotProduct(hv, hv, k_kp1, k_kp1, m - (k + 1));
double xNormSqr = 0;
for (int j = k + 1; j < m; ++j) {
This looks like a squared row sum, which is an operation you can perform directly.
We tried these other approaches to get a performance boost, but they either didn't change anything or made it slower:
double xNormSqr = householderVectors.slice(k, k, k + 1, m - 1).sumSq();
double xNormSqr = LibMatrixMult.dotProduct(hv, hv, k * m + k + 1, k * m + k + 1, m - (k + 1));
Are you referring to another way to do it directly?
Okay, then we do not change it.
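For reference, the plain loop the thread keeps presumably looks like this (the diff excerpt above is cut off at the reviewed line, so the loop body and the k * m offset are inferred from the dotProduct variant quoted above, not copied from the PR):

double xNormSqr = 0;
for (int j = k + 1; j < m; ++j) {
    final double v = hv[k * m + j]; // element (k, j) of the row-major array hv
    xNormSqr += v * v;              // accumulate the squared tail of row k
}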
householderVectors.sparseToDense(threads);
}

final double[] z = new double[m];
This temporary allocation seems like the only thing that limits you from parallelizing the entire for loop.
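A minimal sketch of one way to lift that limitation, assuming the row iterations are otherwise independent; processRow is a hypothetical stand-in for the per-iteration work, not a method in the PR:

import java.util.stream.IntStream;

// one scratch buffer per worker thread instead of a single shared z[]
final ThreadLocal<double[]> zLocal = ThreadLocal.withInitial(() -> new double[m]);
IntStream.range(0, m).parallel().forEach(k -> processRow(k, zLocal.get()));

Note that a parallel stream runs on the common fork-join pool rather than the PR's threads parameter, so an executor-based variant would be needed to honor it.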
* @param threads The number of threads to use for computation.
* @return An array of MatrixBlock objects containing the real eigen values and eigen vectors.
*/
public static MatrixBlock[] computeEigenDecompositionSymm(MatrixBlock in, int threads) {
I would like to know where the time is spent inside this function call.
Is it in transformToTridiagonal, getQ, or findEigenVectors?
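One quick way to answer that (a sketch, not code from the PR; the three method names come from the question above and their signatures are assumed):

// assumed helper: runs a phase and prints the elapsed milliseconds
static <T> T timed(String label, java.util.function.Supplier<T> phase) {
    long start = System.nanoTime();
    T result = phase.get();
    System.out.printf("%s: %.2f ms%n", label, (System.nanoTime() - start) / 1e6);
    return result;
}
// usage, with assumed signatures:
// MatrixBlock tri = timed("transformToTridiagonal", () -> transformToTridiagonal(in, threads));
// MatrixBlock q = timed("getQ", () -> getQ(tri, threads));
// MatrixBlock[] ev = timed("findEigenVectors", () -> findEigenVectors(q, threads));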
Great, this is something that is easy to fix: simply do not use the get and set methods of the dense block, but assign the cells directly on the underlying linearized array.
This should immediately reduce the execution time of that part by at least ~50%.
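A minimal sketch of the suggestion, reusing the qaB / qaV names from the snippet below and assuming a single dense block as the PR description already requires:

// per-cell accessors run indexing logic on every call:
//   qaB.set(i, j, v);
// writing straight into the linearized backing array avoids that overhead:
double[] qaV = qaB.valuesAt(0); // row-major values of block 0
qaV[i * m + j] = v;             // assign cell (i, j) directly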
double[] qaV = qaB.valuesAt(0);

// build up first part of the matrix by applying Householder transforms
for (int k = m - 1; k >= 1; --k) {
Is there a reason why it should be done in the opposite order?
Yes there is a reason. From page 263 of "Matrix Computations" by Gene H. Golub and Charles F. Van Loan: "Recall that the leading (j − 1)-by-(j − 1) portion of Qj is the identity. Thus, at the beginning of backward accumulation, Q is “mostly the identity” and it gradually becomes full as the iteration progresses. This pattern can be exploited to reduce the number of required flops. In contrast, Q is full in forward accumulation after the first step. For this reason, backward accumulation is cheaper and the strategy of choice."
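To make the flop saving in the quote concrete, here is a small self-contained sketch of backward accumulation (not the PR's code; the layout where hv stores the Householder vector of step k in column k below the diagonal and beta[k] is its scalar is an assumption of the sketch):

// Q = H_0 * H_1 * ... * H_{m-3}, accumulated backwards starting from the identity
double[] q = new double[m * m];
for (int i = 0; i < m; i++)
    q[i * m + i] = 1.0;
for (int k = m - 3; k >= 0; --k) {
    // before this step, q still differs from the identity only in its trailing block,
    // so rows and columns 0..k can be skipped entirely; in forward accumulation
    // q is already full after the first step and no such restriction is possible
    for (int j = k + 1; j < m; ++j) {
        double s = 0;
        for (int i = k + 1; i < m; ++i)
            s += hv[i * m + k] * q[i * m + j];
        s *= beta[k];
        for (int i = k + 1; i < m; ++i)
            q[i * m + j] -= s * hv[i * m + k];
    }
}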
Also, if you address any comments, feel free to close or resolve them.
Just FYI, some tests are flaky when run in the cloud, so if the tests fail and they are obviously not related to your changes, no worries.
Implements the Implicit QL Algorithm also used by the Commons Math EigenDecomposition class, using SystemDS data structures. Note this assumes the matrix is symmetric and fits in one block of a DenseBlock. The latter is also assumed by the current computeEigen function, so the only missing functionality is support for non-symmetric matrices. The implementation focused on reducing allocations as much as possible to match and exceed the performance of the current eigen decomposition function across most cases.

As an initial benchmark, a test was run with symmetric positive definite and tridiagonal (symmetric) matrices of sizes 100x100, 500x500, and 1000x1000. The following table shows the test cases and the number of iterations done per test.
What follows are the results of the benchmark, showing an average improvement of 48.95% across all cases, with the highest improvements in the largest 1000x1000 matrices, at up to an 83% improvement.
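A hypothetical usage sketch of the new entry point (the signature is taken from the diff above; the assumption that the first block holds the eigenvalues and the second the eigenvectors is mine, not stated in the PR):

// in: a symmetric MatrixBlock that fits in a single dense block
MatrixBlock[] evd = LibCommonsMath.computeEigenDecompositionSymm(in, threads);
MatrixBlock eigenValues = evd[0];  // real eigenvalues (ordering assumed)
MatrixBlock eigenVectors = evd[1]; // corresponding eigenvectors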