Use spectral decomposition as more accurate fallback from Cholesky #2930
What if the symmetric matrix decomposition fails to converge, for example due to rounding errors? Isn't it necessary to fall back to a general SVD decomposition (like `gesvd`, which cannot fail)? Or maybe it would be an even more stable approach to use `gesvd` instead of `syev` for the spectral decomposition from the beginning?
Yes, `gesvd` on $\mathbf{X}$ would be more accurate than `syev` on $\mathbf{X}^T \mathbf{X}$, but it would be slower. However, the solver `normEq` is based on $\mathbf{X}^T \mathbf{X}$ (I guess that's what the name implies, but not 100% sure). `gesvd` could potentially be a fallback from QR, but that'd be a different PR. I think `scikit-learn-intelex` only calls oneDAL's solver with `normEq`, BTW.
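For readers following along, here is a minimal sketch of the `normEq`-style path with the spectral fallback being discussed, using NumPy/SciPy stand-ins rather than the actual oneDAL/LAPACK calls; the function name and the `rcond` cutoff are illustrative only.

```python
from scipy import linalg


def solve_normal_equations(X, y, rcond=1e-12):
    """Solve X^T X beta = X^T y; fall back from Cholesky to a symmetric
    eigendecomposition (a syev-family stand-in) when Cholesky fails."""
    XtX = X.T @ X
    Xty = X.T @ y
    try:
        # Fast path: Cholesky factorization of the Gram matrix.
        c, low = linalg.cho_factor(XtX, lower=True)
        return linalg.cho_solve((c, low), Xty)
    except linalg.LinAlgError:
        # Fallback: spectral decomposition of the symmetric Gram matrix.
        # Like the LAPACK sy* routines, eigh reads only one triangle.
        w, V = linalg.eigh(XtX, lower=True)
        keep = w > rcond * abs(w).max()   # drop near-zero / negative eigenvalues
        return V[:, keep] @ ((V[:, keep].T @ Xty) / w[keep])
```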
I meant not to apply `gesvd` to $X$, but rather to $X^T X$. Due to factors like rounding errors, $X^T X$ might not be positive semi-definite, and `syev` might fail to converge; then we would run into the same issue we have now with Cholesky. With SVD we can do:

$$X^T X \beta = X^T y$$

$$X^T X = U \Sigma V^T$$

$$\beta = V \Sigma^{-1} U^T X^T y$$

and there should be no possibility of failure. Or maybe this is too precautious.
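A rough sketch of that suggestion in NumPy/SciPy terms, where `lapack_driver="gesvd"` selects the QR-iteration SVD instead of the divide-and-conquer default; the function name and cutoff are illustrative, not part of this PR.

```python
from scipy import linalg


def solve_gram_via_gesvd(X, y, rcond=1e-12):
    """Solve X^T X beta = X^T y from a general SVD of the Gram matrix."""
    XtX = X.T @ X
    Xty = X.T @ y
    # gesvd rather than the default gesdd driver.
    U, s, Vt = linalg.svd(XtX, lapack_driver="gesvd")
    keep = s > rcond * s.max()            # discard tiny singular values
    return Vt[keep].T @ ((U[:, keep].T @ Xty) / s[keep])   # V Sigma^{-1} U^T X^T y
```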
Yes, it seems that `sklearnex` only uses the normal equations method now. We have the QR method implemented, but on CPUs only. And it is rather inefficient to use normal equations first and, if they fail, then apply QR to the whole $X$ matrix: double cost in terms of performance.
I don't follow.

The eigendecomposition part doesn't require positiveness; it would succeed even if the input had negative entries on the main diagonal. If, due to numerical inaccuracies, some eigenvalue that is close to zero in theory comes out with a negative sign in practice (negative-definite or indefinite matrices would have negative eigenvalues), it would anyway be removed as part of the procedure, so the end result would not include that component.

These "sy" routines take the values only from the lower triangle of the input matrix, so there shouldn't be any issues with symmetry either if the matrix is getting filled in both corners. I understand they should also be more accurate than `gesvd` when the input matrix is actually symmetric, but I'm not 100% sure about it.

About the QR part: sorry, to clarify, what I meant was that `gesvd` on $\mathbf{X}$ could be a fallback when `geqrf` on $\mathbf{X}$ fails. In those cases, $\mathbf{X}^T \mathbf{X}$ would not get computed.
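To illustrate that last point: a least-squares solve taken from the SVD of $\mathbf{X}$ itself never forms the Gram matrix. This is a NumPy sketch of the idea, not oneDAL code, and the cutoff is illustrative.

```python
import numpy as np


def solve_via_svd_of_X(X, y, rcond=1e-12):
    """Least squares from the SVD of X itself; X^T X is never formed."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)     # X = U diag(s) V^T
    keep = s > rcond * s.max()                            # drop tiny singular values
    return Vt[keep].T @ ((U[:, keep].T @ y) / s[keep])    # beta = V S^+ U^T y
```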
Yes, right. Forgot that positiveness is not a problem for eigenvalue decomposition, only for Cholesky. Sorry for the misleading info.

My comment was based on past experience with eigenvalue decomposition methods, but I forgot what exactly led to the failures. So, I mean those methods can fail, and I'm trying to avoid that. Probably it was not positiveness but numerical stability issues. If the condition number of $X^T X$ is too small, i.e. all the eigenvalues are close, some eigenvalue decomposition methods can fail. Similar things might happen when the condition number is too big. But I am not too familiar with which MKL decomposition routine is better in the case of small or big condition numbers (I thought `gesvd` was the best, but it looks like it's not).

Anyway, from your comment I see that for symmetric matrices the `sy` methods should be better than `ge`, so the choice of `syevr` looks reasonable now. I also understand the point about QR and `gesvd` for the $X$ decomposition, but that is not in the scope of this PR.
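One related observation on conditioning (added here for context, not part of the thread): forming the Gram matrix squares the condition number of $X$, which is one reason routines that operate on $X$ directly behave better on ill-conditioned inputs. A quick NumPy check on synthetic, purely illustrative data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 20))
X[:, -1] = X[:, 0] + 1e-6 * rng.standard_normal(1000)   # nearly collinear column

print(np.linalg.cond(X))         # large
print(np.linalg.cond(X.T @ X))   # roughly the square of the value above
```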
Got it. It looks from the `syevd` docs that it is indeed possible for this routine to fail due to the values in the inputs, although they don't explain when that could happen.

That being said, in case it makes this look less bad: if the eigendecomposition fails, it follows that other methods like SciPy's `lstsq` or scikit-learn's `LinearRegression` would also fail, since they are also based on spectral decomposition.
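For reference, a minimal way to exercise the baselines mentioned above on a rank-deficient problem (package availability assumed; the data is synthetic and only meant to show the comparable APIs):

```python
import numpy as np
from scipy import linalg
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
X[:, 4] = X[:, 2]                      # exactly collinear columns -> singular X^T X
y = rng.standard_normal(100)

beta_scipy, *_ = linalg.lstsq(X, y)    # SciPy's least-squares solver
beta_skl = LinearRegression(fit_intercept=False).fit(X, y).coef_
print(beta_scipy, beta_skl)
```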