Total variance explained > 1 #7
Comments
Correct me if this is wrong, but is the above because |
This is what I see in the README.md.
It makes me think that var_exp should be the variance explained.
Not really. According to the code, the reason is an inconsistent use of degrees of freedom in the calculation of covariance (as in the numerator) and variance (as in the denominator). When calculating variance explained by each PC, Line 98 in 7eef1c0
However, when calculating the total variance, Lines 127 to 129 in 7eef1c0
As a result, the denominator is slightly smaller or the numerator is slightly larger, which causes the >1 explained variance ratio. A simple fix is to multiply |
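The mismatch described above can be reproduced without the library. The following is only a sketch under stated assumptions: the toy data are made up, and it assumes the covariance (numerator) divides by n - 1 while the total variance (denominator) divides by n. The thread does not show which side uses which divisor. Because the eigenvalues of a covariance matrix sum to its trace, the numerator is written as a sum of per-column sample variances, so no eigendecomposition is needed.

```python
import statistics

# Toy data, one inner list per variable (column); values are made up.
columns = [
    [2.0, 4.0, 6.0, 8.0],
    [1.0, 3.0, 2.0, 6.0],
    [5.0, 5.5, 7.0, 4.5],
]
n = len(columns[0])  # number of observations

# Assumed numerator: sum of eigenvalues of a covariance matrix built with
# divisor n - 1. This equals the trace, i.e. the sum of sample variances.
numerator = sum(statistics.variance(col) for col in columns)

# Assumed denominator: total variance computed with divisor n.
denominator = sum(statistics.pvariance(col) for col in columns)

ratio = numerator / denominator        # exceeds 1 by the factor n / (n - 1)
rescaled = ratio * (n - 1) / n         # consistent divisors bring it back to 1

print(ratio, rescaled)
```

With consistent divisors on both sides the total explained variance is exactly 1; the excess shrinks as n grows, which is why it is easy to overlook on large datasets.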
Forgive me if this is a known counterintuitive point deemed irrelevant, but I noticed total variance explained by all components is greater than one. That's true in my dataset with missing values, but also in the complete example below.
It seems to be related to the fact that the sum of all eigenvalues is greater than the number of dimensions in the original dataset. Since the sum of the eigenvalues should equal the trace of the correlation matrix, which is the number of dimensions, I would not expect that to be the case.
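The expectation stated here can be checked on a small case. This is a minimal sketch, not the project's code: the two-variable data and the helper names `pearson_r` and `eigenvalues_sym_2x2` are hypothetical, and the eigenvalues come from the closed form for a symmetric 2x2 matrix rather than from a PCA routine.

```python
import math

def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def eigenvalues_sym_2x2(a, b, d):
    """Eigenvalues of the symmetric matrix [[a, b], [b, d]]."""
    mean = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)
    return mean + radius, mean - radius

# Made-up data for two variables.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 6.0]

r = pearson_r(x, y)
# The correlation matrix is [[1, r], [r, 1]]; its trace is 2.
eigvals = eigenvalues_sym_2x2(1.0, r, 1.0)
eig_sum = sum(eigvals)

print(eigvals, eig_sum)
```

The eigenvalues are 1 + |r| and 1 - |r|, so their sum equals the number of variables regardless of the data. A sum above that points to a scaling inconsistency, not to a property of the dataset.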