Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix rules for cholesky and ldiv! on Cholesky #1307

Closed
wants to merge 77 commits into from

Conversation

simsurace
Copy link
Contributor

I had a suspicion that the existing rule was somehow off. I managed to break it with something like this:

function f(A)
    C = cholesky(A * A')
    return sum(abs2, C.L * C.U)
end
A = rand(3, 3)
test_reverse(f, Active, (A, Duplicated))

Working on a fix...

@simsurace
Copy link
Contributor Author

simsurace commented Feb 25, 2024

I went back to ChainRules.jl and copied the rule from there. As noted previously, the rules looked completely different before the change, with the Enzyme rule just accumulating the factors field of the cotangent, while the ChainRules rule did a sequence of linear algebra operations. Now with the change the new test passes, but some of the old tests are broken. @michel2323 could you please double check whether the tests you wrote are correct?

@codecov-commenter
Copy link

codecov-commenter commented Feb 25, 2024

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 0% with 130 lines in your changes missing coverage. Please review.

Project coverage is 65.76%. Comparing base (ff9d320) to head (b2e922a).
Report is 12 commits behind head on main.

Files Patch % Lines
src/internal_rules.jl 0.00% 130 Missing ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

❗ There is a different number of reports uploaded between BASE (ff9d320) and HEAD (b2e922a). Click for more details.

HEAD has 1 upload less than BASE
Flag BASE (ff9d320) HEAD (b2e922a)
2 1
Additional details and impacted files
@@             Coverage Diff             @@
##             main    #1307       +/-   ##
===========================================
- Coverage   96.30%   65.76%   -30.54%     
===========================================
  Files           8       31       +23     
  Lines         406    12383    +11977     
===========================================
+ Hits          391     8144     +7753     
- Misses         15     4239     +4224     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@simsurace
Copy link
Contributor Author

simsurace commented Feb 26, 2024

So the reverse rule for ldiv! for Cholesky was also wrong. I was able to fix that. Now only the forward rule needs to be fixed, I think that's why the test comparing forward and reverse results currently fail.
EDIT: I'm not sure as I don't fully understand the current tests. Maybe it's better to replace by EnzymeTestUtils tests?

@simsurace
Copy link
Contributor Author

@michel2323 I think the rules are now fixed. I hope you did not mind me reformatting your tests.
@wsmoses if @michel2323 does not respond, we might need another reviewer.
Maybe @sethaxen could also help with this rule?

@simsurace simsurace changed the title Fix rule for cholesky Fix rules for cholesky and ldiv! on Cholesky Feb 26, 2024
src/internal_rules.jl Outdated Show resolved Hide resolved
test/internal_rules.jl Outdated Show resolved Hide resolved
@michel2323
Copy link
Collaborator

@michel2323 I think the rules are now fixed. I hope you did not mind me reformatting your tests. @wsmoses if @michel2323 does not respond, we might need another reviewer. Maybe @sethaxen could also help with this rule?

Go ahead. I went over it now. Looks good to me. I never could fully match the ChainRules implementation to this paper. But that may just be me. Thanks for taking a look at this!

@simsurace
Copy link
Contributor Author

I did not look at the paper in detail. Maybe their method is faster than the current implementation, which matches ChainRules.jl. We can always improve things later, in my opinion it's more important to merge a correct rule now.
Maybe some of these rules will even become obsolete when Enzyme capabilities improve to also be able to autodiff through LAPACK routines. Thanks for taking a look!

@simsurace
Copy link
Contributor Author

@wsmoses, I tried to follow your guidance on the fact, let me know if I got it right.
I just noticed that I did not check the forward rule in the regression test, and in fact it also needs to be changed.
Please let me fix that before merging so we get everything right this time.

@simsurace
Copy link
Contributor Author

Ok, the additional forward unit tests for cholesky all fail, documenting there's still something wrong with the forward rule.

@simsurace simsurace requested a review from wsmoses February 27, 2024 13:32
@simsurace
Copy link
Contributor Author

I noticed there are still errors. Please don't merge just yet.

test/internal_rules.jl Outdated Show resolved Hide resolved
test/internal_rules.jl Outdated Show resolved Hide resolved
test/internal_rules.jl Outdated Show resolved Hide resolved
@sethaxen
Copy link
Collaborator

Seems CI is broken on master, but before the most recent commit there were still a number of failures in the tests for these rules on v1.8 and v1.9. Do you know why these are happening?

@simsurace
Copy link
Contributor Author

I could make the other activities work, but I'm not able to make the complex element type work in all cases. Will get back to it later.

@simsurace
Copy link
Contributor Author

simsurace commented May 20, 2024

I think I may have come across a bug or at least a major annoyance:

square(A) = A * adjoin(A)
A = rand(ComplexF64, 5, 5)
ishermitian(square(A)) # true

dA = rand(ComplexF64, 5, 5)
S, dS = autodiff(Forward, square, Duplicated, Duplicated(A, dA))
S  square(A) # true
ishermitian(S) # false

test_forward(square, Duplicated, (A, Duplicated)) # passes

So somehow the forward mode does not produce the same result as the function for square, but this is not caught by EnzymeTestUtils.
This makes the tests fail for complex element types.
I will still commit the test because I believe it is otherwise correct, and open a separate issue for this.

@simsurace simsurace requested a review from sethaxen May 20, 2024 21:42
@simsurace
Copy link
Contributor Author

Seems there are still random failures that I did not see locally. Will try to fix those.

@wsmoses
Copy link
Member

wsmoses commented Jun 6, 2024

hey folks just wanted to check back in here how things were going, and if I can be of any help.

admittedly the cholesky math here is a bit out of my depth

@simsurace
Copy link
Contributor Author

Hi, thanks for checking. I was gonna try to fix those failures but won't have time now for a few weeks.

@wsmoses
Copy link
Member

wsmoses commented Jun 30, 2024

@simsurace the tests seem to work on everything except 1.8.

Do you have an estimate of when you can look at this?

Some of the pieces are blocking for folks so I may cherry-pick out the forward mode fix, in the interim

@simsurace
Copy link
Contributor Author

Hi, thanks for taking care. From next week on I should be able to get back to this.

@wsmoses
Copy link
Member

wsmoses commented Jul 8, 2024

This PR has been partially subsumed by additional support of lapack functions.

In particular we now have lapack support for cholesky solve in forward/reverse, and cholesky ldiv in reverse mode [but not forward]. All of this is limited to real numbers.

This PR still contains useful functionality for complex numbers, as well as testing, but as an FYI @simsurace

@simsurace
Copy link
Contributor Author

Hi @wsmoses, thanks for the heads up. This is cool. I wonder what the best way is to get this PR merged.
I can look at the 1.8 failures. Assuming we can get all tests to pass, do we first merge this PR and then open another one to disable the rules for the cases now covered by the new features?

@wsmoses
Copy link
Member

wsmoses commented Jul 8, 2024

At this point with the new stuff already in place in main, I might actually recommend the other way -- trying to add all these tests and seeing what fails, and then adding back the missing stuff when needed

@wsmoses
Copy link
Member

wsmoses commented Sep 3, 2024

Closing as stale

@wsmoses wsmoses closed this Sep 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants