Add rules for det and logdet of Cholesky #613

Merged 9 commits on May 18, 2022
2 changes: 1 addition & 1 deletion Project.toml
@@ -1,6 +1,6 @@
name = "ChainRules"
uuid = "082447d4-558c-5d27-93f4-14fc19e9eca2"
-version = "1.29.0"
+version = "1.30.0"

[deps]
ChainRulesCore = "d360d2e6-b24c-11e9-a2a3-2a2ae2dbcce4"
4 changes: 2 additions & 2 deletions src/rulesets/LinearAlgebra/dense.jl
@@ -118,15 +118,15 @@ end
##### `det`
#####

-function frule((_, Δx), ::typeof(det), x::AbstractMatrix)
+function frule((_, Δx), ::typeof(det), x::StridedMatrix{<:Number})
    Ω = det(x)
    # TODO Performance optimization: probably there is an efficient
    # way to compute this trace without doing the full computation
    return Ω, Ω * tr(x \ Δx)
end
frule((_, Δx), ::typeof(det), x::Number) = (det(x), Δx)

-function rrule(::typeof(det), x::Union{Number, AbstractMatrix})
+function rrule(::typeof(det), x::Union{Number, StridedMatrix{<:Number}})
Member

If these changes could be split out of this PR, then we could merge this much faster.

Member Author

I reverted these changes.

    Ω = det(x)
    function det_pullback(ΔΩ)
        ∂x = x isa Number ? ΔΩ : inv(x)' * dot(Ω, ΔΩ)
21 changes: 21 additions & 0 deletions src/rulesets/LinearAlgebra/factorization.jl
@@ -551,3 +551,24 @@ function rrule(::typeof(getproperty), F::T, x::Symbol) where {T <: Cholesky}
    end
    return getproperty(F, x), getproperty_cholesky_pullback
end

# `det` and `logdet` for `Cholesky`
function rrule(::typeof(det), C::Cholesky)
    y = det(C)
    s = conj!((2 * y) ./ _diag_view(C.factors))
    function det_Cholesky_pullback(ȳ)
        ΔC = Tangent{typeof(C)}(; factors=Diagonal(ȳ .* s))
Member

Would something like this be better? There's one fewer allocation, and we don't need to assume that s is mutable.

Suggested change
-    s = conj!((2 * y) ./ _diag_view(C.factors))
-    function det_Cholesky_pullback(ȳ)
-        ΔC = Tangent{typeof(C)}(; factors=Diagonal(ȳ .* s))
+    diagF = _diag_view(C.factors)
+    function det_Cholesky_pullback(ȳ)
+        ΔC = Tangent{typeof(C)}(; factors=Diagonal(2(ȳ * conj(y)) ./ conj.(diagF)))

Member

If the determinant is 0 (can happen if check=false is passed to cholesky), this will inject NaNs, even if the cotangent is 0. Since we try to treat cotangents as strong zeros, it would be nice to handle this case by ensuring that such NaNs end up as zeros.
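
The NaN concern can be reproduced with a small sketch that uses only `LinearAlgebra`. The helper name `safe_divide_conj` is hypothetical and is not part of ChainRules; it only illustrates one way a zero numerator could be forced to yield a zero result.

```julia
using LinearAlgebra

# Hypothetical guard (not a ChainRules function): compute x / conj(y),
# but return zero whenever x is zero so that 0 / 0 does not become NaN.
safe_divide_conj(x, y) = iszero(x) ? zero(x / conj(y)) : x / conj(y)

# A singular factor with a zero on the diagonal, as could arise from
# `cholesky(A; check=false)` on a matrix that is not positive definite.
U = [2.0 1.0; 0.0 0.0]
C = Cholesky(U, 'U', 0)

y = det(C)   # product of squared diagonal entries, zero here
ȳ = 0.0      # a zero cotangent

# Unguarded formula: the second entry is 0 / 0.
naive = (2 * ȳ * conj(y)) ./ conj.(diag(U))

# Guarded formula: zero numerators map to zero.
guarded = safe_divide_conj.(2 * ȳ * conj(y), diag(U))
```

In this sketch the unguarded result contains a NaN in the position of the zero diagonal entry, while the guarded result is all zeros.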

Member Author

> There's one fewer allocation, and we don't need to assume that s is mutable.

Seems like an improvement to me, thanks! 🙂

There seems to be whitespace missing in the first line of your suggestion:

Suggested change
-    s = conj!((2 * y) ./ _diag_view(C.factors))
-    function det_Cholesky_pullback(ȳ)
-        ΔC = Tangent{typeof(C)}(; factors=Diagonal(ȳ .* s))
+    diagF = _diag_view(C.factors)
+    function det_Cholesky_pullback(ȳ)
+        ΔC = Tangent{typeof(C)}(; factors=Diagonal(2(ȳ * conj(y)) ./ conj.(diagF)))

Member Author

> If the determinant is 0

It would happen if y = 0 (the determinant) but also if ȳ = 0. Should we care about the latter case as well? Or is it correct to return NaN there?

Member Author

Ah, never mind. Of course, at least one element of diagF is zero iff y = 0, i.e., we only have to care about y = 0.
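
For reference, the relation between y and the diagonal can be written out explicitly. This is a sketch under the assumption that the diagonal entries $F_{ii}$ of `C.factors` are real, as they are for a Cholesky factor, and that `det(C)` depends on the factor only through them:

```latex
\begin{aligned}
y &= \det(C) = \prod_{i=1}^{n} F_{ii}^{2}, \\
\frac{\partial y}{\partial F_{ii}} &= \frac{2y}{F_{ii}} \qquad (F_{ii} \neq 0), \\
\bar{F}_{ii} &= \bar{y}\,\operatorname{conj}\!\left(\frac{2y}{F_{ii}}\right)
             = \frac{2\,\bar{y}\,\operatorname{conj}(y)}{\operatorname{conj}(F_{ii})}.
\end{aligned}
```

Since y is a product of the squared diagonal entries, y = 0 exactly when some $F_{ii} = 0$, which is the only situation in which the last expression becomes 0/0.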

Member Author

Maybe even a bit clearer (without having to know about precedence of operators):

Suggested change
-    s = conj!((2 * y) ./ _diag_view(C.factors))
-    function det_Cholesky_pullback(ȳ)
-        ΔC = Tangent{typeof(C)}(; factors=Diagonal(ȳ .* s))
+    diagF = _diag_view(C.factors)
+    function det_Cholesky_pullback(ȳ)
+        ΔC = Tangent{typeof(C)}(; factors=Diagonal((2 * (ȳ * conj(y))) ./ conj.(diagF)))

Member Author (@devmotion, May 18, 2022)

I guess something like the following could work?

# compute `x / conj(y)`, handling `x = y = 0`
function _x_divide_conj_y(x, y)
    z = x / conj(y)
    # in our case `iszero(x)` implies `iszero(y)`
    return iszero(x) ? zero(z) : z
end
function rrule(::typeof(det), C::Cholesky)
    y = det(C)
    diagF = _diag_view(C.factors)
    function det_Cholesky_pullback(ȳ)
        ΔC = Tangent{typeof(C)}(; factors=Diagonal(_x_divide_conj_y.(2 * ȳ * conj(y), diagF)))

Member Author

@sethaxen I updated the PR and added tests for singular matrices.

        return NoTangent(), ΔC
    end
    return y, det_Cholesky_pullback
end

function rrule(::typeof(logdet), C::Cholesky)
    y = logdet(C)
    s = conj!((2 * one(eltype(C))) ./ _diag_view(C.factors))
    function logdet_Cholesky_pullback(ȳ)
        ΔC = Tangent{typeof(C)}(; factors=Diagonal(ȳ .* s))
devmotion marked this conversation as resolved.
        return NoTangent(), ΔC
    end
    return y, logdet_Cholesky_pullback
end
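
The scaling vector `s` in the `logdet` rule appears to follow from the same reasoning as for `det`. As a sketch, assuming real, positive diagonal entries $F_{ii}$ of `C.factors`:

```latex
\begin{aligned}
y &= \operatorname{logdet}(C) = 2 \sum_{i=1}^{n} \log F_{ii}, \\
\frac{\partial y}{\partial F_{ii}} &= \frac{2}{F_{ii}}, \\
\bar{F}_{ii} &= \bar{y}\,\operatorname{conj}\!\left(\frac{2}{F_{ii}}\right).
\end{aligned}
```

The last line corresponds entrywise to `ȳ .* s` with `s = conj((2 * one(eltype(C))) ./ diag)`.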
19 changes: 19 additions & 0 deletions test/rulesets/LinearAlgebra/factorization.jl
@@ -432,5 +432,24 @@ end
ΔX_symmetric = chol_back_sym(Δ)[2]
@test sym_back(ΔX_symmetric)[2] ≈ dX_pullback(Δ)[2]
end

@testset "det and logdet (uplo=$p)" for p in (:U, :L)
    @testset "$op" for op in (det, logdet)
        @testset "$T" for T in (Float64, ComplexF64)
            n = 5
            # rand (not randn) so det will be positive, so logdet will be defined
            A = 3 * rand(T, (n, n))
            X = Cholesky(A * A' + I, p, 0)
            X̄_acc = Tangent{typeof(X)}(; factors=Diagonal(randn(T, n))) # sensitivity is always a diagonal
            test_rrule(op, X ⊢ X̄_acc)

            # return type
            _, op_pullback = rrule(op, X)
            X̄ = op_pullback(2.7)[2]
            @test X̄ isa Tangent{<:Cholesky}
            @test X̄.factors isa Diagonal
        end
    end
end
end
end
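
`test_rrule` compares the rule against finite differences. A hand-rolled version of such a check for the real `det` case, using only `LinearAlgebra`, might look like the sketch below. The matrix, the step size `h`, and the variable names are illustrative assumptions and are not taken from the PR.

```julia
using LinearAlgebra

# A small symmetric positive definite matrix and its Cholesky factorization.
A = [4.0 2.0; 2.0 3.0]
C = cholesky(A)
F = copy(C.factors)

# Analytic sensitivity of det(C) to each diagonal entry of the factor
# (real case, so no conjugation is needed): 2 * det(C) / F[i, i].
y = det(C)
analytic = (2 * y) ./ diag(F)

# Central finite differences, perturbing one diagonal entry at a time.
h = 1e-6
numeric = map(1:size(F, 1)) do i
    Fp = copy(F); Fp[i, i] += h
    Fm = copy(F); Fm[i, i] -= h
    (det(Cholesky(Fp, C.uplo, 0)) - det(Cholesky(Fm, C.uplo, 0))) / (2h)
end

# The two should agree up to finite-difference error.
@assert isapprox(analytic, numeric; rtol=1e-6)
```

Only the diagonal is perturbed here because `det` of a `Cholesky` reads nothing but the diagonal of `factors`, which is also why the tangent in the rule is a `Diagonal`.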