Known AD Failures #116
Comments
The Zygote problems with MaternKernel are caused by the fact that the partial derivative of |
Haha, writing these derivatives sounds like one should write a whole package about Bessel functions |
@sharanry Can you prioritise these AD issues? It would be great if these issues can be addressed during the summer. |
BTW I found a 2016 publication with closed-form expressions for the derivatives of the Bessel functions with respect to the order. I opened an issue at JuliaDiff/ChainRules.jl#208 to discuss how one would deal with the additional dependencies needed for their implementation (they contain hypergeometric functions). |
We might want to refactor KernelSum and KernelProd (making them concretely typed and allowing both tuples and vectors of kernels similar to TensorProduct, and probably removing the weights in KernelSum) before fixing any AD issues there. |
Agreed! |
Sorry for the late reply. I somehow didn't get a notification for this comment. Randomly found this while browsing the issues. I am looking into it. |
The probable reason Zygote fails is in KernelFunctions.jl/src/transform/functiontransform.jl (lines 19 to 20 in 0df9e83). A minimal reproduction with mapslices:
julia> Zygote._pullback(x-> mapslices(x->sin.(x), x, dims=1), rand(3,3))[2](ones(3,3))
ERROR: Mutating arrays is not supported |
|
Oh, makes sense. Do you see any other efficient way to apply a function transform to a matrix/ColVecs/RowVecs? |
I see the following possibilities here:
function _map(t::FunctionTransform, x::ColVecs)
    vals = map(axes(x.X, 2)) do i
        t.f(view(x.X, :, i))
    end
    return ColVecs(vals)
end
(Zygote should support this automatically) |
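For illustration, here is a minimal sketch of the column-wise idea on a plain matrix, without KernelFunctions or Zygote. The helper name map_columns is hypothetical and not part of any package. The sketch only checks that the values agree with mapslices; whether a given Zygote version differentiates it without extra rules is an assumption that is not verified here.

```julia
# Hypothetical helper for illustration; not part of KernelFunctions.jl.
# Applies f to every column of X and stacks the results as columns again.
map_columns(f, X::AbstractMatrix) = reduce(hcat, map(f, eachcol(X)))

X = rand(3, 3)
f = x -> sin.(x)

# Same values as the mapslices call from the reproduction above.
@assert map_columns(f, X) ≈ mapslices(f, X; dims=1)
```

Collecting the transformed columns first and concatenating them in a single reduce(hcat, ...) call avoids both splatting and in-place writes into a preallocated output.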
I just ran a quick benchmark for one other possibility, which would require us to define an adjoint for a generator. The methods you mentioned are probably better.
julia> @btime hcat(map(x->sin.(x), (eachslice(rand(1000,1000); dims=1)))...)
16.586 ms (2015 allocations: 23.09 MiB)
julia> @btime mapslices(x->sin.(x), rand(1000,1000); dims=1)
12.189 ms (7505 allocations: 23.18 MiB) |
A bit off topic, but splatting probably impacts performance quite a bit, so it would probably be better to use |
Thanks! Wasn't aware of this.
julia> @btime mapslices(x->sin.(x), $(rand(1000,1000)); dims=1);
10.581 ms (7503 allocations: 15.55 MiB)
julia> @btime mapreduce(x->sin.(x), hcat, eachslice($(rand(1000,1000)); dims=1));
914.970 ms (5002 allocations: 3.74 GiB) |
Shouldn't you use |
I don't think it makes much difference, performance-wise at least.
julia> @btime mapreduce(x->sin.(x), hcat, eachslice($(rand(1000,1000)); dims=2));
952.564 ms (5002 allocations: 3.74 GiB) |
Can you check if the function is type-stable? I suspect it might not be, which would explain the number of allocations. The problem might be that it returns a different type if |
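One way to look into the type-stability question is to ask the compiler which return type it infers. The sketch below is only an assumption about how such a check could look. The names f, slicewise_hcat and single_hcat are hypothetical, and the inferred types it prints may differ between Julia versions.

```julia
# Hypothetical helpers for illustration; not part of KernelFunctions.jl.
f(x) = sin.(x)

# Reduction with hcat one slice at a time, as in the benchmark above.
slicewise_hcat(X) = mapreduce(f, hcat, eachslice(X; dims=2))

# Collect the transformed columns first, then concatenate once.
single_hcat(X) = reduce(hcat, map(f, eachcol(X)))

X = rand(5, 4)

# Ask the compiler which return type it infers for a Matrix{Float64} argument.
inferred_slicewise = Base.return_types(slicewise_hcat, (Matrix{Float64},))
inferred_single = Base.return_types(single_hcat, (Matrix{Float64},))
println("slicewise: ", inferred_slicewise)
println("single:    ", inferred_single)

# Both variants should agree on the values.
@assert slicewise_hcat(X) ≈ single_hcat(X)
```

A non-concrete entry (for example Any or a Union) in the printed result would point to a type instability. @code_warntype from InteractiveUtils gives a more detailed view of the same information.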
Just for the record -- we should be using ChainRulesCore to define pullbacks for Plans are in the works to transfer both |
I can't figure out how to define |
KernelFunctions doesn't use |
I see. I checked the source of the last release to backport |
The latest releases don't use |
Regarding FBMKernel not working with |
You can use JuliaDiff/ForwardDiff.jl#451 if you do not want to edit the source code. |
Hi--it just occurs to me to share this here, but I recently finished a project for computing derivatives of I'm not sure how helpful this is because the derivatives are at present pretty Anyways, just writing here in case it is helpful. |
I came across https://www.tandfonline.com/doi/pdf/10.1080/10652469.2016.1164156 a while ago, it contains closed-form expressions of the derivatives using e.g. hypergeometric functions. In principle these could be used with other AD backends as well but I don't know if there are any numerical problems, how slow/fast the evaluation with HypergeometricFunctions would be, and if (I assume not since it would introduce a circular dependency) SpecialFunctions would take a dependency on HypergeometricFunctions. |
I also saw that paper and was interested in just using that before undertaking a more from-scratch approach. But there are a few challenges with using the representations in Santander. For one, as you point out, evaluating the generalized hypergeometric functions like 3F4 and 2F3 is a task of comparable difficulty. I love Our project was enough of a hassle that we ended up writing a paper about it, and almost all the work was in handling the problems of I've actually thought about asking the In any case, just posting here for your consideration. If somebody manages to implement them with exact expressions in a way that is tolerably fast and handles those edge cases, I'll be the first person to celebrate. In the meantime, though, I wouldn't be shocked if Zygote compatibility was possible. I just really don't know enough to conjecture about how much of a project it would be. |
Here is a list of the failures I observed in the tests made in #114 with the different ADs: ForwardDiff.jl, Zygote.jl and ReverseDiff.jl:
PeriodicKernel does not work with AD (see issue for work-around) #389
This is a good starting point to try to find solutions.