Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Minor changes towards GPU compatibility #236

Merged
merged 29 commits into from
Mar 31, 2022
Merged

Minor changes towards GPU compatibility #236

merged 29 commits into from
Mar 31, 2022

Conversation

st--
Copy link
Member

@st-- st-- commented Oct 28, 2021

Replaces Diagonal(Fill(sigma2, n)) with PDMats.ScalMat(n, sigma2), which plays better with CUDA. (And the noise covariance should be positive definite anyways.)

@codecov
Copy link

codecov bot commented Oct 28, 2021

Codecov Report

Merging #236 (9568b1a) into master (eedb50a) will increase coverage by 0.01%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #236      +/-   ##
==========================================
+ Coverage   97.88%   97.90%   +0.01%     
==========================================
  Files          10       10              
  Lines         379      382       +3     
==========================================
+ Hits          371      374       +3     
  Misses          8        8              
Impacted Files Coverage Δ
src/finite_gp_projection.jl 100.00% <100.00%> (ø)
src/mean_function.jl 100.00% <100.00%> (ø)
src/sparse_approximations.jl 100.00% <100.00%> (ø)
src/util/common_covmat_ops.jl 97.61% <100.00%> (+0.05%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update eedb50a...9568b1a. Read the comment docs.

@@ -10,7 +10,7 @@ struct ZeroMean{T<:Real} <: MeanFunction end
"""
This is an AbstractGPs-internal workaround for AD issues; ideally we would just extend Base.map
"""
_map_meanfunction(::ZeroMean{T}, x::AbstractVector) where {T} = zeros(T, length(x))
_map_meanfunction(::ZeroMean{T}, x::AbstractVector) where {T} = Zeros{T}(length(x))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC we didn't use FillArrays here before due to AD issues. Great if it is working now.

src/sparse_approximations.jl Outdated Show resolved Hide resolved
@st--
Copy link
Member Author

st-- commented Oct 28, 2021

Any idea why the documentation build might be failing ? It's frustrating that the build output is basically meaningless :/

@devmotion
Copy link
Member

Any idea why the documentation build might be failing ? It's frustrating that the build output is basically meaningless :/

AbstractGPs can't be precompiled. You have to update docs/Manifest.toml since a new dependency was added.

@st--
Copy link
Member Author

st-- commented Oct 29, 2021

AbstractGPs can't be precompiled. You have to update docs/Manifest.toml since a new dependency was added.

Thanks! 🙈 yeah now I remember facing (and resolving!) the same issue before. Is there some way we could make it provide a more helpful error message? 🤔 I both missed the "failed to precompile" bit at the top of the logs, and looking at it I wouldn't have remembered that the solution is to update the Manifest. (Are there other reasons why it might not precompile?)

@st--
Copy link
Member Author

st-- commented Oct 29, 2021

Are there simpler ways to update all the Manifests (docs and all the examples) than going into each directory, executing julia --project=. followed by ] update and waiting a bunch of time for all the precompilation ? Maybe as a single step at least (so I can fire it up and forget for a while)? 🤔 (All I can think of is "let's add a Makefile for that" 😂 or a shell script...)

@st-- st-- requested review from devmotion and willtebbutt November 2, 2021 07:43
Copy link
Member

@devmotion devmotion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you checked if the Diagonal{Fill} PR in FillArrays solves the original issue? Then we could use a Diagonal(Fill(...)) instead of Fill.

I also suggest restricting the type parameter to AbstractMatrix to really enforce the use of matrices. Otherwise I am worried that some old code might return wrong values silently if we don't use vectors in the homoscedastic case anymore (which IMO is a good change).

Can you revert unrelated changes such as removing _symmetric?

src/finite_gp_projection.jl Show resolved Hide resolved
@st--
Copy link
Member Author

st-- commented Nov 2, 2021

Have you checked if the Diagonal{Fill} PR in FillArrays solves the original issue?

Do you mean https://github.com/JuliaArrays/FillArrays.jl/pull/165/files ?
No, I've not checked that yet. I left a note on JuliaGPU/GPUArrays.jl#381.
Update: Having checked now, it doesn't seem to resolve the issue.

If that turned out to resolve the GPU issue, would you prefer to undo the change here (and hope the PR will get merged & released sooner rather than later), or should we stick with the ScalMat anyways (because it should be positive definite) - and perhaps change the other cases over to PDiagMat and PDMat as well? 🤔

Then we could use a Diagonal(Fill(...)) instead of Fill.

We were already using Diagonal(Fill(...)) before, so not sure what you intend here?

I also suggest restricting the type parameter to AbstractMatrix to really enforce the use of matrices. Otherwise I am worried that some old code might return wrong values silently if we don't use vectors in the homoscedastic case anymore (which IMO is a good change).

It's always been matrices, see @willtebbutt's comment.

Can you revert unrelated changes such as removing _symmetric?

I can't revert them per se, it would then require adding a bunch of extra code instead, as I pointed out before and while we could split it into two separate PRs, I'm not convinced why that would be worth the extra effort if after merging the other PR it would look exactly as if we just merged this one as-is.

@devmotion
Copy link
Member

It's always been matrices

The main point is that it was possible to use vectors (even though the default constructors did not use them which I missed initially and @willtebbutt noticed) and that it therefore would be reasonable to restrict the type parameter to AbstractMatrix

I'm not convinced why that would be worth the extra effort if after merging the other PR

One advantage would be that this PR could be viewed as non-breaking without the _symmetric changes (I would argue replacing Diagonal{Fill} with ScalMat can be viewed as non-breaking, but surely this could be considered breaking as well). I am pretty sure that removing _symmetric breaks code of users that could rely on us taking care of tiny numerical asymmetries. Additionally, while I think the reasoning for removing _symmetric is conceptually appealing, I am worried that it will lead to a worse user experience and hence I am not sure if it is actually a good idea and should be done.

@devmotion
Copy link
Member

devmotion commented Nov 3, 2021

If that turned out to resolve the GPU issue, would you prefer to undo the change here (and hope the PR will get merged & released sooner rather than later), or should we stick with the ScalMat anyways (because it should be positive definite) - and perhaps change the other cases over to PDiagMat and PDMat as well?

I think we shouldn't use any of these matrix types actually for now since a) they do not support positive semi-definite matrices (one would have to use e.g. PSDMats) and b) PDMat is implemented very inefficiently. Therefore I think maybe we should also not use ScalMat if Diagonal{Fill} works.

An immediate question is: does this PR break FiniteGP without noise?

@st--
Copy link
Member Author

st-- commented Nov 3, 2021

I think the reasoning for removing _symmetric is conceptually appealing, I am worried that it will lead to a worse user experience and hence I am not sure if it is actually a good idea and should be done.

Whereas I think it will lead to a better user experience... what do we do with that then?

@st--
Copy link
Member Author

st-- commented Nov 3, 2021

I think we shouldn't use any of these matrix types actually for now since a) they do not support positive semi-definite matrices (one would have to use e.g. PSDMats) and b) PDMat is implemented very inefficiently. Therefore I think maybe we should also not use ScalMat if Diagonal{Fill} works.

Well, the Diagonal{Fill} doesn't work. ScalMat(N, 0.0) does actually work though.

@st--
Copy link
Member Author

st-- commented Nov 3, 2021

PDMat is implemented very inefficiently.

What would it take to make it more efficient (or write a more efficient version) ?

@st--
Copy link
Member Author

st-- commented Nov 3, 2021

One advantage would be that this PR could be viewed as non-breaking

I don't think we're at the point yet where we should be too concerned about breaking things... there's way too many things that don't work at all. Once we're at the point where we could honestly recommend this over GaussianProcesses.jl, we should be more careful; for now, I'd rather see us get there faster.

@devmotion
Copy link
Member

Whereas I think it will lead to a better user experience... what do we do with that then?

As I said, I like the suggestion 🙂 I just think there are different consequences (and possibly alternatives) to consider before removing _symmetric. To me this discussion and our disagreement just indicates that it would be good to discuss it (and possibly remove it) in a separate PR.

Well, the Diagonal{Fill} doesn't work.

It does work, GP projections constructed with something like gp(rand(5), 0.0) (i.e., without noise) work fine currently. So it is a valid question if this would be broken by switching to ScalMat since in general PDMats is not concerned with positive semi-definite but only positive definite matrices. I just checked quickly and it seems e.g. cov, var, rand, and logpdf still work 👍 Thus probably only PDMat would not work with positive semi-definite matrices (I assume PDDiagMat would work in the same way as ScalMat since it just computes the inverses of the diagonal elements, resulting in Inf values if a diagonal entry is 0).

What would it take to make it more efficient (or write a more efficient version) ?

I assume one would have to define a new AbstractPDMat type that does not always compute the full matrix. (xref: JuliaStats/PDMats.jl#140)

I don't think we're at the point yet where we should be too concerned about breaking things... there's way too many things that don't work at all. Once we're at the point where we could honestly recommend this over GaussianProcesses.jl, we should be more careful; for now, I'd rather see us get there faster.

I am not worried about making a breaking release, we have to do this anyway from time to time since the API and types still change. And I think it's fine. But there's still an advantage of non-breaking releases - downstream packages and users will receive the new version and hence the new features automatically without any delay and without having to update compat bounds. Whereas for breaking releases they have to update the compat bounds, make a new release, etc. - which is fine and usually not a problem but takes time.

IMO AbstractGPs also already works quite well and for many things it should be possible to use it instead of GaussianProcesses.jl. I think also one major advantage of AbstractGPs is that one can choose from the variety and combinations of kernels in KernelFunctions, whereas the design of kernels in GaussianProcesses.jl always seemed a bit suboptimal to me (note: I worked on it and tried to improve it a while back).

@st--
Copy link
Member Author

st-- commented Mar 30, 2022

Maybe it would be useful to convert Diagonal(Fill(...)) to a ScalMat in thw constructor. Then the internal implementation could be simplified since we would only have to handle ScalMat.

I'm not quite sure what you mean; we still support Diagonal(vec) (different noise at each observation), I don't understand neither what exactly you're suggesting nor what/why we're gaining from it...

@st-- st-- requested a review from devmotion March 30, 2022 11:25
@devmotion
Copy link
Member

Well, the suggestion was not to drop support for Diagonal(Fill(...)) but simply to convert them to ScalMat when a FiniteGP is constructed. The main advantage is that Diagonal(Fill(...)) would hit the same internal optimizations as ScalMat but we only have to implement them for ScalMat. Similar to eg observations as Matrix but internally we only have to deal with vectors.

Project.toml Show resolved Hide resolved
@willtebbutt
Copy link
Member

Well, the suggestion was not to drop support for Diagonal(Fill(...)) but simply to convert them to ScalMat when a FiniteGP is constructed. The main advantage is that Diagonal(Fill(...)) would hit the same internal optimizations as ScalMat but we only have to implement them for ScalMat. Similar to eg observations as Matrix but internally we only have to deal with vectors.

I'm not sure that this is necessary. The standard way to obtain a Diagonal(Fill(...)) as the covariance matrix is to just pass in a scalar. i.e. to write f(x, 0.1), rather than f(x, Diagonal(Fill(0.1, length(x)))).

@st--
Copy link
Member Author

st-- commented Mar 30, 2022

@devmotion I've removed the type restriction; I still believe it's just enshrining at the constructor level what was already required by the code everywhere else, but someone else can sort that out separately. That should make sure this is no longer breaking.

If you want to add an additional outer constructor for Diagonal{<:Fill}, please do so in a separate PR; as Will pointed out I don't think it's really necessary, and I'd rather just get this PR merged in instead of figuring out how to get the details of that right.

Anything else that's in the way of being able to merge this ?

@st-- st-- requested a review from devmotion March 30, 2022 12:19
@devmotion
Copy link
Member

A type change is breaking, eg it is not possible anymore to use UniformScaling which supported by at least parts of the code it seems (can't check at a computer right now).

Another possibility would be even to only support matrices as noise but no scalars or vectors. This was changed this way eg in MvNormal since it was always unclear to users if scalars and vectors represent variances or standard deviations in MvNormal whereas matrices are always covariance matrices. If we expect users to provide matrices it would be a stronger argument for handling Diagonal(Fill(...)) explicitly.

I'll try to have another look at the PR later today and check if anything is missing.


Slightly off-topic, I think it's off-putting if I feel to be pushed into reviewing as fast as possible (even if this was not your intention it is how I read your comment; but as almost everyone, I do this in my limited spare time) and if it is demanded if suggestions within the scope of the PR should be implemented by myself (sometimes this might be useful but even if something is considered to be out of scope - by the people involved in the discussion - I don't think it's the reviewer's job to fix such outstanding issues in follow-up PRs).

@willtebbutt
Copy link
Member

willtebbutt commented Mar 30, 2022

A type change is breaking, eg it is not possible anymore to use UniformScaling which supported by at least parts of the code it seems (can't check at a computer right now).

This is not correct. Code written for the FiniteGP type does not support UniformScaling scalars / vectors as the observation noise covariance. If such types are passed, outer constructors are used to convert them into appropriate AbstractMatrix objects. The lack of a type constraint is an omission, and therefore a bug. For example, if you take a look through this code, you'll see that there are a lot of things that are done that are only safe to do with AbstractMatrixs. For example this.

I agree that in general, a type change is breaking, but this is not the case if it should not have been allowed in the first place, and can reasonably be considered a bug. I believe that is the case here.

@st--
Copy link
Member Author

st-- commented Mar 30, 2022

Slightly off-topic, I think it's off-putting if I feel to be pushed into reviewing as fast as possible (even if this was not your intention it is how I read your comment; but as almost everyone, I do this in my limited spare time)

I did not mean to push you into reviewing faster, I would also be happy for you to declare that you don't have time to review it, and to let someone else do it instead. It's just that by requesting changes you give the impression that you are keen to review it, with other people then leaving the reviewing to you, so thereby you're blocking the PR from improving the status quo.

The bar for a PR to be merged should not be "it's perfect", just that it improves things. I'm also putting my limited spare time into this, and what is off-putting to me is if I feel like anything I try to improve isn't good enough until it's perfect.

if it is demanded if suggestions within the scope of the PR should be implemented by myself (sometimes this might be useful but even if something is considered to be out of scope - by the people involved in the discussion - I don't think it's the reviewer's job to fix such outstanding issues in follow-up PRs).

Of course it doesn't have to be you to be the one to implement it. You seemed keen to have it in place, and you gave the impression that you knew just what needed doing. I just wanted to suggest that this is an additional change on top of what's already in this PR (so nothing that "needs to be fixed", which would make it any worse than it is right now), and that I don't think this PR should be held up by adding your proposal. We could of course just open an issue to keep track of it instead!

@devmotion
Copy link
Member

This is not correct. Code written for the FiniteGP type does not support UniformScaling scalars / vectors as the observation noise covariance. If such types are passed, outer constructors are used to convert them into appropriate AbstractMatrix objects. The lack of a type constraint is an omission, and therefore a bug. For example, if you take a look through this code, you'll see that there are a lot of things that are done that are only safe to do with AbstractMatrixs. For example this.

I just checked and UniformScaling is supported and can be used at least with some methods (mean_and_var errors since diagind is not defined):

julia> using AbstractGPs, LinearAlgebra


julia> fx = GP(GaussianKernel())(rand(10), 2.0*I)
AbstractGPs.FiniteGP{GP{AbstractGPs.ZeroMean{Float64}, SqExponentialKernel{Distances.Euclidean}}, Vector{Float64}, UniformScaling{Float64}}(
f: GP{AbstractGPs.ZeroMean{Float64}, SqExponentialKernel{Distances.Euclidean}}(AbstractGPs.ZeroMean{Float64}(), Squared Exponential Kernel (metric = Distances.Euclidean(0.0)))
x: [0.9001122128843618, 0.5014532689115688, 0.9068485710635356, 0.8671710704251859, 0.1593165289491113, 0.033849226130907684, 0.6012613832007633, 0.6792056597108606, 0.6291170664768985, 0.4586633196418336]
Σy: UniformScaling{Float64}(2.0)
)


julia> mean(fx)
10-element Vector{Float64}:
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0

julia> mean_and_cov(fx)
([0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [3.0 0.9236108289936744  0.9639467883634513 0.9071580732113963; 0.9236108289936744 3.0  0.9918840906723075 0.9990849290537535;  ; 0.9639467883634513 0.9918840906723075  3.0 0.9855777713218073; 0.9071580732113963 0.9990849290537535  0.9855777713218073 3.0])

julia> rand(fx)
10-element Vector{Float64}:
  3.7182207380247623
  1.0497036329028469
  0.9905553822300871
 -0.44238892519088857
 -1.0465937939874175
 -0.8895511203455914
  1.0418168460349009
 -0.20439667340116374
  1.2400217065941217
  0.12529989634261757

@willtebbutt
Copy link
Member

Okay, that's interesting. Since @st-- has removed it from this PR, I think we should continue this discussion elsewhere, and let this PR get merged.

Copy link
Member

@devmotion devmotion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

@devmotion
Copy link
Member

I did not mean to push you into reviewing faster, I would also be happy for you to declare that you don't have time to review it, and to let someone else do it instead. It's just that by requesting changes you give the impression that you are keen to review it, with other people then leaving the reviewing to you, so thereby you're blocking the PR from improving the status quo.

I'm completely fine with reviewing it as you can see. But IMO reviewing within < 2 days is fast and usually there is no good reason to expect a faster review cycle. It's completely fine to ping me after a few days without review/response (sometimes I just forget stuff), but pinging immediately with a comment about not holding back the PR doesn't feel appropriate to me. I guess different people have different opinions on this matter though 🤷

The bar for a PR to be merged should not be "it's perfect", just that it improves things.

Sure, nothing is ever perfect. But removing breaking changes, fixing missing compat entries, discussing possible constructors to not miss out on newly added optimizations etc. are all reasonable things to suggest and discuss, and not something I do to hold back the PR.

We could of course just open an issue to keep track of it instead!

Yeah, I think usually this would be the way to go. Then anyone who is keen and has time can fix it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants