FEA: Parallel Partial Emulation (aka Multitask GPs with a shared kernel) #2470
This pull request adds parallel partial emulation (see Gu and Berger, 2016). The idea of this method is that, in a multi-output (or multitask) setting, each task is allowed to have a different mean and variance, but all tasks share common Gaussian process correlation parameters, which are estimated from the joint likelihood.
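Concretely (a rough sketch in my notation, not the exact formulation of the paper), for outputs $y_1, \dots, y_q$ the assumed structure is

$$
\operatorname{Cov}\!\big(y_i(\mathbf{x}),\, y_j(\mathbf{x}')\big) \;=\; \delta_{ij}\,\sigma_i^2\, c(\mathbf{x}, \mathbf{x}'),
\qquad
\mathbb{E}\big[y_i(\mathbf{x})\big] = \mu_i(\mathbf{x}),
$$

where $c(\cdot,\cdot)$ is the shared correlation function with common hyperparameters, and $\mu_i$, $\sigma_i^2$ are the task-specific means and variances; cross-task correlations are not modelled.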
This is useful when constructing a Gaussian process surrogate for a computer model with a massive number of outputs. Think, for example, of a finite element model for an engineering or science problem, where the inputs are model parameters and the outputs are the model predictions at a large number of space and/or time coordinates.
I think this is a setting currently not yet covered by gpytorch: the existing multitask kernels (even with `rank=0`) scale poorly with the number of outputs/tasks (see below). This PR should cover that gap. Feedback and suggestions are welcome!
Code changes
I've added a new `ParallelPartialKernel` that acts as a drop-in replacement for `MultitaskKernel` and implements the parallel partial emulation strategy from Gu and Berger (2016). A minimal usage sketch is shown below.
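A minimal usage sketch, assuming the same `(data_covar_module, num_tasks)` constructor arguments as `MultitaskKernel` (see the tests and notebook below for the complete, runnable versions):

```python
import torch
import gpytorch

class ParallelPartialGPModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood, num_tasks):
        super().__init__(train_x, train_y, likelihood)
        # Each task gets its own constant mean.
        self.mean_module = gpytorch.means.MultitaskMean(
            gpytorch.means.ConstantMean(), num_tasks=num_tasks
        )
        # All tasks share the RBF correlation hyperparameters; per-task
        # variances come from the parallel partial structure, and no
        # inter-task correlations are modelled.
        self.covar_module = gpytorch.kernels.ParallelPartialKernel(
            gpytorch.kernels.RBFKernel(), num_tasks=num_tasks
        )

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultitaskMultivariateNormal(mean_x, covar_x)

# Example setup: 100 training points, 1000 outputs/tasks.
train_x = torch.linspace(0, 1, 100)
train_y = torch.randn(100, 1000)
likelihood = gpytorch.likelihoods.MultitaskGaussianLikelihood(num_tasks=1000)
model = ParallelPartialGPModel(train_x, train_y, likelihood, num_tasks=1000)
```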
Tests
See `test/examples/test_parallel_partial_gp_regression.py`.
Documentation
Has been updated accordingly.
Examples
See this notebook for an example. I've also added this file to the documentation.
Comparison to existing methods
Ignoring the inter-task correlations leads to a (much) faster method. This notebook compares the cost of evaluating the posterior with both Multitask GP and Parallel Partial GP regression as a function of the number of tasks / outputs. As the picture below illustrates, the multitask GP method would be infeasible in applications where the number of outputs is large (say, more than a few hundred or a few thousand).
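The comparison boils down to timing posterior evaluation for an increasing number of tasks; a generic harness along these lines (a sketch, not the notebook code itself) works for either model:

```python
import time
import torch
import gpytorch

def time_posterior(model, likelihood, test_x, repeats=5):
    """Average wall-clock time for evaluating the posterior at test_x."""
    model.eval()
    likelihood.eval()
    times = []
    with torch.no_grad(), gpytorch.settings.fast_pred_var():
        for _ in range(repeats):
            start = time.perf_counter()
            post = likelihood(model(test_x))
            post.mean       # force evaluation of the predictive means
            post.variance   # ... and of the predictive variances
            times.append(time.perf_counter() - start)
    return sum(times) / len(times)

# Run this for a MultitaskKernel model and a ParallelPartialKernel model
# trained on the same data, for increasing num_tasks, and compare.
```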
The method is also faster than a batch-independent GP construction (see this notebook) and has the additional benefit that only one set of kernel parameters needs to be trained (instead of `num_tasks` sets of parameters).
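For context, the batch-independent construction being compared against follows gpytorch's existing batch-independent multi-output pattern, roughly as sketched here; the `batch_shape` gives every task its own lengthscale and outputscale, i.e. `num_tasks` sets of kernel hyperparameters:

```python
import torch
import gpytorch

class BatchIndependentGPModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood, num_tasks):
        super().__init__(train_x, train_y, likelihood)
        batch_shape = torch.Size([num_tasks])
        self.mean_module = gpytorch.means.ConstantMean(batch_shape=batch_shape)
        # One RBF lengthscale and one outputscale *per task*:
        # num_tasks independent sets of kernel hyperparameters are trained.
        self.covar_module = gpytorch.kernels.ScaleKernel(
            gpytorch.kernels.RBFKernel(batch_shape=batch_shape),
            batch_shape=batch_shape,
        )

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultitaskMultivariateNormal.from_batch_mvn(
            gpytorch.distributions.MultivariateNormal(mean_x, covar_x)
        )
```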