
Deep Structured Mixtures of Gaussian Processes

This package implements Deep Structured Mixtures of Gaussian Processes (DSMGP) [1] in Julia 1.3.

Installation

To use this package you need Julia 1.3 installed on your machine.

Inside the Julia REPL, you can install the package in the Pkg mode (type ] in the REPL):

pkg> add https://github.com/trappmartin/DeepStructuredMixtures
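Alternatively, the package can be installed programmatically from a script or notebook via the standard Pkg API (this is generic Julia tooling, not specific to this package):

using Pkg
Pkg.add(PackageSpec(url = "https://github.com/trappmartin/DeepStructuredMixtures"))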

After the installation you can load the package using:

using DeepStructuredMixtures

Note that the package will be compiled the first time you load it.

Python bridge

The package can be used from Python using the excellent pyjulia package: https://github.com/JuliaPy/pyjulia

Usage

The following example illustrates the usage of DeepStructuredMixtures. Note that this example assumes that you have Plots installed in your Julia environment.

First, we load the necessary libraries:

using Plots
using DeepStructuredMixtures
using Random
using Statistics # provides mean, used for the constant mean-function below

Now we can create some synthetic data or load some real data:

Random.seed!(1) # fix the random seed for reproducibility
xtrain = collect(range(0, stop=1, length = 100))
ytrain = sin.(xtrain*4*pi + randn(100)*0.2)
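As a quick sanity check, the training data can be visualised before fitting a model (a minimal sketch using the Plots package loaded above; the label and axis names are illustrative):

scatter(xtrain, ytrain, label = "training data", xlabel = "x", ylabel = "y")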

We will now use a squared exponential kernel-function with a constant mean-function to fit the DSMGP. See the API section below for more options.

kernelf = IsoSE(1.0, 1.0)
meanf = ConstMean(mean(ytrain))

Now we can construct a DSMGP on our data and find optimal hyperparameters.

K = 4 # Number of splits per product node
V = 3 # Number of children per sum node
M = 10 # Minimum number of observations per expert

model = buildDSMGP(reshape(xtrain,:,1), ytrain, V, K; M = M, kernel = kernelf, meanFun = meanf)
train!(model, ADAM())

# finally we perform exact posterior inference
update!(model)

Note that for large data sets it is recommended to first train a DSMGP with V = 1 and use the learned hyper-parameters to initialise the training of a model with V > 1:

model1 = buildDSMGP(reshape(xtrain,:,1), ytrain, 1, K; M = M, kernel = kernelf, meanFun = meanf)
train!(model1, ADAM())

# get hyper-parameters
hyp = reduce(vcat, params(leftGP(model1.root), logscale=true))

model = buildDSMGP(reshape(xtrain,:,1), ytrain, V, K; M = M, kernel = kernelf, meanFun = meanf)

# set hyper-parameters instead of learning from scratch
setparams!(model.root, hyp)
train!(model, ADAM(), randinit = false)

Finally, we can plot the model:

plot(model)

and use it for predictions:

xtest = collect(range(0.5, stop=1.5, length = 100))
m, s = predict(model, reshape(xtest,:,1))

Note that all methods assume that xtrain and xtest are matrices, which is why we use reshape(xtest,:,1) to turn the respective vectors into single-column matrices.
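The predictions can also be visualised together with the training data. The following is a minimal sketch assuming that s holds the predictive variance returned by predict above; it uses the ribbon attribute of Plots to draw a two-standard-deviation band:

# predictive mean with a two-standard-deviation band (s is assumed to be the predictive variance)
plot(xtest, m, ribbon = 2 .* sqrt.(s), label = "DSMGP prediction")
scatter!(xtrain, ytrain, label = "training data")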

API

Mean functions

# A constant mean-function with value zero, i.e. a zero-mean function.
ConstMean(0.0) 

Kernel functions

# A squared exponential kernel-function with lengthscale 1 and std of 1.
IsoSE(1.0, 1.0)

# A squared exponential kernel-function with ARD, lengthscales of 1, and std of 1.
ArdSE(ones(10), 1.0)

# A linear kernel-function with lengthscale of 1.
IsoLinear(1.0)

# A linear kernel-function with ARD and lengthscales of 1.
ArdLinear(ones(10))

# Composition of kernel-functions for inference over kernel-functions.
KernelFunction[IsoSE(1.0, 1.0), IsoLinear(1.0)] 

Models

# An exact Gaussian process
GaussianProcess(trainx, trainy, mean = meanf, kernel = kernelf)

# A (generalized) product of experts (PoE) model with K splits per node and a minimum of M observations per expert
buildPoE(trainx, trainy, K; generalized = true, M = M, kernel = kernelf, meanFun = meanf)

# A (robust) Bayesian committee machine (BCM) model with K splits per node and a minimum of M observations per expert
# ! Training not implemented !
buildrBCM(x, y, K; M = M, kernel = kernelf, meanFun = meanf)

# A deep structured mixture of GPs (DSMGP) model with K splits per product node, V children per sum node, and a minimum of M observations per expert.
buildDSMGP(x, y, V, K; M = M, kernel = kernelf, meanFun = meanf)
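For higher-dimensional inputs, the ARD kernel-functions can be combined with any of the constructors above. The following is a minimal sketch on ten-dimensional data (the data, variable names, and hyper-parameter values are purely illustrative assumptions):

# illustrative 10-dimensional regression data
X = randn(500, 10)
y = sin.(X[:, 1]) .+ 0.1 .* randn(500)

# one lengthscale per input dimension
ardkernel = ArdSE(ones(10), 1.0)
meanf = ConstMean(mean(y))

# a generalized PoE baseline and a DSMGP on the same data
poe = buildPoE(X, y, 4; generalized = true, M = 50, kernel = ardkernel, meanFun = meanf)
dsmgp = buildDSMGP(X, y, 3, 4; M = 50, kernel = ardkernel, meanFun = meanf)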

Training

Note that DeepStructuredMixtures reexports Flux.jl and uses the optimisers available in Flux. We refer to the Flux.jl documentation for the available optimisers.

# train a model for 1000 iterations using ADAM
train!(model, ADAM(), iterations = 1_000)

# fine-tune a model for 1000 iterations using ADAM
finetune!(model, ADAM(), iterations = 1_000)

# fit the posterior of a hierarchical model, e.g. gPoE
fit_naive!(model.root)

# fit the posterior of a DSMGP using shared Cholesky
fit!(model)
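Putting these together, a typical workflow for a hierarchical baseline such as a generalized PoE, compared to a DSMGP, could look as follows (a minimal sketch reusing the data and settings from the Usage section above):

# build, train, and fit a generalized PoE on the synthetic data from above
poemodel = buildPoE(reshape(xtrain,:,1), ytrain, K; generalized = true, M = M, kernel = kernelf, meanFun = meanf)
train!(poemodel, ADAM(), iterations = 1_000)
fit_naive!(poemodel.root)

# build, train, and fit a DSMGP using the shared Cholesky decomposition
dsmgp = buildDSMGP(reshape(xtrain,:,1), ytrain, V, K; M = M, kernel = kernelf, meanFun = meanf)
train!(dsmgp, ADAM(), iterations = 1_000)
fit!(dsmgp)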

Prediction

# make predictions using a model, i.e., compute the predictive mean (m) and variance (s).
m, s = predict(model, testx)

# plot a model and the training data.
plot(model)

Reference

[1] Martin Trapp, Robert Peharz, Franz Pernkopf and Carl Edward Rasmussen: Deep Structured Mixtures of Gaussian Processes. To appear at the International Conference on Artificial Intelligence and Statistics (AISTATS), 2020.

Acknowledgments

This project received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie Grant Agreement No. 797223 (HYBSPN).