Transfer learning; using mean_module and covar_module from GP1 as the mean_module and covar_module in new GP2 #2294

neildhir · 2024-04-12T01:05:54Z


Full disclosure: I also posted this issue on the GPyTorch discussion board, as I am not sure where it is most appropriate.

Issue description

I have a fairly simple problem, but I cannot get it to work. I would like to use the mean and covariance functions from a GP trained on one dataset as the mean and covariance functions for another GP trained on a different dataset (both datasets are from the same domain). This is meant to be part of a Bayesian optimisation framework that I am exploring, but the issue seems to be on the GPyTorch side (with the custom kernel) rather than in BoTorch, from which I use some functions for ease of demonstration.

A full example follows below, together with the error message I am getting. Help would be most welcome, as I cannot understand where I am going wrong.

Worked example (which is not working)

import torch

from botorch.models.transforms import Normalize, Standardize
from botorch.models import SingleTaskGP
from gpytorch.mlls import ExactMarginalLogLikelihood
from botorch.fit import fit_gpytorch_mll
from gpytorch.kernels import MaternKernel, ScaleKernel, Kernel

import math

We set up all the necessary machinery for training the first GP.

f = lambda x: torch.sin(math.pi * x)
# Problem set up
d = 1
bounds = torch.tensor([[-4.0], [4.0]])
# We use 20 datapoints for the first dataset
train_X1 = torch.linspace(bounds[0].item(), bounds[1].item(), 20).unsqueeze(-1).to(torch.float64)
train_Y1 = f(train_X1)
train_Y1 += 0.1 * torch.randn_like(train_Y1)
# Build first GP model
gp1 = SingleTaskGP(train_X1, train_Y1, input_transform=Normalize(d,bounds=bounds), outcome_transform=Standardize(m=1))
mll = ExactMarginalLogLikelihood(gp1.likelihood, gp1)
fit_gpytorch_mll(mll)

As ever, BoTorch is just using GPyTorch under the hood for most of this. Now we come to the second part, where we want to use gp1.mean_module and gp1.covar_module in a new GP. This requires a new kernel.

class MyKernel(Kernel):
    has_lengthscale = True
    is_stationary = True
    def __init__(self, gp1_covar_module, **kwargs):
        super(MyKernel, self).__init__(**kwargs)
        self.gp1_covar_module = gp1_covar_module
        self.scaled_matern_kernel = ScaleKernel(MaternKernel())

    def forward(self, x1, x2, **params):
        A = self.scaled_matern_kernel(x1, x2)
        # Compute standard deviation using the first GP's covariance module
        sigma = self.gp1_covar_module(x1, x2).diag().sqrt()
        B = torch.outer(sigma, sigma)
        assert A.shape == B.shape, f"Shapes do not match: {A.shape} and {B.shape}"
        return A + B

Now we use this kernel in a new GP along with the mean function of the first GP as well.

# For this GP we simulate data sparsity by only using three datapoints
train_X2 = torch.tensor([-2.,0.,1.]).unsqueeze(-1).to(torch.float64)
train_Y2 = f(train_X2)
train_Y2 += 0.1 * torch.randn_like(train_Y2)
gp2 = SingleTaskGP(train_X2, train_Y2,
                   outcome_transform=Standardize(m=1),
                   input_transform=Normalize(d=train_X2.shape[1],bounds=bounds),
                   mean_module=gp1.mean_module,
                   covar_module=MyKernel(gp1.covar_module))
mll = ExactMarginalLogLikelihood(gp2.likelihood, gp2)
fit_gpytorch_mll(mll)
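A side note on the reuse itself: passing gp1.mean_module and gp1.covar_module directly registers the same module objects inside gp2, so fitting gp2 can also update the hyperparameters that gp1 relies on. If the intent is to keep GP1's fitted values fixed, one option is to hand gp2 a frozen deep copy instead. The helper below is a hypothetical sketch (frozen_copy is not part of BoTorch or GPyTorch), and torch.nn.Linear is only a stand-in for a fitted module such as gp1.mean_module:

```python
import copy

import torch


def frozen_copy(module: torch.nn.Module) -> torch.nn.Module:
    """Return an independent copy of `module` whose parameters are excluded from training."""
    clone = copy.deepcopy(module)  # new parameter tensors, same values
    for param in clone.parameters():
        param.requires_grad_(False)  # gradient-based fitting skips these
    return clone


# Stand-in for a fitted module (assumption: any torch.nn.Module behaves the same way here).
fitted_module = torch.nn.Linear(1, 1)
reused_module = frozen_copy(fitted_module)
```

Under this assumption, something like mean_module=frozen_copy(gp1.mean_module) would leave gp1 untouched when gp2 is fitted. Whether freezing is desirable depends on whether GP2 is supposed to fine-tune GP1's hyperparameters or only inherit them.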

If we now try to evaluate the posterior over the whole domain

eval_X = torch.linspace(bounds[0].item(), bounds[1].item(), 200).unsqueeze(-1).to(torch.float64)
gp2.posterior(eval_X)

we run into an error.

python3.10/site-packages/linear_operator/operators/_linear_operator.py
   1404     raise NotImplementedError(
   1405         "LinearOperator#diagonal is only implemented for when :attr:`dim1` and :attr:`dim2` are equal "
   1406         "to -2 and -1, respectfully, and :attr:`offset = 0`. "
   1407         f"Got: offset={offset}, dim1={dim1}, dim2={dim2}."
   1408     )
   1409 elif not self.is_square:
-> 1410     raise RuntimeError("LinearOperator#diagonal is only implemented for square operators.")
   1411 return self._diagonal()

RuntimeError: LinearOperator#diagonal is only implemented for square operators.

This seems like a fairly trivial setup to me, but evidently I have gone wrong somewhere. The problem appears to be in the line sigma = self.gp1_covar_module(x1, x2).diag().sqrt(): the covariance matrix is not square, but I do not understand why it is not. The purpose is to use the variance of the fitted GP1 to enlarge the variance of GP2 in areas where there is no data (as there is much less data for GP2). Help would be most welcome.

Replies: 0 comments
