The GP does a good job on your training data, but not such a great one on the test data. You've probably already run something like the following yourself:
load('../XYdata_01_01_ab.mat');

for N = 1 : 25
    % normalize using training statistics
    m = mean(Y1(N,:));
    s = std(Y1(N,:));
    Y1(N,:) = 1/s * (Y1(N,:) - m);
    Y2(N,:) = 1/s * (Y2(N,:) - m);

    % GP setup: squared-exponential covariance, Gaussian likelihood
    covfunc = @covSEiso;
    ell = 2;
    sf = 1;
    hyp.cov = [log(ell); log(sf)];
    likfunc = @likGauss;
    sn = 1;
    hyp.lik = log(sn);

    % optimize hyperparameters on the training sequence
    hyp = minimize(hyp, @gp, -100, @infExact, [], covfunc, likfunc, X1', Y1(N,:)');

    % predictions on the training inputs
    [m, s2] = gp(hyp, @infExact, [], covfunc, likfunc, X1', Y1(N,:)', X1');
    figure;
    subplot(2,1,1); hold on;
    title(['N = ' num2str(N)]);
    f = [m + 2*sqrt(s2); flipdim(m - 2*sqrt(s2), 1)];
    x = 1:length(m);
    fill([x'; flipdim(x',1)], f, [7 7 7]/8);
    plot(Y1(N,:)', 'b');
    plot(m, 'r');
    mse_train = mse(Y1(N,:)' - m);

    % predictions on the test inputs
    [m, s2] = gp(hyp, @infExact, [], covfunc, likfunc, X1', Y1(N,:)', X2');
    subplot(2,1,2); hold on;
    f = [m + 2*sqrt(s2); flipdim(m - 2*sqrt(s2), 1)];
    x = 1:length(m);
    fill([x'; flipdim(x',1)], f, [7 7 7]/8);
    plot(Y2(N,:)', 'b');
    plot(m, 'r');
    mse_test = mse(Y2(N,:)' - m);

    fprintf('N = %d -- train = %5.2f  test = %5.2f\n', N, mse_train, mse_test);
end
By tuning the hyperparameters manually instead of using the minimize function, it is possible to balance the training and test error somewhat, but tuning a method by looking at the test error is not what you're supposed to do. I think what's happening is heavy overfitting to the three subjects that generated your training data. No method will do a good job here out of the box, and how could it? You provide the training data, and the method tries to fit it as well as possible without overfitting. And in fact, it doesn't overfit in the classical sense: it doesn't overfit to the data, but to the three training subjects. For example, cross-validating within the training set would tell us that there's no overfitting. Still, your test set will be explained poorly.
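One way to make this kind of subject-level overfitting visible is to cross-validate at the subject level rather than at the sample level, holding out one whole subject per fold. Here is a minimal sketch in Python with scikit-learn's GroupKFold (your pipeline is MATLAB/GPML; the data here, including the per-subject offsets, are an invented synthetic stand-in purely for illustration):

```python
import numpy as np
from sklearn.model_selection import GroupKFold
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Synthetic stand-in: 3 "subjects", 50 samples each, 1-D input
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(150, 1))
groups = np.repeat([0, 1, 2], 50)
# each subject gets its own offset -> subject-specific structure
y = np.sin(X.ravel()) + 0.5 * groups + 0.1 * rng.standard_normal(150)

gp = GaussianProcessRegressor(kernel=RBF(length_scale=2.0), alpha=1.0)
cv = GroupKFold(n_splits=3)
for train_idx, test_idx in cv.split(X, y, groups):
    gp.fit(X[train_idx], y[train_idx])
    mse = np.mean((gp.predict(X[test_idx]) - y[test_idx]) ** 2)
    print(f"held-out subject {groups[test_idx][0]}: MSE = {mse:.3f}")
```

Each fold trains on two subjects and evaluates on the third, so the held-out error reflects generalization to a new subject, which is the quantity your fourth person actually probes.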
What you can do:

- Get data from more subjects for training. This way your fourth person will be less likely to look like an "outlier", as it does currently. Also, you have just one sequence from each person, right? Maybe it would help to record each sequence multiple times.
- Somehow incorporate prior knowledge about your task that keeps the method from overfitting to specific subjects. In a GP that could be done via the covariance function, but it's probably not that easy to do ...
- If I'm not mistaken, the sequences are in fact time series. Maybe it would make sense to exploit the temporal relations, for instance with recurrent neural networks.

There's most definitely more, but those are the things I can think of right now.
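A cheap way to start exploiting the temporal relations, short of a full recurrent model, is to recast each sequence as a supervised problem over sliding windows, so that any regressor (a GP included) sees the recent past as input. A sketch in Python with numpy (the window width of 5 and the sine sequence are arbitrary illustrations):

```python
import numpy as np

def sliding_windows(seq, width):
    """Stack overlapping windows of a 1-D sequence as rows;
    seq[i:i+width] is the input used to predict seq[i+width]."""
    X = np.stack([seq[i:i + width] for i in range(len(seq) - width)])
    y = seq[width:]
    return X, y

seq = np.sin(np.linspace(0, 6, 50))
X, y = sliding_windows(seq, 5)
print(X.shape, y.shape)  # (45, 5) (45,)
```

The resulting (X, y) pairs can be fed to any standard regressor; a recurrent network would instead consume the sequences directly and learn the temporal dependencies itself.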
Best Answer
This seems to be a modelling problem rather than an implementation problem. Let us plot the predictions (ymu) from your model, together with the training data, over a denser set of points:
The axes of the image represent the inputs Xtest1 and Xtest2, and the color is the predicted mean. We see that the model output deviates from 0 only near the training points. This is due to the short length-scale hyperparameter (ell), which causes the model to learn only short-range phenomena. Let us try modifying that (all other code kept unchanged):
Now the predicted output is much closer to your ground truth. Indeed, the prediction at $(1,2)$ is now $2.9$. It seems that (with the training data kept fixed), the longer the length scale, the closer the prediction at your test point gets to $3$.
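This length-scale effect can be reproduced in a few lines. A hedged sketch in Python with scikit-learn rather than GPML (the training targets fixed at 3 along the $x_1=x_2$ line are an invented stand-in for your data; the two length-scale values are arbitrary):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Training inputs on the x1 = x2 line, constant target of 3
t = np.linspace(0, 3, 4)
X_train = np.column_stack([t, t])
y_train = np.full(4, 3.0)

for ell in (0.1, 2.0):
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=ell),
                                  alpha=1e-6, optimizer=None)  # keep ell fixed
    gp.fit(X_train, y_train)
    pred = gp.predict(np.array([[1.0, 2.0]]))[0]
    print(f"ell = {ell}: prediction at (1, 2) = {pred:.2f}")
```

With the short length scale the prediction at $(1,2)$ reverts to the prior mean of 0, while the long length scale carries the training signal across to the off-line test point, mirroring what you observed.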
Another observation is that all your training data lie on the $x_1=x_2$ line, so it is rather optimistic to hope that any model would learn what happens outside that line without making strong assumptions. Besides changing the hyperparameters, you could also try adding training data at various other points, e.g., on a 2-dimensional grid.
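Building such a grid of training inputs is a one-liner with numpy; a small sketch (the ranges and the 5-by-5 resolution are invented for illustration):

```python
import numpy as np

# 5 x 5 grid of candidate training inputs covering the plane,
# instead of sampling only along the x1 = x2 line
g = np.linspace(0.0, 3.0, 5)
X1g, X2g = np.meshgrid(g, g)
X_grid = np.column_stack([X1g.ravel(), X2g.ravel()])
print(X_grid.shape)  # (25, 2)
```

Evaluating the ground-truth function on X_grid and retraining would give the GP direct evidence about behaviour off the diagonal.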