Solved – Using Gaussian process regression with non Gaussian data

bayesiangaussian processpredictionregression

I have a question about practiacal implementation and interpretation of the Gaussian process regression model given by Rasmussen & Williams (http://www.gaussianprocess.org/gpml).

The regression problem is defined as follows:

Let $x_i∈R^{5}$ be an input vector and $y_i∈R$ be its corresponding target. The set of $M$ inputs are arranged into a matrix $X=[x_1,…,x_M]^⊤$ and their corresponding targets are stored in a matrix $Y=[y_1,…,y_M]^⊤$

I wish to train a GPR model $G={X,Y,θ}$ using the squared exponential function:

$k(x_i,x_j)=α^2exp(\frac{−1}{2β^2(xi−xj)^2})+γ^2δ_{ij}$,

where $δ_{ij}$ equals 1 if $i=j$ and 0 otherwise. The hyperparameters are $θ=(α,β,γ)$ with γγ being the assumed noise level in the training data and $β$ is the length-scale.

To train the model, I need to minimise the negative log marginal likelihood with respect to the hyperparameters:

$−logp(Y∣X,θ)=\frac{1}{2}tr(Y^⊤K^{−1}Y)+\frac{1}{2}log∣K∣+c$,
where c is a constant and the matrix K is a function of the hyperparameters

covfunc = @covSEiso;
likfunc = @likGauss;
sn = 0.1;
hyp.lik = log(sn);
hyp2.cov = [0;0];
hyp2.lik = log(0.1);
hyp2 = minimize(hyp2, @gp, -100, @infExact, [], covfunc, likfunc, X1, Y1(:,n));
exp(hyp2.lik)
nlml2 = gp(hyp2, @infExact, [], covfunc, likfunc, X1, Y1(:, n));
[m s2] = gp(hyp2, @infExact, [], covfunc, likfunc, X1, Y1(:, n), XT);
YT(:, n) = m;

X1,Y1 are the training inputs/targets

XT are the test inputs, YT are the target predictions.

My question is about the use of [m,s2] as predictions for Y (the second to last line of code). If the original sample Y are not normally distributed is the predictive distribution given by the values in [m,s2] stil valid? For example is it valid to say that for a test value $x_{*i}$ the distribution of the prediction for the target $y_{*i}$ is $N(m,s2)$, or am I misinterpreting the implementation of this model?

Best Answer

For a gaussian process, the evaluation of the gaussian processes of an input value $(x+1)$ is the output $y(x+1) $ obtained by

$$\mathord{\buildrel{\lower3pt\hbox{$\scriptscriptstyle\frown$}} \over y} = {K^y}(\mathord{\buildrel{\lower3pt\hbox{$\scriptscriptstyle\frown$}} \over x} ,x)*{\left( {{K^Y}} \right)^{ - 1}}*y $$

in your code, this is expressed by $m$ ,the mean value is the value of higher probability. For each value that you want to evaluate, you need to calculate $${K^y}(\mathord{\buildrel{\lower3pt\hbox{$\scriptscriptstyle\frown$}} \over x} ,x) $$ and then you apply the equation for the mean value to obtain your output value.

You can see a gaussian process as a sum of gaussian distributions over a specific path. Gaussian Processes Plot