Solved – Gaussian process regression toy problem

gaussian processregression

I was trying to gain some intuition for Gaussian Process regression, so I made a simple 1D toy problem to try out. I took $x_i=\{1,2,3\}$ as the inputs, and $y_i=\{1,4,9\}$ as the responses. ('Inspired' from $y=x^2$)

For the regression I used a standard squared exponential kernel function:

$$k(x_p,x_q)=\sigma_f^2 \exp \left( – \frac{1}{2l^2} \left|x_p-x_q\right|^2 \right)$$

I assumed that there was noise with standard deviation $\sigma_n$, so that the covariance matrix became:

$$K_{pq} = k(x_p,x_q) + \sigma_n^2 \delta_{pq}$$

The hyperparameters $(\sigma_n,l,\sigma_f)$ were estimated by maximizing the log likelihood of the data. To make a prediction at a point $x_\star$, I found the mean and variance respectively by the following

$$\mu_{x_\star} = k_\star^T (\mathbf{K}+\sigma_n^2\mathbf{I})^{-1} y$$
$$\sigma_{x_\star}^2 = k(x_\star,x_\star)-k_\star^T(\mathbf{K}+\sigma_n^2\mathbf{I})^{-1} k_\star$$

where $k_\star$ is the vector of the covariance between $x_\star$ and inputs, and $y$ is a vector of the outputs.

My results for $1<x<3$ are shown below. The blue line is the mean and red lines mark the standard deviation intervals.

The results

I'm not sure if this is right though; my inputs (marked by 'X's) do not lie on the blue line. Most examples I see have the mean intersecting the inputs. Is this a general feature to be expected?

Best Answer

The mean function passing through the datapoints is usually an indication of over-fitting. Optimising the hyper-parameters by maximising the marginal likelihood will tend to favour very simple models unless there is enough data to justify something more complex. As you only have three datapoints, which are more or less in a line with little noise, the model that has been found seems pretty reasonable to me. Essentially the data can either be explained as a linear underlying function with moderate noise, or a moderately non-linear underlying function with little noise. The former is the simpler of the two hypotheses, and is favoured by "Occam's razor".

Related Question