Solved – Why is the signal variance large for optimized Gaussian process regression with a Gaussian RBF kernel?

gaussian-process, variance

How should one interpret a large signal variance of an optimized Gaussian RBF kernel in a Gaussian process regression model? And how should a large signal variance for this kernel be interpreted in general? The function being optimized is the log marginal likelihood:
$$
\log p(y \mid X) = -\frac{1}{2}y^T(K+\sigma^2_n\mathcal{I})^{-1}y - \frac{1}{2}\log|K + \sigma^2_n\mathcal{I}| - \frac{n}{2}\log 2\pi
$$

and the kernel function:

$$
k_{SE}(x_p, x_q) = \alpha^2 \exp\big(-\frac{1}{2}(x_p - x_q)^T \Lambda^{-1} (x_p - x_q)\big)
$$
where $\Lambda$ is a diagonal 'lengthscales' matrix.
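For concreteness, a minimal NumPy sketch of this kernel, assuming $\Lambda = \mathrm{diag}(\ell_1^2, \dots, \ell_D^2)$ for lengthscales $\ell_d$ (the function name and example values are mine, not from the original post):

```python
import numpy as np

def k_se(xp, xq, alpha2, lengthscales):
    """Squared-exponential (Gaussian RBF) kernel with per-dimension lengthscales.

    alpha2 is the signal variance alpha^2; Lambda is assumed to be
    diag(lengthscales**2), so the quadratic form is a scaled squared distance.
    """
    d = (xp - xq) / lengthscales
    return alpha2 * np.exp(-0.5 * np.dot(d, d))

# Example: two 1-D inputs, signal variance 4, lengthscale 2
print(k_se(np.array([1.0]), np.array([2.0]), alpha2=4.0, lengthscales=np.array([2.0])))
```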

Example using GPflow

The training data are noise-free and follow the linear function $h(x) = a x$.
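A minimal sketch of how such a setup might look in GPflow 2 (the slope $a$, the data grid, and the exact API calls are assumptions for illustration, not taken from the original post):

```python
import numpy as np
import gpflow

# Noise-free linear data h(x) = a*x (a = 2 chosen arbitrarily for illustration)
a = 2.0
X = np.linspace(0.0, 5.0, 20).reshape(-1, 1)
Y = a * X

# GP regression with a squared-exponential (Gaussian RBF) kernel
kernel = gpflow.kernels.SquaredExponential()  # hyperparameters: variance (alpha^2), lengthscales
model = gpflow.models.GPR(data=(X, Y), kernel=kernel)

# Maximize the log marginal likelihood over the hyperparameters
gpflow.optimizers.Scipy().minimize(model.training_loss, model.trainable_variables)

# Inspect the optimized signal variance, lengthscale and noise variance
gpflow.utilities.print_summary(model)
```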

[edit 1]: example using GPflow with different starting values for the hyperparameters

Here I plot only the models' predictions. Each model's hyperparameters were initialized to the starting values shown in the labels in the image; a label of 0 means I did not set that hyperparameter, so it kept the default of 1.

[plot: predictions for different starting values of the hyperparameters]
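For reference, a sketch of how starting values can be assigned in GPflow 2 before optimizing (the concrete values are arbitrary, and X, Y are the data from the sketch above):

```python
# Initialize hyperparameters explicitly before optimizing (values are arbitrary examples)
kernel = gpflow.kernels.SquaredExponential()
kernel.variance.assign(10.0)       # signal variance alpha^2
kernel.lengthscales.assign(5.0)    # lengthscale
model = gpflow.models.GPR(data=(X, Y), kernel=kernel, noise_variance=0.1)
gpflow.optimizers.Scipy().minimize(model.training_loss, model.trainable_variables)
```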

Is this because the points are very linear, so that every $x_i$ in the dataset has the same "impact" on the prediction at $x_*$ (large lengthscales)? If so, how can one interpret a large value of the Gaussian kernel variance?

Best Answer

In the first example, it's picking a very long lengthscale and a high variance, which is strange given that there's no noise in your data. You might be falling into a local minimum. Are you able to try different starting points for the hyperparameter optimization?
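One possible multi-restart scheme, as a sketch: loop over a grid of starting values and keep the model with the highest log marginal likelihood (the grid and the noise-free linear data below are assumptions matching the question's setup, not a prescribed recipe):

```python
import itertools
import numpy as np
import gpflow

# Same kind of noise-free linear data as in the question
X = np.linspace(0.0, 5.0, 20).reshape(-1, 1)
Y = 2.0 * X

best_lml, best_model = -np.inf, None
for var0, len0 in itertools.product([0.1, 1.0, 10.0], [0.1, 1.0, 10.0]):
    kernel = gpflow.kernels.SquaredExponential(variance=var0, lengthscales=len0)
    model = gpflow.models.GPR(data=(X, Y), kernel=kernel)
    gpflow.optimizers.Scipy().minimize(model.training_loss, model.trainable_variables)
    lml = model.log_marginal_likelihood().numpy()
    if lml > best_lml:
        best_lml, best_model = lml, model

print("best log marginal likelihood:", best_lml)
gpflow.utilities.print_summary(best_model)
```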
