Solved – Hyperparameter Optimisation in Gaussian Process Regression

gaussian-process, hyperparameter

I am trying to perform Gaussian process regression. I chose the SE kernel: $K(x_i,x_j) = \exp\left(-\frac{\|x_i-x_j\|^2}{l}\right) + \sigma_n\delta_{i,j}$. I begin by maximizing the log-likelihood with respect to the latent variables, using conservative values for the hyperparameters.
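For reference, here is a minimal NumPy sketch of that kernel as written (the function name and the convention that inputs are rows of an $n \times p$ array are my own assumptions):

```python
import numpy as np

def se_kernel(X1, X2, l, sigma_n):
    """SE kernel from the question: exp(-||xi - xj||^2 / l),
    plus a noise term sigma_n * delta_ij on the diagonal."""
    # pairwise squared Euclidean distances between rows of X1 and X2
    sq_dists = (np.sum(X1**2, axis=1)[:, None]
                + np.sum(X2**2, axis=1)[None, :]
                - 2.0 * X1 @ X2.T)
    K = np.exp(-sq_dists / l)
    if X1 is X2:
        # delta_ij contributes only when x_i and x_j are the same training point
        K = K + sigma_n * np.eye(X1.shape[0])
    return K
```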

Then I maximize the log-likelihood with respect to the hyperparameters, and the results seem weird: the parameter $\sigma_n$ keeps increasing. I don't understand why the log-likelihood always grows with $\sigma_n$ while the model obviously goes wrong.

I also tried maximizing the log-likelihood with respect to the length-scale parameter $l$ alone. In that case, the log-likelihood is highest when $l$ equals the value used to perform the first optimization with respect to the hyperparameters.

In both cases I perform the optimization of the log-likelihood with respect to the latent variables by maximizing

$-\frac{dn}{2}\ln(2\pi) - \frac{d}{2}\ln(|K|) - \frac{1}{2}\text{tr}(K^{-1}YY^T)$

where $Y$ is the $n \times d$ matrix of observed data in function space. Is there something I have misunderstood?
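A sketch of how that objective can be evaluated numerically (the Cholesky route is for stability; the helper name is mine):

```python
def log_marglik_trace(K, Y):
    """Trace form of the log marginal likelihood above,
    with Y the n x d matrix of observations."""
    n, d = Y.shape
    L = np.linalg.cholesky(K)                              # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, Y))    # K^{-1} Y
    logdet = 2.0 * np.sum(np.log(np.diag(L)))              # ln|K|
    return (-0.5 * d * n * np.log(2.0 * np.pi)
            - 0.5 * d * logdet
            - 0.5 * np.trace(Y.T @ alpha))                 # tr(K^{-1} Y Y^T)
```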

Best Answer

If you check out the GPML book, there is a natural trade-off between model fit and complexity, expressed by the determinant term (model complexity) and the trace term above (data fit). As the noise term gets big, the data-fit term looks better and better, but the complexity penalty gets increasingly worse. My guess is that you are getting trapped in a local optimum. You could try optimisation with multiple restarts, or bound the space over which you optimise. You could also put a prior over the hyperparameter space.
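As a rough illustration of the restart/bounding suggestion (the bounds, the log-parameterisation, and the helper names are placeholders, reusing the sketches from the question):

```python
from scipy.optimize import minimize

def neg_log_marglik(theta, X, Y):
    """Negative log marginal likelihood as a function of
    theta = [log l, log sigma_n]; the log-transform keeps both positive."""
    l, sigma_n = np.exp(theta)
    K = se_kernel(X, X, l, sigma_n)
    return -log_marglik_trace(K, Y)

def fit_with_restarts(X, Y, n_restarts=10, seed=0):
    """Bounded optimisation with several random restarts; the box below is arbitrary."""
    rng = np.random.default_rng(seed)
    bounds = [(-5.0, 5.0), (-8.0, 2.0)]     # assumed bounds on log l and log sigma_n
    best = None
    for _ in range(n_restarts):
        theta0 = np.array([rng.uniform(lo, hi) for lo, hi in bounds])
        res = minimize(neg_log_marglik, theta0, args=(X, Y),
                       method="L-BFGS-B", bounds=bounds)
        if best is None or res.fun < best.fun:
            best = res
    l, sigma_n = np.exp(best.x)
    return l, sigma_n
```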

Also, the log marginal likelihood seen in texts is usually written as

$-\frac{1}{2}y^T K^{-1}y - \frac{1}{2}\log(|K|) - \frac{n}{2}\log(2\pi)$
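Computed through a Cholesky factorisation, that is roughly (single-output case, helper name is mine):

```python
def log_marglik(K, y):
    """Standard single-output log marginal likelihood, as in the GPML book."""
    n = y.shape[0]
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))    # K^{-1} y
    return (-0.5 * y @ alpha                               # data-fit term
            - np.sum(np.log(np.diag(L)))                   # -1/2 log|K|
            - 0.5 * n * np.log(2.0 * np.pi))               # normalisation constant
```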