Solved – Log marginal likelihood for Gaussian Process

Tags: gaussian-process, kernel-trick, likelihood, maximum-likelihood

The log marginal likelihood for a Gaussian Process, as given in Equation 2.30 of Rasmussen's *Gaussian Processes for Machine Learning*, is:

$$\log p(y|X) = -\frac{1}{2}y^T(K+\sigma^2_n I)^{-1}y - \frac{1}{2}\log|K+\sigma^2_n I| - \frac{n}{2}\log 2\pi$$
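
For concreteness, here is a minimal NumPy sketch of this quantity (the function name is my own; the Cholesky route is the numerically stable one used in GPML's Algorithm 2.1):

```python
import numpy as np

def log_marginal_likelihood(y, K, sigma_n):
    """GPML Eq. 2.30 for a zero-mean GP, computed via a Cholesky factorization."""
    n = y.shape[0]
    L = np.linalg.cholesky(K + sigma_n**2 * np.eye(n))   # K + sigma_n^2 I = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # = (K + sigma_n^2 I)^{-1} y
    return (-0.5 * y @ alpha
            - np.sum(np.log(np.diag(L)))                 # = -0.5 * log|K + sigma_n^2 I|
            - 0.5 * n * np.log(2.0 * np.pi))
```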

Whereas MATLAB's documentation on Gaussian Process regression formulates the relation as

$$\log p(y|X, \beta, \theta, \sigma^2) = -\frac{1}{2}\left(y-H\beta\right)^T(K+\sigma^2_n I)^{-1}\left(y-H\beta\right) - \frac{1}{2}\log|K+\sigma^2_n I| - \frac{n}{2}\log 2\pi$$

where $H$ is the matrix of basis functions evaluated at the training inputs and $\beta$ is the coefficient vector.
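
In code terms, this is the same quantity computed on the residual $y - H\beta$; a sketch reusing the function above (names are illustrative, not MATLAB's API):

```python
def log_marginal_likelihood_basis(y, H, beta, K, sigma_n):
    """MATLAB-style formula: Eq. 2.30 applied to the residual y - H @ beta,
    where H is the n-by-p matrix of basis functions at the training inputs."""
    return log_marginal_likelihood(y - H @ beta, K, sigma_n)
```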

My doubts:

  1. Why is there a difference between the two relations?
  2. From my understanding, $H\beta$ is the prediction from the Gaussian Process; am I right?

Thanks

Best Answer

The more general formulation for the log marginal likelihood (not the marginal log likelihood, as you originally wrote; I edited your post) of a GP is

$$\log p(y|X) = -\frac{1}{2}(y - m(X))^T(K+\sigma^2_n I)^{-1}(y - m(X)) - \frac{1}{2}\log|K+\sigma^2_n I|-\frac{n}{2}\log2\pi$$

where $m(x): \mathbb{R}^d \rightarrow \mathbb{R}$ is the mean function of the GP evaluated at a point $x$, and $m(X)$ denotes the vector obtained by applying the mean function to every point in $X$. The GP in GPML (Eq. 2.30) is a zero-mean GP, i.e. $m(X) = 0$, which recovers the first formula.
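
Equivalently, all of these formulas are the log-density of the multivariate Gaussian $\mathcal{N}(m(X),\, K+\sigma^2_n I)$ evaluated at $y$. A quick numerical check of this reading (the RBF kernel, constant mean, and values below are arbitrary choices for illustration):

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(20, 1))
K = np.exp(-0.5 * (X - X.T) ** 2)   # RBF kernel, unit variance and length scale
sigma_n = 0.1
m_X = 2.0 * np.ones(20)             # a constant mean function m(x) = 2

y = rng.multivariate_normal(m_X, K + sigma_n**2 * np.eye(20))

# The general formula above is exactly this Gaussian log-density:
lml = multivariate_normal(mean=m_X, cov=K + sigma_n**2 * np.eye(20)).logpdf(y)
print(lml)
```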

In the MATLAB version, $H\beta$ stands for a mean function expressed as a linear combination of basis functions $H = H(x)$; it is not the prediction of the GP.

The GP mean prediction will revert to the mean function very far from the points in the training set $X$ (very far in terms of the length scale of the kernel), but it will generally differ from the mean function otherwise.
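
To make the reversion visible, here is a sketch of the posterior mean (argument names such as `m_X` for $m(X)$, `m_star` for $m(x_*)$, and `k_star` for the vector of covariances between $x_*$ and the training inputs are my own):

```python
import numpy as np

def gp_posterior_mean(y, m_X, m_star, K, k_star, sigma_n):
    """Posterior mean at a test point x_*:
        m(x_*) + k_*^T (K + sigma_n^2 I)^{-1} (y - m(X)).
    For a kernel that decays with distance (e.g. RBF), k_star -> 0 as x_*
    moves many length scales away from the training inputs, so the
    prediction reverts to the mean function value m(x_*)."""
    n = y.shape[0]
    alpha = np.linalg.solve(K + sigma_n**2 * np.eye(n), y - m_X)
    return m_star + k_star @ alpha
```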
