Solved – derivation of predictive distribution of Gaussian Process

gaussian process

This question has a duplicate, but I am asking it again because the answer to that duplicate does not address the question well.

The Gaussian Process prior is
$$u\sim GP(0,k(x,x'))$$
Evaluated at the training inputs, I tend to write this as
$$p(u)=N(0,K)$$

The observed training data set is $S=\{(x_1,y_1),…,(x_n,y_n)\}$, and
$$y=u(x)+\epsilon$$
with $p(\epsilon)=N(0,\sigma^2I)$. So the likelihood of $u$ given the observed $S$ is
$$p(y|x,u)=N(u,\sigma^2I)$$
Now let us derive the posterior of $u$:

\begin{align}p(u|x,y)&=\frac{p(y|x,u)p(u)}{p(y|x)}\\
&=N(K(\sigma^2I+K)^{-1}y, \sigma^2(\sigma^2I+K)^{-1}K)\end{align}
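As a quick numerical sanity check (my addition, not part of the derivation), the posterior covariance above, $\sigma^2(\sigma^2I+K)^{-1}K$, agrees with the perhaps more familiar form $K-K(K+\sigma^2I)^{-1}K$. A small numpy sketch with an arbitrary positive-definite $K$:

```python
import numpy as np

# A toy symmetric positive-definite "kernel" matrix K and noise level.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
K = A @ A.T + 5 * np.eye(5)      # positive definite by construction
sigma2 = 0.3
I = np.eye(5)
y = rng.standard_normal(5)

# Posterior mean  K (sigma^2 I + K)^{-1} y
mean = K @ np.linalg.solve(sigma2 * I + K, y)

# Posterior covariance written two algebraically equivalent ways:
cov1 = sigma2 * np.linalg.solve(sigma2 * I + K, K)   # form used in the post
cov2 = K - K @ np.linalg.solve(sigma2 * I + K, K)    # "textbook" form

print(np.allclose(cov1, cov2))
```

The equivalence follows from $K-K(K+\sigma^2I)^{-1}K=\big[(K+\sigma^2I)-K\big](K+\sigma^2I)^{-1}K=\sigma^2(K+\sigma^2I)^{-1}K$.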

and this posterior agrees with the equation $(5)$ in that duplicate post.

Now, here comes my problem: I try to derive the predictive distribution. Let $(x^*,u^*)$ denote the unseen data. Since we assume the observed data and the unseen data have a joint Gaussian process prior, that is,

$$p\pmatrix{u\\u^*}=N\big(0,\pmatrix{K_x &K_{xx^*}\\K_{x^*x} &K_{x^*}}\big)$$

I can compute $p(u^*|u)$ by conditioning on $u$. Finally, the predictive distribution is

$$p(u^*|S)=\int p(u^*|u)p(u|S)du$$
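(Aside, my addition: the factor $p(u^*|u)$ in this integral comes from standard Gaussian conditioning via the Schur complement. A small numpy sketch with an arbitrary RBF kernel, cross-checking the Schur-complement covariance against the corresponding block of the joint precision matrix:)

```python
import numpy as np

def rbf(a, b):
    # Squared-exponential kernel between 1-D input arrays (arbitrary choice).
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2)

x  = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])   # training inputs
xs = np.array([-0.5, 1.5])                   # test inputs

Kx, Kxs, Ks = rbf(x, x), rbf(x, xs), rbf(xs, xs)
Sigma = np.block([[Kx, Kxs], [Kxs.T, Ks]])   # joint covariance of (u, u*)

u0 = np.array([0.1, -0.4, 0.9, 0.2, -0.3])   # a value of u to condition on

# Conditional p(u*|u=u0): mean and covariance via the Schur complement.
cond_mean = Kxs.T @ np.linalg.solve(Kx, u0)
cond_cov  = Ks - Kxs.T @ np.linalg.solve(Kx, Kxs)

# Cross-check: the conditional covariance equals the inverse of the
# (u*, u*) block of the joint precision matrix Sigma^{-1}.
Lam = np.linalg.inv(Sigma)
n = len(x)
check = np.linalg.inv(Lam[n:, n:])
print(np.allclose(cond_cov, check))
```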

I have to say I could not compute this integral, but the result given in the duplicate post is

$$p(u^*|S)=N\big(K_{x^*x}(K_x+\sigma^2I)^{-1}y,\ K_{x^*}-K_{x^*x}(K_x+\sigma^2I)^{-1}K_{xx^*}\big)$$

Question

1) I don't know how this result is computed; could you please help me get it straight?

2) I observe that this result is actually the conditional distribution obtained from the joint distribution of $(y,u^*)$, that is,

$$p\pmatrix{y\\u^*}=N\big(0,\pmatrix{K_x+\sigma^2I &K_{xx^*}\\K_{x^*x} &K_{x^*}}\big)$$

By conditioning on $y$, I can get the same result for $p(u^*|y)$ as the one above. Is this a coincidence?

Best Answer

See, for example, Murphy, Machine Learning: A Probabilistic Perspective, pages 110–111, Section 4.3, "Inference in jointly Gaussian distributions". What you are looking for is Theorem 4.3.1, which uses the matrix inversion lemma to compute the posterior conditional distribution. Replacing $x_1$ with $u^*$, $x_2$ with $u$, $\Sigma_{11}$ with $K_{x^*}$, $\Sigma_{22}$ with $K_x$, etc., you get the desired result (since we condition on $u$, which plays the role of $x_2$). The proof is straightforward if you use the matrix inversion lemma.
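Regarding question 2: it is not a coincidence. Since $\epsilon$ is independent Gaussian noise, $(y,u^*)$ is itself jointly Gaussian, so marginalizing $p(u^*|u)$ over the posterior $p(u|S)$ must reproduce direct conditioning on $y$. A small numpy sketch (my addition, not from Murphy; the RBF kernel and the inputs are arbitrary choices for illustration) checks that the two routes agree numerically:

```python
import numpy as np

def rbf(a, b):
    # Squared-exponential kernel between 1-D input arrays (arbitrary choice).
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2)

x  = np.array([-2.5, -1.5, -0.5, 0.5, 1.5, 2.5])   # training inputs
xs = np.array([-2.0, 0.0, 1.0, 2.2])               # test inputs
y  = np.array([0.3, -1.2, 0.5, 1.1, -0.7, 0.2])    # observations
sigma2 = 0.1
I = np.eye(len(x))

Kx, Kxs, Ks = rbf(x, x), rbf(x, xs), rbf(xs, xs)
Ksx = Kxs.T

# Route 1: posterior p(u|S) = N(m, C), then marginalize p(u*|u) over it.
m = Kx @ np.linalg.solve(Kx + sigma2 * I, y)
C = sigma2 * np.linalg.solve(Kx + sigma2 * I, Kx)
A = Ksx @ np.linalg.inv(Kx)                 # p(u*|u) = N(A u, B)
B = Ks - Ksx @ np.linalg.solve(Kx, Kxs)
mean1 = A @ m
cov1  = B + A @ C @ A.T

# Route 2: condition the joint Gaussian of (y, u*) on y directly.
mean2 = Ksx @ np.linalg.solve(Kx + sigma2 * I, y)
cov2  = Ks - Ksx @ np.linalg.solve(Kx + sigma2 * I, Kxs)

print(np.allclose(mean1, mean2), np.allclose(cov1, cov2))
```

Route 1 uses the standard linear-Gaussian marginalization: if $p(u|S)=N(m,C)$ and $p(u^*|u)=N(Au,B)$, then $\int p(u^*|u)p(u|S)\,du=N(Am,\,B+ACA^\top)$.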
