Solved – Understanding the predictive distribution in Gaussian linear regression

gaussian process

I'm reading through the Gaussian Process book http://www.gaussianprocess.org/gpml/chapters/RW2.pdf and there's one section here I don't quite understand (page 11). The author says:

"the predictive distribution is given by averaging the output of all possible linear models wrt the Gaussian posterior"

$$
\begin{aligned}
p(f_*|x_*,X,y) &= \int p(f_*|x_*,w)~p(w|X,y)~dw \\
&=\mathcal N\left(\frac{1}{\sigma_n^2}x^T_*A^{-1}Xy,~x^T_*A^{-1}x_*\right)
\end{aligned}
$$

where $A = \sigma_n^{-2}XX^T + \Sigma_p^{-1}$, $X$ is the $D\times n$ design matrix with inputs as columns, and $\Sigma_p$ is the prior covariance of the weights (eq. 2.9 in the book).

What does this mean, exactly? I understand that the purpose of using Gaussians is to be able to quantify the uncertainty of a prediction, but I'm unclear why "averaging the output" doesn't simply collapse to plugging in the posterior mean of the weights. And how were the parameters of the mean and covariance derived?

Best Answer

The posterior predictive distribution is a weighted average over your hypothesis space, where each hypothesis is weighted by its posterior probability. In Bayesian analysis, beliefs are expressed as entire distributions rather than point estimates. In your example, you have a posterior distribution over all possible weight vectors. The fully Bayesian way to make a prediction is to marginalize out the weights by integrating over that posterior. There are alternatives, such as taking the MAP value (the most probable weight vector under the posterior), but those are not strictly Bayesian.

As for where the parameters come from: since $f_* = x_*^T w$ is linear in $w$, and the posterior is Gaussian, $p(w|X,y) = \mathcal N(\bar w, A^{-1})$ with $\bar w = \sigma_n^{-2}A^{-1}Xy$, a linear function of a Gaussian is again Gaussian, so $f_* \sim \mathcal N(x_*^T\bar w,\ x_*^T A^{-1} x_*)$. Averaging the outputs does give you a single predictive mean, $x_*^T\bar w$, but it does not discard uncertainty: each candidate $w$ predicts a different $f_*$, and the spread of the posterior over $w$ propagates into the predictive variance $x_*^T A^{-1} x_*$.

It might help to review a smaller, more introductory Bayesian inference problem (e.g. inferring the mean of a Gaussian with known variance) to get your head around the concept, because it is fundamental to the entire approach.
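You can see the "averaging over all linear models" concretely with a small numerical sketch (my own illustration, not from the book; the variable names `sigma_n`, `Sigma_p`, `A`, `w_bar` mirror the text's $\sigma_n$, $\Sigma_p$, $A$, $\bar w$, and the data here is synthetic). Drawing many weight vectors from the Gaussian posterior and evaluating $x_*^T w$ for each reproduces the closed-form predictive mean and variance:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data in the book's convention: X is (D, n), inputs as columns.
D, n = 2, 50
sigma_n = 0.3                       # noise standard deviation (assumed known)
Sigma_p = np.eye(D)                 # prior over weights: w ~ N(0, Sigma_p)
w_true = np.array([1.5, -0.7])      # "true" weights used to simulate data

X = rng.normal(size=(D, n))
y = X.T @ w_true + sigma_n * rng.normal(size=n)

# Posterior over weights: p(w | X, y) = N(w_bar, A^{-1}),
# with A = sigma_n^{-2} X X^T + Sigma_p^{-1}.
A = X @ X.T / sigma_n**2 + np.linalg.inv(Sigma_p)
A_inv = np.linalg.inv(A)
w_bar = A_inv @ X @ y / sigma_n**2

# Closed-form predictive distribution at a test input x_*:
x_star = np.array([0.8, -1.2])
mean_closed = x_star @ w_bar
var_closed = x_star @ A_inv @ x_star

# The same thing by averaging the outputs of all (well, many) linear
# models, each weighted by its posterior probability: sample weights
# from the posterior and evaluate x_*^T w for each sample.
w_samples = rng.multivariate_normal(w_bar, A_inv, size=200_000)
f_samples = w_samples @ x_star

print(mean_closed, f_samples.mean())   # Monte Carlo mean matches x_*^T w_bar
print(var_closed, f_samples.var())     # Monte Carlo variance matches x_*^T A^{-1} x_*
```

The sample variance of `f_samples` is nonzero even though the sample mean converges to $x_*^T\bar w$: that residual spread is exactly the predictive uncertainty the closed-form covariance term captures.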
