Solved – Understanding the predictive distribution in Gaussian linear regression

gaussian process

I'm reading through the Gaussian Process book http://www.gaussianprocess.org/gpml/chapters/RW2.pdf and there's one section here I don't quite understand (page 11). The author says:

"the predictive distribution is given by averaging the output of all possible linear models wrt the Gaussian posterior"

$$
\begin{aligned}
p(f_*|x_*,X,y) &= \int p(f_*|x_*,w)~p(w|X,y)~dw \\
&=\mathcal N\left(\frac{1}{\sigma_n^2}x^T_*A^{-1}Xy,~x^T_*A^{-1}x_*\right)
\end{aligned}
$$

where $A = \sigma_n^{-2}XX^T + \Sigma_p^{-1}$, $X$ is the $D\times n$ design matrix with inputs as columns, and $\Sigma_p$ is the prior covariance of the weights (eq. 2.9 in the book).

What does this mean, exactly? I understand that the purpose of using Gaussians is to be able to quantify the uncertainty of a prediction, but I'm unclear why "averaging the output" doesn't simply collapse to plugging in the posterior mean of the weights. And how were the parameters of the mean and covariance derived?

Best Answer

The posterior predictive distribution is a weighted average over your hypothesis space, where each hypothesis is weighted by its posterior probability. In Bayesian analysis, beliefs are expressed as entire distributions rather than point estimates. In your example, you have a posterior distribution over all possible weight vectors. The fully Bayesian way to make a prediction is to marginalize out the weights by integrating over that posterior. There are alternatives, such as taking the MAP value (the most probable weight vector under the posterior), but those are not strictly Bayesian.

As for where the parameters come from: since $f_* = x_*^T w$ is linear in $w$, and the posterior is Gaussian, $p(w|X,y) = \mathcal N(\bar w, A^{-1})$ with $\bar w = \sigma_n^{-2}A^{-1}Xy$, a linear function of a Gaussian is again Gaussian, so $f_* \sim \mathcal N(x_*^T\bar w,\ x_*^T A^{-1} x_*)$. Averaging the outputs does give you a single predictive mean, $x_*^T\bar w$, but it does not discard uncertainty: each candidate $w$ predicts a different $f_*$, and the spread of the posterior over $w$ propagates into the predictive variance $x_*^T A^{-1} x_*$.

It might help to review a smaller, more introductory Bayesian inference problem (e.g. inferring the mean of a Gaussian with known variance) to get your head around the concept, because it is fundamental to the entire approach.
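You can see the "averaging over all linear models" concretely with a small numerical sketch (my own illustration, not from the book; the variable names `sigma_n`, `Sigma_p`, `A`, `w_bar` mirror the text's $\sigma_n$, $\Sigma_p$, $A$, $\bar w$, and the data here is synthetic). Drawing many weight vectors from the Gaussian posterior and evaluating $x_*^T w$ for each reproduces the closed-form predictive mean and variance:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data in the book's convention: X is (D, n), inputs as columns.
D, n = 2, 50
sigma_n = 0.3                       # noise standard deviation (assumed known)
Sigma_p = np.eye(D)                 # prior over weights: w ~ N(0, Sigma_p)
w_true = np.array([1.5, -0.7])      # "true" weights used to simulate data

X = rng.normal(size=(D, n))
y = X.T @ w_true + sigma_n * rng.normal(size=n)

# Posterior over weights: p(w | X, y) = N(w_bar, A^{-1}),
# with A = sigma_n^{-2} X X^T + Sigma_p^{-1}.
A = X @ X.T / sigma_n**2 + np.linalg.inv(Sigma_p)
A_inv = np.linalg.inv(A)
w_bar = A_inv @ X @ y / sigma_n**2

# Closed-form predictive distribution at a test input x_*:
x_star = np.array([0.8, -1.2])
mean_closed = x_star @ w_bar
var_closed = x_star @ A_inv @ x_star

# The same thing by averaging the outputs of all (well, many) linear
# models, each weighted by its posterior probability: sample weights
# from the posterior and evaluate x_*^T w for each sample.
w_samples = rng.multivariate_normal(w_bar, A_inv, size=200_000)
f_samples = w_samples @ x_star

print(mean_closed, f_samples.mean())   # Monte Carlo mean matches x_*^T w_bar
print(var_closed, f_samples.var())     # Monte Carlo variance matches x_*^T A^{-1} x_*
```

The sample variance of `f_samples` is nonzero even though the sample mean converges to $x_*^T\bar w$: that residual spread is exactly the predictive uncertainty the closed-form covariance term captures.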
