Solved – MCMC and approximate inference for Gaussian processes

bayesian, gaussian-process, inference, markov-chain-montecarlo

I am reading a paper that does approximate inference over GP models. The idea is that when the likelihood is non-Gaussian, the posterior is no longer tractable and we need approximate inference schemes to compute the posterior distribution.

There is one bit in the paper that I do not follow. The author says:

In order to benefit from the advantages of MCMC it is necessary to develop efficient sampling strategies. This has proved to be particularly difficult in many GP applications that involve the estimation of a smooth latent function. Given that the latent function is represented by a discrete set of values, the posterior distribution over these function values can be highly correlated. The more discrete values used to represent the function, the worse the problem of high correlation becomes.

Perhaps this is very naive, but why does representing the function with more discrete values increase the correlation?

Best Answer

The key word here is "smooth". Let's take a simple case where we want to estimate a random function $f(x)$ with $x \in [0,1]$. If $f$ is (mean-square) differentiable, then the covariance function $k(x_t,x_s)$ will be differentiable in each argument. This is true whether the process is Gaussian or not, as long as it has a finite covariance function.
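For context (a standard fact about second-order processes, not spelled out above): mean-square differentiability of $f$ is tied to differentiability of the covariance, because

$$\frac{\partial}{\partial x_t} k(x_t,x_s) = \operatorname{Cov}\bigl(f'(x_t), f(x_s)\bigr),$$

so a mean-square differentiable process has a covariance function that is differentiable in each argument.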

When I sample more values on the interval, the values are closer together, so $f(x_{t+\epsilon})$ is almost a linear function of $f(x_t)$, and the covariance function at $x_{t+\epsilon}$ is almost a linear function of the covariance function at $x_t$. This means that the covariance matrix of the sampled values will be almost singular; equivalently, neighbouring function values are almost perfectly correlated.
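To make this concrete, here is a minimal numerical sketch (my own illustration, assuming a squared-exponential kernel with a fixed lengthscale of 0.2): as the grid on $[0,1]$ is refined, the correlation between neighbouring function values approaches 1 and the covariance matrix becomes numerically singular.

```python
import numpy as np

def rbf_kernel(x, lengthscale=0.2, variance=1.0):
    """Squared-exponential covariance matrix for a 1-D grid x (illustrative choice)."""
    d = x[:, None] - x[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

for n in [10, 50, 200]:
    x = np.linspace(0.0, 1.0, n)
    K = rbf_kernel(x)
    # Correlation between two adjacent grid points approaches 1 as the spacing shrinks.
    rho = K[0, 1] / np.sqrt(K[0, 0] * K[1, 1])
    # The condition number of K blows up, i.e. K becomes nearly singular.
    cond = np.linalg.cond(K)
    print(f"n = {n:4d}   neighbour correlation = {rho:.6f}   cond(K) = {cond:.2e}")
```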

All of this applies to Gaussian processes just as much as non-Gaussian processes.

Perhaps another way to look at this is that when you have a smooth function, sampling more $x$ values has diminishing returns beyond a certain point. You get more values, but you don't get much more information, since you are already well placed to estimate $f(x)$. This is the flip side of all those neat convergence theorems we learn in second-year calculus.
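The diminishing-returns point can also be illustrated numerically (again just a sketch under assumptions of mine: a squared-exponential kernel, evenly spaced noisy observations, and a small fixed noise variance): once the observations are dense relative to the lengthscale, the GP predictive variance at a test point barely shrinks as more points are added.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=0.2):
    """Squared-exponential cross-covariance between two sets of 1-D inputs."""
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / lengthscale) ** 2)

x_star = np.array([0.5])   # test location (arbitrary choice)
noise = 1e-2               # assumed observation noise variance

for n in [5, 10, 20, 50, 100]:
    x = np.linspace(0.0, 1.0, n)
    K = rbf_kernel(x, x) + noise * np.eye(n)
    k_star = rbf_kernel(x_star, x)
    # Standard GP predictive variance: k(x*, x*) - k* (K + noise I)^{-1} k*^T
    var = 1.0 - k_star @ np.linalg.solve(K, k_star.T)
    print(f"n = {n:4d}   predictive variance at x = 0.5: {var.item():.6f}")
```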
