Gaussian Process: confidence interval vs prediction interval vs credible interval

Let a distribution over functions be described by a Gaussian Process (GP) prior, following the notation of Rasmussen and Williams:
$$
f(\mathbf{x})\sim\mathcal{GP}(m(\mathbf{x}), k(\mathbf{x},\mathbf{x}'))
$$

then, considering the mean function $m(\mathbf{x})$ as zero and a set of points $X_*$, we can sample function points from the multivariate normal distribution:
$$\mathcal{N}(\mathbf{0}, K(X_*, X_*))$$
where $K(X_*, X_*)$ is the covariance matrix corresponding to the kernel of choice.

Considering this, how should we refer to the interval $\pm (k\cdot\sigma_{\mathbf{x}})$, where $k$ is a positive constant and $\sigma_{\mathbf{x}}$ is the standard deviation at $\mathbf{x}$ (the square root of the corresponding diagonal entry of $K(X_*, X_*)$)?
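To make the setup concrete, here is a minimal NumPy sketch of drawing from the zero-mean prior and computing the pointwise band $\pm k\cdot\sigma_{\mathbf{x}}$. The squared-exponential kernel, its length scale, the grid for $X_*$, and $k=2$ are all assumptions chosen for illustration; they are not taken from the figure.

```python
import numpy as np

def rbf_kernel(xa, xb, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel for 1-D inputs (an illustrative choice)."""
    sq_dist = (xa[:, None] - xb[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / length_scale**2)

rng = np.random.default_rng(0)
X_star = np.linspace(-5.0, 5.0, 50)   # test inputs X_*
K_ss = rbf_kernel(X_star, X_star)     # K(X_*, X_*)

# Small jitter on the diagonal keeps the Cholesky factorization stable.
L = np.linalg.cholesky(K_ss + 1e-6 * np.eye(len(X_star)))

# Three draws from N(0, K(X_*, X_*)), one per row.
prior_samples = (L @ rng.standard_normal((len(X_star), 3))).T

# Pointwise band: +/- k times the square root of the diagonal of K(X_*, X_*).
k = 2.0
sigma = np.sqrt(np.diag(K_ss))
lower, upper = -k * sigma, k * sigma
```

With this stationary kernel the prior standard deviation is the same at every input, so the band has constant width.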

Visually, this interval corresponds to the grey area of the following left subfigure (extracted from Rasmussen and Williams):

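[Figure: left panel, GP prior; right panel, posterior after conditioning on observed points; the shaded grey region marks the interval.]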

As can be seen in the caption, the authors refer to this interval as a "confidence region".

However, I do not understand some points related to this figure and its caption:

  1. Since the predicted function points are scalar values, shouldn't this interval be a "confidence interval" instead?

  2. On the other hand, given that this interval does not correspond to any specific parameter but to the interval where we expect to observe function points, should this interval (for the left subfigure) instead be referred to as a "prediction interval"?

  3. Is it an approximation to consider just the marginalized $\sigma_{\mathbf{x}}$ when computing this interval, i.e., discarding correlation information?

  4. Finally, once we condition on observed training points to obtain the posterior predictive distribution (right subfigure), shouldn't the corresponding interval also be a prediction interval? For this last one, I am unsure whether we should use "credible interval" instead, as suggested e.g. by this blog.

Best Answer

1. Since the predicted function points are scalar values, shouldn't this interval be a "confidence interval" instead?

It's a collection of pointwise confidence intervals. A confidence region is just a generalization of a confidence interval, so it's not technically wrong (even if it is a little confusing). Don't stress about this too much.


2. On the other hand, given that this interval does not correspond to any specific parameter but to the interval where we expect to observe function points, should this interval (for the left subfigure) instead be referred to as a "prediction interval"?

In this particular example, it is both. The confidence intervals are for the mean of the function. Since the function can be realized without noise, it is also a prediction interval.

If the data were generated with noise, the covariance matrix becomes $\mathbf{K}+\tau\mathbf{I}$, where $\tau$ is called the "nugget". In that case, the confidence and predictive intervals differ: the confidence interval (for the function mean) looks similar to the one above, but the predictive interval must also account for the nugget effect.
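A rough sketch of that difference, assuming a unit-variance squared-exponential kernel, a made-up training set, and an arbitrary nugget $\tau = 0.1$ (none of these come from the figure): the interval for the latent function uses the posterior variance of $f$, while the interval for a new noisy observation adds $\tau$ to it.

```python
import numpy as np

def rbf_kernel(xa, xb, length_scale=1.0):
    """Unit-variance squared-exponential kernel (illustrative choice)."""
    sq_dist = (xa[:, None] - xb[None, :]) ** 2
    return np.exp(-0.5 * sq_dist / length_scale**2)

rng = np.random.default_rng(1)
tau = 0.1                                         # nugget (noise variance), assumed
X_train = np.array([-4.0, -2.0, 0.0, 1.5, 3.0])   # hypothetical training inputs
y_train = np.sin(X_train) + np.sqrt(tau) * rng.standard_normal(X_train.size)
X_star = np.linspace(-5.0, 5.0, 101)

A = rbf_kernel(X_train, X_train) + tau * np.eye(X_train.size)  # K + tau * I
K_s = rbf_kernel(X_train, X_star)                              # K(X, X_*)

post_mean = K_s.T @ np.linalg.solve(A, y_train)

# Posterior variance of the latent function f at each test input
# (the prior variance k(x, x) equals 1 for this kernel).
var_f = 1.0 - np.sum(K_s * np.linalg.solve(A, K_s), axis=0)

# Variance of a new noisy observation y = f + noise at the same inputs.
var_y = var_f + tau

k = 2.0
ci_half_width = k * np.sqrt(var_f)   # interval for the function itself
pi_half_width = k * np.sqrt(var_y)   # interval for a new observation
```

The predictive half-width exceeds the function half-width at every input, and the gap does not vanish at the training points.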


3. Is it an approximation to consider just the marginalized $\sigma_{\mathbf{x}}$ when computing this interval, i.e., discarding correlation information?

I'm not entirely sure I understand this question. The left panel shows what we have before observing any data. We have nothing to condition on and thus no way to use any information about the correlation structure.

On the right, we have datapoints to condition on, and we use the correlation structure (plus the process variance $\sigma_X^2$) to create confidence intervals. This is what leads to the "football" shaped confidence intervals. Uncertainty is highest farthest away from observed data and is zero (for $\tau=0$, at least) at the observed data.
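The following sketch illustrates this behaviour for the noise-free case. The kernel, the training points, and the tiny jitter added for numerical stability are assumptions made for the example, and `gp_posterior` is a hypothetical helper, not a library function.

```python
import numpy as np

def rbf_kernel(xa, xb, length_scale=1.0):
    """Unit-variance squared-exponential kernel (illustrative choice)."""
    sq_dist = (xa[:, None] - xb[None, :]) ** 2
    return np.exp(-0.5 * sq_dist / length_scale**2)

def gp_posterior(X_train, y_train, X_query, jitter=1e-10):
    """Noise-free GP conditioning; returns posterior mean and pointwise sd."""
    K = rbf_kernel(X_train, X_train) + jitter * np.eye(X_train.size)
    K_s = rbf_kernel(X_train, X_query)
    K_ss = rbf_kernel(X_query, X_query)
    mean = K_s.T @ np.linalg.solve(K, y_train)
    cov = K_ss - K_s.T @ np.linalg.solve(K, K_s)
    # Clip tiny negative values caused by floating-point round-off.
    sd = np.sqrt(np.clip(np.diag(cov), 0.0, None))
    return mean, sd

X_train = np.array([-4.0, -2.0, 0.0, 1.5, 3.0])  # hypothetical training inputs
y_train = np.sin(X_train)                        # noise-free observations

mean_at_data, sd_at_data = gp_posterior(X_train, y_train, X_train)
_, sd_far = gp_posterior(X_train, y_train, np.array([10.0]))
```

Here `sd_at_data` is numerically zero at the observed inputs, while `sd_far` returns to roughly the prior standard deviation far from any data.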


4. Finally, once we condition on observed training points to obtain the posterior predictive distribution (right subfigure), shouldn't the corresponding interval also be a prediction interval? For this last one, I am unsure whether we should use "credible interval" instead.

I partially addressed this already in the answer to Q2. As for the term credible/confidence interval, I agree that it is confusing. You can call it a credible interval if you like, but both terms are fine in this case.

Many authors like to introduce GPs as a prior/posterior over functions, but in fact there is nothing inherently Bayesian about this analysis. It's based on simple facts of the Gaussian distribution.
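For reference, the "simple facts" in question are the conditioning rules for a joint Gaussian. Writing $X$ for the training inputs and $\mathbf{f}$ for the noise-free function values observed there (notation assumed here, in the style of Rasmussen and Williams), the joint distribution is

$$
\begin{bmatrix}\mathbf{f}\\ \mathbf{f}_*\end{bmatrix}
\sim\mathcal{N}\left(\mathbf{0},
\begin{bmatrix}K(X,X) & K(X,X_*)\\ K(X_*,X) & K(X_*,X_*)\end{bmatrix}\right)
$$

and conditioning on $\mathbf{f}$ gives

$$
\mathbf{f}_*\mid\mathbf{f}\sim\mathcal{N}\left(K(X_*,X)K(X,X)^{-1}\mathbf{f},\; K(X_*,X_*)-K(X_*,X)K(X,X)^{-1}K(X,X_*)\right)
$$

The pointwise intervals in the right panel use the square root of the diagonal of this conditional covariance.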

So far, this discussion has treated the correlation structure as fixed and known. If there are unknown correlation parameters $\phi$, we can estimate them with either frequentist or Bayesian methods. Once we account for uncertainty in those parameters, the distinction between confidence and credible intervals becomes meaningful (although still largely pedantic, in my opinion).