Kriging and GPR – Difference Between Non-Zero Nugget and Noise Term in Kriging/GPR

gaussian process, kriging, noise, nugget, uncertainty

With some Gaussian process regression (GPR)/kriging models, it is possible to specify both a non-zero nugget and a noise term. For example, scikit-learn's GaussianProcessRegressor has an alpha parameter, which I think represents the nugget, and a WhiteKernel, which represents noise and can be added to any other kernel.

These two components have very similar effects on the results, as far as I can see (although counter-examples could be very instructive here).
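To make the comparison concrete, here is a minimal sketch that fits the same data once with alpha and once with a WhiteKernel. The toy data, the fixed length scale, and the variance value are assumptions chosen for illustration, and optimizer=None is used so that the hyperparameters stay fixed and the two models are directly comparable. As far as I understand scikit-learn's implementation, the predicted means coincide, while the predictive variances differ by the noise level, because the WhiteKernel contributes to the predictive variance whereas alpha is only added to the diagonal of the training covariance matrix.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Toy 1-D data (assumed for illustration only).
rng = np.random.default_rng(0)
X = np.linspace(0.0, 5.0, 8).reshape(-1, 1)
y = np.sin(X).ravel() + 0.3 * rng.standard_normal(8)

variance = 0.1  # same value used as alpha and as WhiteKernel noise_level

# Model 1: variance passed through alpha (added to the training diagonal).
gpr_alpha = GaussianProcessRegressor(
    kernel=RBF(length_scale=1.0), alpha=variance, optimizer=None
).fit(X, y)

# Model 2: variance passed through a WhiteKernel term (alpha left at its
# tiny default, which only acts as numerical jitter here).
gpr_white = GaussianProcessRegressor(
    kernel=RBF(length_scale=1.0) + WhiteKernel(noise_level=variance),
    optimizer=None,
).fit(X, y)

X_new = np.linspace(0.0, 5.0, 50).reshape(-1, 1)
mean_alpha, std_alpha = gpr_alpha.predict(X_new, return_std=True)
mean_white, std_white = gpr_white.predict(X_new, return_std=True)

# Largest difference between the two predicted means.
print(np.max(np.abs(mean_alpha - mean_white)))
# Difference of predictive variances, compared with `variance`.
print(np.max(np.abs(std_white**2 - std_alpha**2 - variance)))
```

Note that once the optimizer is switched on, only the WhiteKernel noise level is tuned as a hyperparameter; alpha stays at whatever value was passed in.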

I'm wondering what the two represent. I think (after some discussion on chat) that the nugget basically represents short-range spatial variability (i.e. variability on scales greater than zero but smaller than the smallest distance between points in the dataset), whereas a noise term would represent uncertainty in the sampled value at each data point (so basically measurement error). Is this a correct interpretation? Can the noise term also represent other things?

Best Answer

The random noise and nugget models are indeed quite similar. The difference between the two appears

  1. when there are repeated observations (i.e., several observations at the same location), and
  2. when you compute the predicted value at an observation point.

The random noise model assumes that observations are corrupted by additive, IID Gaussian noise. In practice, this means that repeated observations at a single location produce different outcomes. In this case, the posterior mean of the GP is not equal to the observed value, even if there is only one observation at that location. This is GP regression, with a (usually) smooth regression function.

The nugget model, on the other hand, assumes a deterministic observation model (repeated observations should yield the same value) but a very rough underlying function. In this case, the posterior mean of the GP is equal to the observed value at each observation point, but is discontinuous at these points. This is in fact a form of GP interpolation, with a discontinuous interpolant.
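The contrast between the two models can be sketched with a few lines of NumPy. This is only an illustration under stated assumptions: a zero-mean GP, a squared-exponential kernel with unit length scale, and made-up data. The helper names rbf and posterior_mean are hypothetical. The only difference between the two branches is whether the extra variance also enters the covariance between a prediction point and a coincident observation point.

```python
import numpy as np

def rbf(a, b, length_scale=1.0):
    """Squared-exponential kernel between two 1-D arrays of locations."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def posterior_mean(x_train, y_train, x_test, variance, model):
    """Posterior mean of a zero-mean GP under the 'noise' or 'nugget' model."""
    # Both models add the same variance to the training diagonal.
    K = rbf(x_train, x_train) + variance * np.eye(len(x_train))
    K_star = rbf(x_test, x_train)
    if model == "nugget":
        # The nugget is part of the process itself, so it also appears in the
        # cross-covariance wherever a test location coincides exactly with a
        # training location.
        K_star = K_star + variance * (x_test[:, None] == x_train[None, :])
    return K_star @ np.linalg.solve(K, y_train)

# Made-up observations at distinct locations.
x_train = np.array([0.0, 1.0, 2.0, 3.0])
y_train = np.array([0.5, -0.3, 0.8, 0.1])

# Predict exactly at an observation point and a tiny step away from it.
x_test = np.array([1.0, 1.0 + 1e-6])

mean_noise = posterior_mean(x_train, y_train, x_test, 0.1, "noise")
mean_nugget = posterior_mean(x_train, y_train, x_test, 0.1, "nugget")

print("observed value at x=1:", y_train[1])
print("noise model  :", mean_noise)   # smooth, does not reproduce y_train[1]
print("nugget model :", mean_nugget)  # hits y_train[1], then jumps back
```

Away from the observation points the two posterior means are identical; the nugget model only differs by jumping to the observed value exactly at each observation location.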

Remark: in the first case (random noise), the individual values of the repeated observations do not matter. The posterior distribution of the GP depends only on the number of observations at each location and on their average.
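The remark can be checked numerically. The sketch below, again assuming a zero-mean GP with a squared-exponential kernel and made-up data (the helper names rbf and posterior are hypothetical), compares the posterior computed from all repeated observations with the posterior computed from one averaged observation per location, where the averaged observation is given noise variance sigma2 / n for a location observed n times.

```python
import numpy as np

def rbf(a, b, length_scale=1.0):
    """Squared-exponential kernel between two 1-D arrays of locations."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def posterior(x_train, y_train, noise_var, x_test):
    """Posterior mean and variance of a zero-mean GP with per-point noise."""
    K = rbf(x_train, x_train) + np.diag(noise_var)
    K_star = rbf(x_test, x_train)
    mean = K_star @ np.linalg.solve(K, y_train)
    cov = rbf(x_test, x_test) - K_star @ np.linalg.solve(K, K_star.T)
    return mean, np.diag(cov)

sigma2 = 0.2  # assumed noise variance of a single observation

# Three observations at x=0, two at x=1.5, one at x=3 (made-up values).
x_full = np.array([0.0, 0.0, 0.0, 1.5, 1.5, 3.0])
y_full = np.array([0.9, 1.3, 1.1, -0.2, 0.4, 0.7])

# Collapse repeats: one averaged value and one count per unique location.
x_unique, inverse, counts = np.unique(
    x_full, return_inverse=True, return_counts=True
)
y_avg = np.bincount(inverse, weights=y_full) / counts

x_test = np.linspace(-1.0, 4.0, 11)

mean_full, var_full = posterior(
    x_full, y_full, np.full(len(x_full), sigma2), x_test
)
mean_avg, var_avg = posterior(x_unique, y_avg, sigma2 / counts, x_test)

# Largest discrepancies between the two posteriors.
print(np.max(np.abs(mean_full - mean_avg)))
print(np.max(np.abs(var_full - var_avg)))
```

Both discrepancies should be at the level of floating-point round-off, which reflects that the per-location counts and averages carry all the information the posterior uses.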