Solved – Kernel density estimation incorporating uncertainties

kde · kernel-smoothing · uncertainty

When visualising one-dimensional data, it's common to use kernel density estimation (KDE) to avoid the artefacts of a poorly chosen histogram bin width.

When my one-dimensional dataset has measurement uncertainties, is there a standard way to incorporate this information?

For example (and forgive me if my understanding is naïve), KDE convolves a Gaussian profile with the delta functions of the observations. The same Gaussian kernel is shared across all observations, but its $\sigma$ parameter could instead be varied to match each measurement's uncertainty. Is there a standard way of doing this? I am hoping to reflect uncertain values with wide kernels.

I've implemented this simply in Python (a sketch of this kind of implementation is given after the figure), but I do not know of a standard method or function for it. Are there any problems with this technique? I do note that it gives some strange-looking graphs! For example:

[Figure: KDE comparison]

In this case the low values have larger uncertainties, so in the variable-width estimate they produce wide, flat kernels, whereas the standard KDE over-weights these low (and uncertain) values.
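A minimal sketch of what such an implementation might look like (this is illustrative only; the helper name `variable_bandwidth_kde` and the toy data are made up, and only `numpy` and `scipy` are assumed):

```python
import numpy as np
from scipy.stats import norm

def variable_bandwidth_kde(x_grid, obs, sigmas):
    """KDE in which each observation gets its own Gaussian kernel width.

    Each observation contributes a normal density centred on itself with
    standard deviation equal to its measurement uncertainty; the estimate
    is the average of these per-observation kernels.
    """
    obs = np.asarray(obs, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    # Evaluate one Gaussian per observation on the grid, then average.
    kernels = norm.pdf(x_grid[:, None], loc=obs[None, :], scale=sigmas[None, :])
    return kernels.mean(axis=1)

# Toy data: low values carry large uncertainties, high values small ones.
x = np.linspace(-5, 12, 500)
obs = np.array([1.0, 1.5, 2.0, 8.0, 8.3, 8.6])
sigmas = np.array([2.5, 2.2, 2.8, 0.3, 0.25, 0.35])

density = variable_bandwidth_kde(x, obs, sigmas)  # integrates to ~1 over x
```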

Best Answer

It makes sense to vary the widths, but not necessarily to match the kernel width to the uncertainty.

Consider the purpose of the bandwidth when the observations have essentially no uncertainty (i.e. when you can observe them essentially exactly): even then, the KDE doesn't use zero bandwidth, because the bandwidth reflects the variability in the distribution, not the uncertainty in the observations (i.e. 'between-observation' variation, not 'within-observation' uncertainty).

What you have is essentially an additional source of variation (over the 'no observation-uncertainty' case) that differs for every observation.

So as a first step, I'd ask "what's the smallest bandwidth I'd use if the data had zero uncertainty?" and then give each observation a new bandwidth that is the square root of the sum of the squares of that baseline bandwidth and the $\sigma_i$ for that observation's uncertainty.
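In symbols: with a baseline bandwidth $h_0$ and per-observation uncertainties $\sigma_i$, use $h_i = \sqrt{h_0^2 + \sigma_i^2}$. Continuing the sketch from the question (Silverman's rule of thumb here is just one possible baseline choice, not something the answer prescribes):

```python
# Baseline bandwidth for the 'zero uncertainty' case; Silverman's rule of
# thumb is one common choice, but any standard selector would do.
h0 = 1.06 * obs.std(ddof=1) * len(obs) ** (-1 / 5)

# Combine the baseline and each measurement uncertainty in quadrature.
h_combined = np.sqrt(h0**2 + sigmas**2)

# Same variable-width estimator as before, now with the combined widths.
density = variable_bandwidth_kde(x, obs, h_combined)
```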

An alternative way to look at the problem would be to treat each observation as a little kernel representing where the observation might have been (as you did), but then convolve the usual KDE kernel (usually fixed-width, though it doesn't have to be) with the observation-uncertainty kernel and do a combined density estimate. (I believe that gives the same outcome as what I suggested above: convolving two Gaussians with standard deviations $h_0$ and $\sigma_i$ yields a Gaussian with standard deviation $\sqrt{h_0^2 + \sigma_i^2}$.)
