Practical Uses of Kernel Density Estimators – Density Estimation Techniques

density-estimation kernel-smoothing

Perhaps this question is too broad, but I would like to know: how does one use a kernel density estimate in practice? I know, of course, that one can use it to draw pretty pictures on top of histograms, but what in particular can a KDE be used for that the empirical distribution itself cannot, especially given the need to choose the bandwidth properly?

Best Answer

As others have noted, the empirical distribution is not smooth; it's a step function. Say that you observed three data points, 2.56, 4.17, and 4.89, and I ask you: "What is the probability density of observing the value 3.14?" The empirical distribution puts probability 1/3 on each observed point and nothing in between, so there is no nice answer; you need to make some rather arbitrary decision about how to proceed. It's not really that the empirical distribution "cannot be used" in such cases, but that it serves only as a very rough approximation of the underlying distribution.
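To make the 3.14 example concrete, here is a minimal sketch of a Gaussian KDE in plain Python. It assumes a Gaussian kernel and a hand-picked bandwidth h = 0.5, chosen purely for illustration; `make_kde` is a hypothetical helper name (it is not SciPy's `scipy.stats.gaussian_kde`). Unlike the empirical distribution, the estimate returns a positive density at 3.14 and integrates to one.

```python
import math

def make_kde(data, h):
    """Return a function estimating the density at x with a Gaussian kernel of bandwidth h."""
    n = len(data)
    norm = 1.0 / (n * h * math.sqrt(2 * math.pi))

    def density(x):
        # Average of Gaussian bumps centred on each observation.
        return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data)

    return density

data = [2.56, 4.17, 4.89]
f_hat = make_kde(data, h=0.5)  # bandwidth picked by hand, for illustration only
print(f"KDE density at 3.14: {f_hat(3.14):.4f}")

# By contrast, the empirical distribution puts mass 1/3 on each observed point
# and zero mass on any interval around 3.14 that contains no data.
```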

There are many uses of kernel densities, for example:

  • Plotting is an obvious one.
  • Estimating the mode of a continuous distribution, as mentioned by @BruceET in the comments.
  • Kernel regression, a non-parametric regression model.
  • The naive Bayes algorithm can use them to approximate the distributions of continuous variables.
  • Kernel discriminant analysis is another classification algorithm using KDE.
  • In cases involving data-based optimization or sampling, you often don't want to be blind to the regions of the distribution where no data were observed. In such cases, a kernel density estimate is a nice and simple way to "interpolate" those regions.

etc.
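Several of the uses above can be sketched with one Gaussian kernel. This is a minimal pure-Python illustration, assuming a Gaussian kernel, hand-picked bandwidths, and synthetic data; the helper names (`kde_mode`, `nadaraya_watson`, `sample_from_kde`) are hypothetical, not a library API. It covers mode estimation, kernel (Nadaraya-Watson) regression, and drawing new points from the KDE to fill in regions between observations.

```python
import math
import random

def gaussian_kernel(u):
    """Standard normal density, used as the smoothing kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)

def kde(data, h):
    """Gaussian kernel density estimate with bandwidth h."""
    n = len(data)
    return lambda x: sum(gaussian_kernel((x - xi) / h) for xi in data) / (n * h)

def kde_mode(data, h, grid_size=1001):
    """Approximate the mode as the grid point where the KDE is largest."""
    lo, hi = min(data) - 3 * h, max(data) + 3 * h
    f = kde(data, h)
    grid = [lo + (hi - lo) * i / (grid_size - 1) for i in range(grid_size)]
    return max(grid, key=f)

def nadaraya_watson(xs, ys, h):
    """Kernel regression: a kernel-weighted average of ys, weighted by distance in x."""
    def predict(x):
        # Note: far outside the data all weights can underflow to zero.
        weights = [gaussian_kernel((x - xi) / h) for xi in xs]
        return sum(w * y for w, y in zip(weights, ys)) / sum(weights)
    return predict

def sample_from_kde(data, h, size, rng):
    """Draw from the Gaussian KDE: pick an observation at random, then add N(0, h^2) noise."""
    return [rng.choice(data) + rng.gauss(0.0, h) for _ in range(size)]

rng = random.Random(0)
data = [rng.gauss(5.0, 1.0) for _ in range(500)]  # synthetic data, for illustration only
mode = kde_mode(data, h=0.4)
print("estimated mode:", round(mode, 2))

xs = [i / 10 for i in range(100)]
ys = [math.sin(x) + rng.gauss(0.0, 0.2) for x in xs]
smooth = nadaraya_watson(xs, ys, h=0.3)
print("smoothed value at x = 1.57:", round(smooth(1.57), 2))

new_points = sample_from_kde(data, h=0.4, size=5, rng=rng)
print("new draws:", [round(p, 2) for p in new_points])
```

Sampling this way is sometimes called a smoothed bootstrap: unlike resampling the empirical distribution, it can produce values that were never observed.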

Yes, choosing the bandwidth is somewhat arbitrary, but picking an imperfect bandwidth often does not hurt that much, and there are already many algorithms for choosing it (rules of thumb such as Silverman's or Scott's, or cross-validation). Note, however, that other methods (the empirical distribution, $k$NN) run into other problems, such as deciding how to interpolate and extrapolate from the estimated distributions, or picking other kinds of hyperparameters.
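As a sketch of how routine bandwidth selection can be, here are two common approaches in plain Python: Silverman's rule of thumb, $h = 0.9 \min(\hat\sigma, \mathrm{IQR}/1.34)\, n^{-1/5}$, and choosing $h$ from a grid by leave-one-out log-likelihood. The helper names and the candidate grid are my own assumptions for illustration; libraries such as SciPy's `scipy.stats.gaussian_kde` also offer built-in rules via `bw_method='scott'` or `'silverman'`.

```python
import math
import random
import statistics

def gaussian_kernel(u):
    """Standard normal density, used as the smoothing kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)

def silverman_bandwidth(data):
    """Silverman's rule of thumb: 0.9 * min(sd, IQR / 1.34) * n^(-1/5)."""
    n = len(data)
    sd = statistics.stdev(data)
    q1, _, q3 = statistics.quantiles(data, n=4)
    return 0.9 * min(sd, (q3 - q1) / 1.34) * n ** (-0.2)

def loo_log_likelihood(data, h):
    """Leave-one-out log-likelihood of the data under a Gaussian KDE with bandwidth h."""
    n = len(data)
    total = 0.0
    for i, xi in enumerate(data):
        # Density at xi estimated from all the other points.
        dens = sum(gaussian_kernel((xi - xj) / h) for j, xj in enumerate(data) if j != i)
        if dens == 0.0:
            return float("-inf")  # h too small: some point gets zero density
        total += math.log(dens / ((n - 1) * h))
    return total

def cv_bandwidth(data, candidates):
    """Pick the candidate bandwidth with the highest leave-one-out log-likelihood."""
    return max(candidates, key=lambda h: loo_log_likelihood(data, h))

rng = random.Random(1)
data = [rng.gauss(0.0, 1.0) for _ in range(200)]  # synthetic data, for illustration only
h_rot = silverman_bandwidth(data)
h_cv = cv_bandwidth(data, [0.05 * k for k in range(1, 21)])
print(f"Silverman: {h_rot:.3f}, leave-one-out CV: {h_cv:.3f}")
```

The two methods rarely agree exactly, which is part of the point: for many practical purposes either choice gives a usable estimate.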
