If the Epanechnikov kernel is theoretically optimal when doing Kernel Density Estimation, why isn't it more commonly used?

kernel-smoothing, nonparametric

I have read (for example, here) that the Epanechnikov kernel is optimal, at least in a theoretical sense, when doing kernel density estimation. If this is true, then why does the Gaussian show up so frequently as the default kernel, or in many cases the only kernel, in density estimation libraries?

Best Answer

The reason the Epanechnikov kernel isn't universally used despite its supposed theoretical optimality may very well be that it isn't actually theoretically optimal. Tsybakov explicitly criticizes the argument that the Epanechnikov kernel is "theoretically optimal" on pp. 16-19 of Introduction to Nonparametric Estimation (section 1.2.4).

Trying to summarize: under some assumptions on the kernel $K$ and a fixed density $p$, and writing $S_K = \int u^2 K(u) du$ for the second moment of the kernel, the mean integrated squared error (MISE) is, to leading order, of the form

$$\frac{1}{nh} \int K^2 (u) du + \frac{h^4}{4}S_K^2 \int (p''(x))^2 dx \,. \tag{1} $$

Tsybakov's main criticism seems to be of the restriction to non-negative kernels when minimizing, since it's often possible to get better-performing estimators, which are even non-negative, without that restriction.

The first step of the argument for the Epanechnikov kernel begins by minimizing $(1)$ over $h$ and all non-negative kernels (rather than all kernels of a wider class) to get an "optimal" bandwidth for $K$

$$ h^{MISE}(K) = \left( \frac{\int K^2}{nS_K^2 \int (p'')^2} \right)^{1/5}$$

and the "optimal" kernel (Epanechnikov)

$$K^*(u) = \frac{3}{4}(1-u^2)_+ $$

whose "optimal" bandwidth is:

$$h^{MISE}(K^*) = \left( \frac{15}{n \int (p'')^2} \right)^{1/5} \,. $$

These however aren't feasible choices, since they depend on knowledge (via $p''$) of the unknown density $p$ -- therefore they are "oracle" quantities.
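As a sketch of where these formulas come from (the shorthand $A$, $B$ and $I$ is introduced here only for this derivation and is not Tsybakov's notation): write $(1)$ as $A/h + Bh^4$ with $A = \frac{1}{n}\int K^2$, $B = \frac{1}{4} S_K^2 I$ and $I = \int (p'')^2$. Setting the derivative in $h$ to zero gives

$$ -\frac{A}{h^2} + 4Bh^3 = 0 \quad \Longrightarrow \quad h^5 = \frac{A}{4B} = \frac{\int K^2}{n S_K^2 \int (p'')^2} \,. $$

Substituting back and using $Bh^5 = A/4$, the minimal value of $(1)$ over $h$ is

$$ \frac{A}{h} + Bh^4 = \frac{5}{4} \cdot \frac{A}{h} = \frac{5}{4} \left( \int K^2 \right)^{4/5} \left( S_K^2 \right)^{1/5} \left( \int (p'')^2 \right)^{1/5} n^{-4/5} \,. $$

For the Epanechnikov kernel, $\int K^2 = 3/5$ and $S_K = \int u^2 K(u) du = 1/5$, so $\int K^2 / S_K^2 = 15$, which is the constant appearing in $h^{MISE}(K^*)$.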

A proposition given by Tsybakov implies that the asymptotic MISE for the Epanechnikov oracle is:

$$\lim_{n \to \infty} n^{4/5} \mathbb{E}_p \int (p_n^E (x) - p(x))^2 dx = \frac{3^{4/5}}{5^{1/5}4} \left( \int (p''(x))^2 dx \right)^{1/5} \,. \tag{2} $$
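The constants above can be checked numerically. The following is a minimal sketch, not anything from Tsybakov's book: it assumes $S_K = \int u^2 K(u) du$ and that the minimum of $(1)$ over $h$ equals $\frac{5}{4} (\int K^2)^{4/5} (S_K^2)^{1/5} (\int (p'')^2)^{1/5} n^{-4/5}$, and the helper names (`epanechnikov`, `midpoint_integral`) are made up for illustration.

```python
def epanechnikov(u):
    """Epanechnikov kernel K*(u) = 3/4 * (1 - u^2)_+."""
    return 0.75 * max(0.0, 1.0 - u * u)


def midpoint_integral(f, a, b, n=200_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    step = (b - a) / n
    return step * sum(f(a + (i + 0.5) * step) for i in range(n))


# The kernel is supported on [-1, 1], so integrating there is enough.
int_K = midpoint_integral(epanechnikov, -1.0, 1.0)
int_K2 = midpoint_integral(lambda u: epanechnikov(u) ** 2, -1.0, 1.0)
S_K = midpoint_integral(lambda u: u * u * epanechnikov(u), -1.0, 1.0)

# Constant inside the bandwidth formula: int K^2 / S_K^2 (15 in closed form).
bandwidth_constant = int_K2 / S_K ** 2

# Constant in front of (int (p'')^2)^{1/5} in the asymptotic MISE (2),
# computed from the minimized form of (1) and from the closed form in (2).
mise_constant = 1.25 * int_K2 ** 0.8 * (S_K ** 2) ** 0.2
closed_form = 3 ** 0.8 / (5 ** 0.2 * 4)

print("int K      =", int_K)
print("int K^2    =", int_K2)
print("S_K        =", S_K)
print("bandwidth  =", bandwidth_constant)
print("MISE const =", mise_constant, "vs closed form", closed_form)
```

The two MISE constants should agree up to the numerical integration error, which confirms that $(2)$ is just $(1)$ evaluated at the oracle bandwidth for $K^*$.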

Tsybakov says (2) is often claimed to be the best achievable MISE, but then shows that one can use kernels of order 2 (for which $S_K =0$) to construct kernel estimators, for every $\varepsilon >0$, such that

$$ \limsup_{n \to \infty} n^{4/5} \mathbb{E}_p \int (\hat{p}_n (x) - p(x))^2 dx \le \varepsilon \,. $$

Even though $\hat{p}_n$ isn't necessarily non-negative, one still has the same result for the positive part estimator, $p_n^+ := \max(0, \hat{p}_n)$ (which is guaranteed to be non-negative even if $K$ isn't):

$$ \limsup_{n \to \infty} n^{4/5} \mathbb{E}_p \int (p_n^+ (x) - p(x))^2 dx \le \varepsilon \,. $$

Therefore, for $\varepsilon$ small enough, there exist true estimators which have smaller asymptotic MISE than the Epanechnikov oracle, even using the same assumptions on the unknown density $p$.

In particular, one has as a result that the infimum of the asymptotic MISE for a fixed $p$ over all kernel estimators (or positive parts of kernel estimators) is $0$. So the Epanechnikov oracle is not even close to being optimal, even when compared to true estimators.
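To make the positive-part construction concrete, here is a minimal sketch under stated assumptions. The kernel $K(u) = \frac{1}{2}(3 - u^2)\varphi(u)$, with $\varphi$ the standard normal density, is my own choice of a simple kernel with $\int K = 1$ and $\int u^2 K(u) du = 0$; it is not the construction Tsybakov uses in his proof, and the sample, bandwidth `h` and all function names are hypothetical.

```python
import math
import random


def gaussian_pdf(u):
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def higher_order_kernel(u):
    """K(u) = 0.5 * (3 - u^2) * phi(u); negative for |u| > sqrt(3)."""
    return 0.5 * (3.0 - u * u) * gaussian_pdf(u)


def midpoint_integral(f, a, b, n=20_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    step = (b - a) / n
    return step * sum(f(a + (i + 0.5) * step) for i in range(n))


def kde(x, sample, h, kernel):
    """Plain kernel estimate at x; may be negative if the kernel is."""
    return sum(kernel((x - xi) / h) for xi in sample) / (len(sample) * h)


def positive_part_kde(x, sample, h, kernel):
    """Positive-part estimator max(0, p_hat(x))."""
    return max(0.0, kde(x, sample, h, kernel))


# Illustrative data: 200 draws from a standard normal (an assumption).
random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(200)]
h = 0.8  # arbitrary illustrative bandwidth, not an optimal choice

grid = [-6.0 + 0.05 * i for i in range(241)]
raw = [kde(x, sample, h, higher_order_kernel) for x in grid]
clipped = [positive_part_kde(x, sample, h, higher_order_kernel) for x in grid]

print("min of raw estimate:    ", min(raw))
print("min of clipped estimate:", min(clipped))
```

Clipping never increases the pointwise squared error, because the true density is non-negative; this is why the bound for $\hat{p}_n$ carries over to $p_n^+$. One caveat of this sketch: wherever the raw estimate dips below zero, the clipped version no longer integrates to exactly one.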

The reason people advanced the argument for the Epanechnikov oracle in the first place is that one often argues that the kernel itself should be non-negative because the density itself is non-negative. But as Tsybakov points out, one doesn't have to assume that the kernel is non-negative in order to get non-negative density estimators, and by allowing other kernels one can construct non-negative density estimators which (1) aren't oracles and (2) perform arbitrarily better than the Epanechnikov oracle for a fixed $p$. Tsybakov uses this discrepancy to argue that it doesn't make sense to argue for optimality in terms of a fixed $p$, but only for optimality properties which are uniform over a class of densities. He also points out that the argument still works when using the MSE instead of the MISE.

EDIT: See also Corollary 1.1. on p.25, where the Epanechnikov kernel is shown to be inadmissible based on another criterion. Tsybakov really seems not to like the Epanechnikov kernel.