Solved – Generating a sample from Epanechnikov’s kernel

kernel-smoothingrejection samplingsimulation

So I am really struggling with this problem and could use some help.

Consider the Epanechnikov kernel given by
$$f_e(x)=\frac{3}{4}\left( 1-x^2 \right)$$

According to Devroye and Gyorfi's "Nonparametric Density Estimation: The $L_1$ View", to generate a sample of a distribution having $f_e$ as its density function we can use the following method (see p.236):

  1. Generate iid $U_1$,$U_2$,$U_3$ ~ Uniform(-1,1).
  2. If $\left| U_3\right| \geq \left| U_2\right|$ and $\left| U_3\right| \geq \left| U_1\right|$, deliver $U_2$; otherwise deliver $U_3$.

I have to prove this works. I thought this was related to either the acceptance-rejection method or maybe order statistics. I have spent a fair bit of time trying both approaches but I am stuck. Any pointers will be greatly appreciated.

Best Answer

Consider this alternative description of the same algorithm:

  1. Generate iid $X_1, X_2, X_3$ with Uniform$(0,1)$ distributions.

  2. Select one of the two smallest of the $X_i$ at random, with equal probability. Call this value $X$.

  3. Randomly negate $X$ with probability $1/2$.

Parts (1) and (3) reflect the fact that a Uniform$(-1,1)$ variate is the random negation of a Uniform$(0,1)$ variate. Part (2) is a restatement of step (2) in the question.

It comes down to computing the distribution of $X$. To this end, let $0 \le t \le 1$. The event that $X \le t$ decomposes into two disjoint possibilities:

  1. At least two of the $X_i$ lie in $[0, t]$, guaranteeing that $X$ (which must be among them) will lie in $[0, t]$. This is a binomial probability given by $$\binom{3}{2}t^2(1-t) + \binom{3}{3}t^3 = t^2(3-2t).$$

  2. Exactly one of the $X_i$ lies in $[0, t]$ and it is the one randomly chosen. The random choice multiplies the chance (another binomial probability) by $1/2$, giving $$\binom{3}{1}t(1-t)^2 \times \frac{1}{2} = \frac{3}{2}t(1-t)^2.$$

Therefore the distribution function is

$$F(t) = t^2(3-2t) + \frac{3}{2}t(1-t)^2 = \frac{3}{2} t - \frac{1}{2}t^3,$$

whence the density function is

$$f(t) = F^\prime(t) = \frac{3}{2}(1 - t^2).$$

Figure

The blue shaded area represents $f$ while the red shaded area shows how it is extended symmetrically about $0$ to define a distribution supported on the interval $[-1,1]$ with density $f_e = \frac{1}{2}f$.

Extending $f$ to the domain $[-1,1]$ by symmetry (which is what part (3) does) will not change its functional form (which is already symmetric about $0$, since $(-t)^2 = t^2$) but must halve the height to maintain its normalization, whence

$$f_e(t) = \frac{1}{2}\left(\frac{3}{2}(1 - t^2)\right) = \frac{3}{4}(1-t^2).$$


By the way, a simpler way to generate such a sample is to take the median of three iid Uniform$(0,2)$ variates. Here's R code:

n <- 1e5
x <- apply(matrix(runif(3*n, -1, 1), 3), 2, median)

(It takes two or three seconds to generate 100,000 values.) Comparing the histogram of this sample to the kernel confirms its accuracy:

enter image description here

Related Question