Solved – Kernel density estimation – Effect of bandwidth

distributions, kernel-smoothing, mathematical-statistics, probability

I am trying to learn kernel density estimation, and I need help understanding how the bandwidth $h$ affects the kernel density estimator. Consider a Gaussian kernel $k(x)~=~\frac{1}{\sqrt{2 \pi}} e^{-x^2}$. The kernel density estimator is given by ${\hat{f}}_h (x) ~=~ \frac{1}{n} \sum_{i=1}^{n} K_h(x-X_i)$.

Clearly, $k(x)$ is independent of $h$, so where does $h$ come in?
What would ${\hat{f}}_h (x)$ be? How does $h$ affect the kernel?

Thank you!

Best Answer

I guess the problem is that you are using the wrong formula for the Gaussian kernel. The Gaussian kernel uses the normal probability density function, which has the following form

$$ f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\tfrac{(x-\mu)^2}{2\sigma^2}} $$

where the distribution with parameters $\mu=0$ and $\sigma^2 = 1$ is called the standard normal distribution. The formula you quote resembles it, but the exponent should be $-x^2/2$ rather than $-x^2$. The Gaussian kernel is based on the normal density function centered at mean $\mu=0$ with variance $\sigma^2 = h^2$. So $h$ is the scale parameter (standard deviation) of the kernel, and it serves the same purpose as the bandwidth in other kernels: it controls the "width" of the kernel.
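
To make this concrete, here is a short worked substitution. It assumes that $K_h$ in your estimator denotes the normal density above with $\mu = 0$ and $\sigma = h$:

$$ K_h(x) = \frac{1}{\sqrt{2\pi}\,h}\, e^{-\tfrac{x^2}{2h^2}}, \qquad \hat{f}_h(x) = \frac{1}{n} \sum_{i=1}^{n} K_h(x - X_i) = \frac{1}{n h \sqrt{2\pi}} \sum_{i=1}^{n} e^{-\tfrac{(x - X_i)^2}{2h^2}} $$

Each observation $X_i$ contributes a normal "bump" centered at $X_i$ with standard deviation $h$. A small $h$ gives tall, narrow bumps (a wiggly estimate), while a large $h$ gives low, wide bumps (a smooth estimate).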

As Dan said, kernels are often defined in terms of a "standard" kernel, e.g. the standard triangular kernel is $K(x) = 1 - |x|$ for $|x| \le 1$ (and $0$ otherwise), and the scaled kernel is $K_h(x) = K(x\,/\,h)\,/\,h$. The standard form of the Gaussian kernel is the standard normal density.
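
If it helps, the scaling rule $K_h(x) = K(x/h)/h$ can be tried out numerically. The following is only a minimal sketch in plain Python: the function names (`gaussian_kernel`, `triangular_kernel`, `scaled_kernel`, `kde`) and the sample `data` are made up for illustration and do not come from any library.

```python
import math

def gaussian_kernel(u):
    """Standard Gaussian kernel: the N(0, 1) density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def triangular_kernel(u):
    """Standard triangular kernel: 1 - |u| on [-1, 1], zero elsewhere."""
    return max(0.0, 1.0 - abs(u))

def scaled_kernel(kernel, x, h):
    """K_h(x) = K(x / h) / h for a bandwidth h > 0."""
    return kernel(x / h) / h

def kde(x, data, h, kernel=gaussian_kernel):
    """Kernel density estimate at x: the average of K_h(x - X_i)."""
    return sum(scaled_kernel(kernel, x - xi, h) for xi in data) / len(data)

# Hypothetical sample, chosen only for illustration.
data = [-2.1, -1.3, -0.4, 0.0, 0.3, 1.9, 2.2, 5.0]

for h in (0.2, 1.0, 3.0):
    # Evaluate on a coarse grid to see how h changes the shape.
    values = [kde(x, data, h) for x in (-2.0, 0.0, 2.0, 3.5, 5.0)]
    print(h, [round(v, 3) for v in values])
```

With a small $h$ the estimate should be close to zero in the gap between the observations at 2.2 and 5.0, and with a larger $h$ that gap gets filled in. In every case the estimate still integrates to one, because dividing by $h$ compensates for stretching the kernel.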

See also the "Why does definition of kernel include bandwidth?" thread.
