Optimal grid size for kernel-density estimation

Tags: kernel-smoothing, r

I am generating 2D kernel density estimates for every pair of numeric columns in a data set, using the kde2d function from the MASS package in R.

This takes the following parameters:

kde2d(x, y, h, n=25, lims = c(range(x), range(y)))

where n is the "Number of grid points in each direction. Can be scalar or a length-2 integer vector".

I want to optimize the dimensions of the grid for every pair of columns. At the moment, I use a fixed grid of 10×10. Does anyone know a formula for choosing the grid size so that I get an optimal density estimate for each pair of columns?

Thanks

Best Answer

As described by Venables and Ripley (2002), the grid is simply the set of points at which the kernel density estimate is evaluated:

We apply two-dimensional kernel analysis directly; this is most straightforward for the normal kernel aligned with axes, that is, with variance $\operatorname{diag}(h^2_x, h^2_y)$. Then the kernel estimate is

$$ f(x, y) = \frac{\sum_s \phi((x-x_s)/h_x) \phi((y-y_s)/h_y)}{nh_x h_y} $$

which can be evaluated on a grid as $XY^T$ where $X_{is} = \phi((gx_i-x_s)/h_x)$ and ($gx_i$) are the grid points, and similarly for $Y$.
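The quoted formula can be sketched in base R as follows. This is a minimal illustration, not the MASS implementation: kde_grid and its arguments are hypothetical names, and hx and hy are the kernel standard deviations as they appear in the formula, which need not be on the same scale as the h argument of kde2d. The last line evaluates the sum directly at one grid point so the matrix form can be checked against it.

```r
# Hand-rolled version of the quoted formula, base R only.
# kde_grid, hx, hy, gx, gy are illustrative names, not part of MASS.
kde_grid <- function(x, y, hx, hy, gx, gy) {
  n_obs <- length(x)
  X <- dnorm(outer(gx, x, "-") / hx)    # X[i, s] = phi((gx_i - x_s) / hx)
  Y <- dnorm(outer(gy, y, "-") / hy)    # Y[j, s] = phi((gy_j - y_s) / hy)
  tcrossprod(X, Y) / (n_obs * hx * hy)  # X %*% t(Y), scaled by n * hx * hy
}

set.seed(1)                             # simulated data, for illustration only
x <- rnorm(200)
y <- rnorm(200)
gx <- seq(min(x), max(x), length.out = 25)
gy <- seq(min(y), max(y), length.out = 25)
z <- kde_grid(x, y, hx = 0.4, hy = 0.4, gx = gx, gy = gy)

# Direct evaluation of the double-kernel sum at grid point (gx[3], gy[7]).
direct <- sum(dnorm((gx[3] - x) / 0.4) * dnorm((gy[7] - y) / 0.4)) /
  (200 * 0.4 * 0.4)
```

Here z[3, 7] and direct are the same quantity computed two ways. The grid vectors gx and gy enter only as evaluation points; they do not affect the value of the estimate at any given location.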

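So there is nothing to optimize here. The grid only determines where the density is evaluated; it does not change the estimate itself, whose smoothness is controlled by the bandwidth h. A finer grid gives a higher-resolution picture of the same surface, at the cost of more computation, since the work grows with the number of grid points times the number of observations.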
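One way to check this, sketched here with simulated data and assuming the MASS package is installed: evaluate the same data on an 11×11 and a 21×21 grid with identical limits. Every second point of the finer grid coincides with a point of the coarser one, so the density values at those shared points can be compared directly.

```r
library(MASS)

set.seed(1)                        # simulated data, for illustration only
x <- rnorm(200)
y <- rnorm(200)
lims <- c(range(x), range(y))      # same limits for both grids

coarse <- kde2d(x, y, n = 11, lims = lims)
fine   <- kde2d(x, y, n = 21, lims = lims)

# Every second point of the 21-point grid lies on the 11-point grid.
idx <- seq(1, 21, by = 2)
same_values <- all.equal(coarse$z, fine$z[idx, idx])
print(same_values)
```

If the values agree up to floating-point error, the grid size has changed only the resolution of the output. A practical choice of n is therefore whatever resolution the downstream plot or computation needs, within the time you are willing to spend.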


Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics with S. Fourth edition. Springer.
