I am generating 2D kernel density estimates for every pair of numeric columns in a data set, using the kde2d function from the MASS package in R.
This takes the following parameters:
kde2d(x, y, h, n=25, lims = c(range(x), range(y)))
where n is the "Number of grid points in each direction. Can be scalar or a length-2 integer vector".
I want to optimize the dimensions of the grid for every pair of columns. At the moment, I use a fixed grid of 10×10. Does anyone know a formula for choosing the grid size so that I get the best possible density estimate for each pair of columns?
Thanks
Best Answer
As described by Venables and Ripley (2002), the grid is simply the set of points at which the kernel density estimate is evaluated.
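So there is nothing to optimize here: if you take more grid points, you evaluate the same density estimate at more locations and get a finer picture of it. The smoothness of the estimate itself is governed by the bandwidth h, not by n. More grid points also mean that the computation can get slower.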
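To illustrate, here is a minimal sketch on simulated data (the variables x, y, coarse, fine and rect are made up for this example, not taken from the question). It holds h and lims fixed and changes only n, assuming the MASS package is installed. Where the coarse and fine grids share points, the estimated density should agree up to floating-point error.

```r
library(MASS)

# Simulated data, purely for illustration
set.seed(1)
x <- rnorm(200)
y <- 0.5 * x + rnorm(200)

# Fix the bandwidth and the limits so that only n varies
h <- c(bandwidth.nrd(x), bandwidth.nrd(y))
lims <- c(range(x), range(y))

coarse <- kde2d(x, y, h = h, n = 11, lims = lims)
fine   <- kde2d(x, y, h = h, n = 21, lims = lims)

# Every second point of the 21-point grid coincides with the 11-point grid,
# so the density values at those shared points can be compared directly
idx <- seq(1, 21, by = 2)
same <- isTRUE(all.equal(coarse$z, fine$z[idx, idx]))
print(same)

# n can also be a length-2 vector: 30 points along x, 15 along y
rect <- kde2d(x, y, h = h, n = c(30, 15), lims = lims)
print(dim(rect$z))
```

The number of evaluation points is n[1] * n[2], so doubling n in both directions roughly quadruples the work. In practice you would pick n for the resolution you need in the plot or downstream computation, and tune h if you want a better estimate.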
Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics with S. Fourth edition. Springer.