I've seen a reasonable amount of literature about how to choose kernels and bandwidths when computing a kernel density estimate, but I am currently interested in how to improve the time it takes to evaluate the resulting KDE at an arbitrary number of points.
In my case, I'm using a multidimensional (2D or 3D) Gaussian kernel with diagonal covariance (i.e. each dimension is independent). The bandwidths in each dimension may differ and are selected using nearest neighbours. However, my question probably extends to different kernels and bandwidth selection methods.
Let's say I have $M$ data points and wish to evaluate the resulting KDE at $N$ grid points. A simple implementation involves evaluating the multivariate normal pdf $MN$ times. For my purposes, $M$ and $N$ are both on the order of thousands, and evaluation has become the bottleneck in my code.
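For concreteness, here is a minimal sketch of that simple $O(MN)$ implementation: a direct sum of diagonal-covariance Gaussian kernels, vectorized with NumPy broadcasting. The function name and the fixed per-dimension bandwidths are my own illustration (the question's nearest-neighbour selection could make bandwidths vary per data point, which this sketch does not cover).

```python
import numpy as np

def kde_direct(data, grid, bandwidths):
    """Directly evaluate a Gaussian KDE with diagonal covariance.

    data:       (M, d) array of sample points
    grid:       (N, d) array of evaluation points
    bandwidths: (d,) per-dimension bandwidths (standard deviations)

    Returns an (N,) array of density estimates. Cost is O(M*N) in
    time and memory, since every (grid point, data point) pair is
    evaluated explicitly.
    """
    data = np.asarray(data, dtype=float)
    grid = np.asarray(grid, dtype=float)
    h = np.asarray(bandwidths, dtype=float)
    M, d = data.shape
    # (N, M, d) standardized differences -- this is the M*N work.
    z = (grid[:, None, :] - data[None, :, :]) / h
    # Product of 1-D Gaussian kernels = multivariate normal pdf with
    # diagonal covariance diag(h**2).
    norm = (2.0 * np.pi) ** (d / 2.0) * np.prod(h)
    kernel = np.exp(-0.5 * np.sum(z ** 2, axis=-1)) / norm
    return kernel.mean(axis=1)
```

With $M$ and $N$ both in the thousands, the intermediate $(N, M, d)$ array is also a memory concern, which is part of why this becomes the bottleneck.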
I am not aware of any generally accepted improvements to this basic method. I found this paper, which claims to reduce the complexity from $O(MN)$ to $O(M+N)$. However, the method has not been implemented in any 'standard' R or Python libraries that I'm aware of, which suggests it hasn't yet been widely adopted.
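One widely available middle ground (not the $O(M+N)$ method from the paper) is tree-based evaluation, e.g. scikit-learn's `KernelDensity`, which uses a KD-tree/ball-tree with tolerance-based pruning to skip negligible contributions. A caveat for this question's setting: `KernelDensity` accepts only a single scalar bandwidth, but per-dimension bandwidths can be emulated by rescaling each coordinate by its bandwidth, fitting with bandwidth 1, and correcting the density by the Jacobian $1/\prod_j h_j$. The function name and `rtol` default below are my own illustration.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

def kde_tree(data, grid, bandwidths, rtol=1e-4):
    """Approximate KDE evaluation via scikit-learn's tree-based KernelDensity.

    Per-dimension bandwidths are handled by whitening: divide each
    coordinate by its bandwidth, fit with bandwidth 1, then divide the
    resulting density by prod(bandwidths) to undo the change of variables.
    `rtol` trades accuracy for speed (rtol=0 gives the exact sum).
    """
    h = np.asarray(bandwidths, dtype=float)
    kde = KernelDensity(kernel="gaussian", bandwidth=1.0, rtol=rtol)
    kde.fit(np.asarray(data, dtype=float) / h)
    # score_samples returns log density; exponentiate and rescale.
    log_dens = kde.score_samples(np.asarray(grid, dtype=float) / h)
    return np.exp(log_dens) / np.prod(h)
```

Whether this beats the direct sum depends on the data: the tree helps most when kernels are narrow relative to the spread of the points, so that most pairwise contributions can be pruned.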
Thanks for any pointers you can give.
Best Answer
I'm going to provide an (incomplete) answer here in case it helps anyone else out.