Solved – How to speed up Kernel density estimation

kernel-smoothing, model-evaluation

Background

A couple of days ago I asked here: How to reduce the number of points for clustering. Instead of reducing the number of points, Kernel Density Estimation (KDE) was suggested to me; it gives correct results and is faster than my previous approach.

Question

What I am curious about now is the complexity of this method. Maybe I've implemented it in a bad (too naive) way; here is my C++ implementation http://pastebin.com/gtStWjmA (see the evalPrivate method). Assume I have $m$ data points and I want to sample the KDE at $n$ points. Then for each evaluation of the KDE I have to evaluate the kernel function $m$ times. So my complexity is $O(m^n)$, and that is too much. With $m = 60\,000$ and $n = 1000$, it takes ages to sample the KDE in order to find its local maxima.
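A simplified sketch of such a naive evaluation (not the exact pastebin code; the function names, the Gaussian kernel, and the bandwidth parameter h are illustrative choices):

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Standard Gaussian kernel (illustrative choice).
double kernel(double u) {
    const double invSqrt2Pi = 0.3989422804014327;
    return invSqrt2Pi * std::exp(-0.5 * u * u);
}

// Naive KDE: every one of the n evaluation points sums over all m data points,
// i.e. m * n kernel evaluations in total.
std::vector<double> evalKDE(const std::vector<double>& data,  // m data points
                            const std::vector<double>& grid,  // n evaluation points
                            double h)                         // bandwidth
{
    std::vector<double> density(grid.size(), 0.0);
    for (std::size_t j = 0; j < grid.size(); ++j) {
        for (double x : data)
            density[j] += kernel((grid[j] - x) / h);
        density[j] /= static_cast<double>(data.size()) * h;
    }
    return density;
}
```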

Best Answer

One trick I used when I implemented KDEs is to limit the effect of the kernel to a neighbourhood of each data point.

Suppose you have a sample $x = \{x_1, x_2, \ldots, x_m\}$ and some points where you want to estimate the density, $k = \{k_1, k_2, \ldots, k_n\}$. Without loss of generality we can consider the values of $k$ to be sorted; if they are not, you simply sort them and track the indices.

Consider now what a KDE does. Basically, for each $x_i$ you have a probability mass that you want to spread, via a symmetric kernel function, to the left and to the right. You only need to evaluate that spread of probability mass at the values of $k$. Obviously, if you evaluate the kernel function at all values of $k$, you end up with an execution time of $O(|k|\,|x|)$ (not $O(|x|^{|k|})$ as you suggested; 20 samples with 20 estimation points would be $20^{20}$ = 104 septillion 857 sextillion 600 quintillion operations).

My trick is not to spread this probability mass to all the points, but only to some of them: the ones closest to the $x_i$ for which the kernel function is evaluated. For some kernel functions, like the uniform, triangular, and Epanechnikov kernels, this is a natural concept, since beyond some distance to the left or to the right the spread probability mass equals $0$, so it does not affect the kernel density estimate at all. For other kernel functions, like the Gaussian, I established limits. For the Gaussian I set the limit as a multiple of the standard deviation. I do not remember the exact value (you can take a look at my code, since it is publicly available), but outside a range of some standard deviations to the left and right of $x_i$ I treated the contributions as $0$.
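A minimal sketch of that cutoff idea (the kernel names follow the answer; the Gaussian cutoff of 4 standard deviations is an assumption, the actual value in the answerer's code may differ):

```cpp
enum class KernelType { Uniform, Triangular, Epanechnikov, Gaussian };

// Distance (in bandwidth units) beyond which a kernel's contribution is treated
// as zero. Exact for compactly supported kernels; a cutoff for the Gaussian.
double supportRadius(KernelType type) {
    switch (type) {
        case KernelType::Uniform:
        case KernelType::Triangular:
        case KernelType::Epanechnikov:
            return 1.0;  // these kernels are exactly zero for |u| > 1
        case KernelType::Gaussian:
            return 4.0;  // assumed cutoff: exp(-0.5 * 16) is about 3e-4 of the peak
    }
    return 1.0;  // unreachable, silences compiler warnings
}
```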

Now, the implementation idea is fairly straightforward to follow in practice. I enriched my kernel functions with a minimum and a maximum value (in other words, a distance to the left and to the right of the evaluated $x_i$), and, using binary search on the sorted vector $k$, I find the range of values of $k$ that has to be updated.
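A minimal sketch of this, assuming the evaluation points (grid below) are already sorted and using std::lower_bound/std::upper_bound from <algorithm> for the binary search; the truncated Gaussian kernel is just one possible choice:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Truncated KDE: each data point only updates the grid points within r = c * h
// of it, where c is the kernel's support radius (or cutoff). The range is found
// by binary search in the sorted grid, so the cost drops from O(m * n) to
// roughly O(m log n) plus the number of actual updates.
std::vector<double> evalKDETruncated(const std::vector<double>& data,
                                     const std::vector<double>& grid,  // must be sorted
                                     double h,                         // bandwidth
                                     double cutoff)                    // e.g. 4 std devs
{
    const double invSqrt2Pi = 0.3989422804014327;
    std::vector<double> density(grid.size(), 0.0);
    const double r = cutoff * h;
    for (double x : data) {
        // Binary search for the first and last grid points inside [x - r, x + r].
        auto lo = std::lower_bound(grid.begin(), grid.end(), x - r);
        auto hi = std::upper_bound(grid.begin(), grid.end(), x + r);
        for (auto it = lo; it != hi; ++it) {
            double u = (*it - x) / h;
            density[it - grid.begin()] += invSqrt2Pi * std::exp(-0.5 * u * u);
        }
    }
    for (double& d : density)
        d /= static_cast<double>(data.size()) * h;
    return density;
}
```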

Obviously this can introduce errors, but if you have enough points I found the effect is not a problem in practice. And if you implement it well, with the range around $x_i$ as a parameter (I did not do that, but I might if I ever need it), you get a parameter that can be used as a trade-off between the precision of the KDE evaluation and speed. Note also that for some kernel functions (those with bounded support) this parameter has no effect, because the KDE remains exact.
