Solved – Covariance vs Bandwidth of Kernel Density Estimate

kernel-smoothing

I've been working with the SciPy `gaussian_kde` implementation (here), but I don't quite understand the difference between the bandwidth factor and the covariance matrix. I'm using it in a single dimension, so the covariance matrix contains only a single value. The bandwidth factor is estimated with either Scott's or Silverman's rule, and then apparently it is multiplied by the covariance matrix; but why? What does this covariance matrix actually represent? I understand that the bandwidth is the 'width' of the kernel being used, but I don't see the role of the covariance matrix.

Best Answer

Scott's rule gives a bandwidth factor of $n^{- \frac{1}{d + 4}}$, where $n$ is the number of data points and $d$ their dimension; Silverman's is $(\frac14 n (d + 2))^{-\frac{1}{d + 4}}$.
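In code, these two factors are straightforward to compute (a sketch; the sample size and dimension here are made up for illustration):

```python
# Hypothetical sample size and dimension
n, d = 200, 1

# Scott's rule: n^(-1/(d+4))
scott = n ** (-1.0 / (d + 4))

# Silverman's rule: (n(d+2)/4)^(-1/(d+4))
silverman = (n * (d + 2) / 4.0) ** (-1.0 / (d + 4))
```

Note that both are dimensionless numbers depending only on $n$ and $d$, which is exactly why something else must supply the scale.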

Neither of these depends on the scale of the dataset. But if you rescale your data so that everything is 10 times as far apart, the bandwidth of your KDE should of course become 10 times as large as well. That's what the covariance matrix accomplishes: `gaussian_kde` estimates the covariance of the data and multiplies it by the squared bandwidth factor, so the kernel's covariance tracks the overall scale (and, in higher dimensions, the correlation structure) of the data.
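A minimal NumPy sketch of that scaling behavior (the data are made up; this mirrors how the kernel covariance is formed from the data covariance and the bandwidth factor, rather than calling SciPy directly):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=200)  # hypothetical 1-D sample

n, d = data.size, 1
factor = n ** (-1.0 / (d + 4))  # Scott's rule

# Kernel covariance = data covariance * factor**2,
# so the kernel width (its square root) is factor * data std.
kernel_cov = np.cov(data) * factor ** 2
bandwidth = np.sqrt(kernel_cov)

# Rescaling the data by 10 rescales the bandwidth by 10,
# even though the factor itself is unchanged:
kernel_cov_scaled = np.cov(10 * data) * factor ** 2
```

Here `np.sqrt(kernel_cov_scaled)` comes out exactly 10 times `bandwidth`, which is the behavior the factor alone cannot provide.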