I have one-dimensional data, and all data points are positive. The median is about 10 and the mean about 25. The distribution looks lognormal, but with a very high frequency around the median and a fat tail, so a lognormal does not fit well. I am therefore considering Kernel Density Estimation to describe the data, and I have tried several ways to find the best bandwidth. (Reference: https://jakevdp.github.io/blog/2013/12/01/kernel-density-estimation/)
R (reference rules)
bw.SJ(data)
bw.nrd(data)
bw.nrd0(data)
bw.ucv(data)
All the resulting bandwidths are too small (below 0.2), and the estimated density shows too many bumps, which makes it difficult to analyze.
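For reference, Silverman's rule of thumb (the formula behind R's bw.nrd0) is easy to reproduce in Python, and running it on synthetic data with a sharp peak and a fat tail shows how the IQR term can drive the bandwidth down. The mixture below is a hypothetical stand-in for the data described above, not the actual data:

```python
import numpy as np

def bw_nrd0(x):
    """Silverman's rule of thumb, the analogue of R's bw.nrd0:
    h = 0.9 * min(sd, IQR / 1.34) * n^(-1/5)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    sd = x.std(ddof=1)
    iqr = np.subtract(*np.percentile(x, [75, 25]))  # q75 - q25
    spread = min(sd, iqr / 1.34)
    if spread == 0:
        spread = sd  # fallback for a degenerate IQR
    return 0.9 * spread * n ** (-0.2)

# Hypothetical positive data: a sharp peak near 10 plus a fat tail.
rng = np.random.default_rng(0)
data = np.concatenate([
    rng.lognormal(mean=np.log(10), sigma=0.15, size=8000),  # sharp peak
    rng.lognormal(mean=np.log(10), sigma=1.5, size=2000),   # fat tail
])
print(bw_nrd0(data))
```

Because the peak keeps the IQR small while the tail inflates the standard deviation, the min() picks the IQR term, so the rule returns a small bandwidth that undersmooths the tail, which is consistent with the bumps described above.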
Python sklearn (cross-validation)
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KernelDensity

grid = GridSearchCV(KernelDensity(), {'bandwidth': np.linspace(0.1, 1.0, 30)}, cv=20)
grid.fit(data[:, None])  # KernelDensity expects a 2-D array
I ended up testing bandwidths from 0.01 to 50, and the best one was about 20. But 20 is far too large: the resulting density is almost flat and does not fit the data at all.
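The full cross-validation search can be sketched as follows. The data here is again a hypothetical stand-in with the same peak-plus-fat-tail shape; with likelihood-based scoring, a tiny bandwidth assigns near-zero density to held-out tail points, so the search tends to be pushed toward large bandwidths, as observed above:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KernelDensity

# Hypothetical stand-in for the data described in the question.
rng = np.random.default_rng(1)
data = np.concatenate([
    rng.lognormal(mean=np.log(10), sigma=0.15, size=800),  # sharp peak
    rng.lognormal(mean=np.log(10), sigma=1.5, size=200),   # fat tail
])

# Search bandwidths on a log grid from 0.01 to 50, as in the question.
grid = GridSearchCV(
    KernelDensity(kernel='gaussian'),
    {'bandwidth': np.logspace(-2, np.log10(50), 40)},
    cv=20,
)
grid.fit(data[:, None])  # KernelDensity expects a 2-D array
print(grid.best_params_['bandwidth'])
```

GridSearchCV scores each fold with KernelDensity's total held-out log-likelihood, so a single tail point falling outside all kernels can dominate the score for small bandwidths.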
Do you have any idea why these methods do not work well with my data? Could you suggest other methods for finding a better bandwidth?
Best Answer
Have a look at the R packages logcondens and LogConcDEAD, which implement log-concave maximum likelihood density estimation. Unlike KDE, this approach requires no bandwidth selection at all.
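For context, the estimator behind both packages maximizes the log-likelihood over all log-concave densities, i.e. densities of the form $f = e^{\varphi}$ with $\varphi$ concave. A sketch of the objective, using $X_1,\dots,X_n$ for the observed sample:

$$
\hat f = \arg\max_{f}\; \sum_{i=1}^{n} \log f(X_i)
\quad \text{subject to } f = e^{\varphi},\ \varphi \text{ concave},\ \int f(x)\,dx = 1.
$$

Because the concavity constraint alone regularizes the fit, there is no smoothing parameter to choose, which is what makes it attractive when bandwidth selectors disagree as badly as they do here.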