MATLAB: Kernel: Mean Integrated Squared Error - Bandwidth Selection

bandwidth, kernel

Hello all,
I have a data set and estimated its density with a kernel estimator, but the bandwidth has to be chosen properly to get a correct density for the given data. I set it to 0.2 as a starting point so that I could experiment before looking into a proper method. The kernel estimate did not work for width = 0.2 on this data, although it did work for another data set. A more rigorous way to pick the best bandwidth for given data is to minimize the mean integrated squared error. Is there a built-in function for this in MATLAB? I could not find one, and I am not sure whether it is in a toolbox I do not have access to. I would also like to know why the width 0.2 does not work with my code.
Thank you all,
sample1 = [6.52689332414481E7
6.52693837402845E7
6.5270203713004336E7
6.527122138667133E7
6.52717237415096E7
6.527173346449997E7
6.527211590239384E7
6.5272540473269284E7
6.527282568117965E7
6.527314005807114E7
];
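x = sample1.';
% ksdensity returns the density values first, then the evaluation points.
[f,xi] = ksdensity(x,'width',0.2);
plot(xi,f);
% Mark each observation with a short green tick on the x-axis.
line(repmat(x,2,1),repmat([0;0.1*max(f)],1,length(x)),'color','g');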

Best Answer

The "right" width depends on your assumptions about the fitted distribution. MATLAB does not choose the bandwidth "randomly". It computes the optimal bandwidth for the normal distribution:
help ksdensity
[snip]
[F,XI,U]=ksdensity(...) also returns the bandwidth of the kernel smoothing window.
[snip]
'width' The bandwidth of the kernel smoothing window. The default is optimal for estimating normal densities, but you may want to choose a smaller value to reveal features such as multiple modes.
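As a rough sketch of why 0.2 may be too small for this particular sample, the snippet below compares it with the default bandwidth that ksdensity reports through its third output and with the spacing between observations. The variable names (u, gaps, hRule) are illustrative, and the rule-of-thumb formula 1.06*std*n^(-1/5) is the textbook normal-reference rule, assumed here only for comparison; ksdensity's internal default may use a different (for example, more robust) spread estimate, so u need not equal hRule.

```matlab
% Sample from the question, as a row vector.
x = [6.52689332414481E7 6.52693837402845E7 6.5270203713004336E7 ...
     6.527122138667133E7 6.52717237415096E7 6.527173346449997E7 ...
     6.527211590239384E7 6.5272540473269284E7 6.527282568117965E7 ...
     6.527314005807114E7];

% Third output is the bandwidth ksdensity actually used (default here).
[f, xi, u] = ksdensity(x);

% Spacing between neighbouring observations and overall spread.
gaps      = diff(sort(x));
dataRange = max(x) - min(x);

% Textbook normal-reference rule of thumb (an assumption for comparison,
% not necessarily the exact formula ksdensity uses internally).
hRule = 1.06 * std(x) * numel(x)^(-1/5);

fprintf('default bandwidth u : %g\n', u);
fprintf('rule-of-thumb hRule : %g\n', hRule);
fprintf('median gap          : %g\n', median(gaps));
fprintf('data range          : %g\n', dataRange);
```

The observations span a few thousand units and are mostly hundreds of units apart, so a kernel of width 0.2 covers almost none of the default evaluation grid and the estimate is close to zero nearly everywhere. A width of 0.2 can still be reasonable for data measured on a much smaller scale, which would explain why it worked on the other data set.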
If you look at the Wikipedia article on kernel density estimation, note this paragraph:
Neither the AMISE nor the hAMISE formulas are able to be used directly since they involve the unknown density function ƒ or its second derivative ƒ'', so a variety of automatic, data-based methods have been developed for selecting the bandwidth. Many review studies have been carried out to compare their efficacities,[6][7][8][9][10] with the general consensus that the plug-in selectors[11] and cross validation selectors[12][13][14] are the most useful over a wide range of data sets.
I suggest that you choose the optimal bandwidth by cross-validation using ksdensity and crossval functions. Often the approximation based on the normal distribution (which you get by default from ksdensity) is good enough. -Ilya
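One possible way to combine ksdensity and crossval for this is likelihood cross-validation, sketched below under several assumptions: the candidate grid hGrid, the leave-one-out scheme, and the held-out log-likelihood criterion are illustrative choices rather than something prescribed in the answer above, and other criteria (such as least-squares cross-validation) could be used instead.

```matlab
% Sample from the question as a column vector: crossval treats rows as observations.
x = [6.52689332414481E7; 6.52693837402845E7; 6.5270203713004336E7; ...
     6.527122138667133E7; 6.52717237415096E7; 6.527173346449997E7; ...
     6.527211590239384E7; 6.5272540473269284E7; 6.527282568117965E7; ...
     6.527314005807114E7];

% Candidate bandwidths on the scale of the data (an assumed grid).
hGrid = logspace(1, 4, 40);
score = zeros(size(hGrid));

for k = 1:numel(hGrid)
    h = hGrid(k);
    % Log-likelihood of the held-out points under the density
    % estimated from the training points with bandwidth h.
    loglik = @(xtrain, xtest) ...
        sum(log(ksdensity(xtrain, xtest, 'width', h)));
    % Leave-one-out: each observation serves as the test set once.
    score(k) = sum(crossval(loglik, x, 'leaveout', 1));
end

% Bandwidth with the highest total held-out log-likelihood.
[~, best] = max(score);
hCV = hGrid(best);
fprintf('cross-validated bandwidth: %g\n', hCV);

% Density estimate with the selected bandwidth.
[f, xi] = ksdensity(x, 'width', hCV);
plot(xi, f);
```

Very small candidates can give a score of -Inf because a held-out point then has zero estimated density; max simply skips those. With only ten observations the selected value is quite variable, so it is worth comparing it with the default bandwidth that ksdensity returns.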