MATLAB: Why do I get a spiky distribution using KSDENSITY with the “support” property set to “positive” in Statistics Toolbox 5.1 (R14SP3)?

Statistics and Machine Learning Toolbox

See the attached files:
1. In line 21 (g), KSDENSITY is called with the "support" property set to "positive":
g = ksdensity(X,Xi,'kernel','normal','support','positive','width',h);
The output values range from 0 to 140.
2. If I call it without setting "support" to "positive":
f = ksdensity(X,Xi,'kernel','normal','width',h);
I get values from 9.59e-04 to 34.81, which is the expected output.
3. If I set the range explicitly instead:
gmod = ksdensity(X,Xi,'kernel','normal','support',[min(X)-1 max(X)+1],'width',h);
the results are also more accurate (a hint from edg math engineer); see line 22 (gmod).

Best Answer

The problem is caused by an inappropriate bandwidth. When "support" is set to "positive", the data is log-transformed before smoothing, and for this data set the log transform stretches the values over a much larger range.
In this example, max(X)-min(X) = 0.0488, whereas max(log(X))-min(log(X)) = 20.0059.
The bandwidth you supplied (0.0054) was chosen based on the original data, so it is far too small on the log scale, and that is why you get a very spiky distribution. Choosing an appropriate bandwidth is not easy: a bandwidth that is too small produces a spiky, undersmoothed estimate, while one that is too large oversmooths and hides real features of the distribution.
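A rough sketch of why the same width behaves so differently, assuming the standard log-transform estimator: Y = log(X) is smoothed with kernel K and width h on the log scale, and the result is mapped back to the original scale with the usual change-of-variables factor 1/x. The second line only redoes the arithmetic with the numbers quoted above.

```latex
\hat f_Y(y) = \frac{1}{nh}\sum_{i=1}^{n} K\!\left(\frac{y-\log X_i}{h}\right),
\qquad
\hat f_X(x) = \frac{1}{x}\,\hat f_Y(\log x), \quad x > 0

\frac{h}{\max(X)-\min(X)} = \frac{0.0054}{0.0488} \approx 0.11,
\qquad
\frac{h}{\max(\log X)-\min(\log X)} = \frac{0.0054}{20.0059} \approx 2.7\times 10^{-4}
```

Relative to the spread of the values it actually smooths, the same h is about 410 times narrower on the log scale, and the 1/x factor further inflates any spike at small x.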
If you do not specify "width", KSDENSITY chooses a default bandwidth that is optimal for estimating normal densities. When you are not sure which bandwidth to use, try the default first.
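The following is a minimal sketch, not taken from the original attachments, that compares reusing an original-scale width with letting KSDENSITY pick its own. X is hypothetical data (200 values spaced evenly on a log scale, chosen only so that its ranges resemble the ones quoted above), and hOrig, hLog, gSpiky and gDefault are illustrative names. It assumes a release that still accepts the 'width' property name used in the question (later releases document it as 'Bandwidth'), and that the third output of KSDENSITY reports the bandwidth it actually used.

```matlab
% Hypothetical positive data (NOT the poster's data): 200 values spaced
% evenly on a log scale from 1e-10 to 0.0488, so max(X)-min(X) is about
% 0.0488 and max(log(X))-min(log(X)) is about 20.
X  = logspace(-10, log10(0.0488), 200)';
Xi = linspace(min(X), max(X), 500);   % evaluation points, all positive

% Default width for the untransformed data (third output of KSDENSITY).
[fTmp, xiTmp, hOrig] = ksdensity(X);

% Spiky estimate: the original-scale width is reused after the log transform.
gSpiky = ksdensity(X, Xi, 'kernel','normal','support','positive','width',hOrig);

% Default estimate: no 'width' given, so KSDENSITY chooses one itself;
% the third output reports the width it used.
[gDefault, xiTmp, hLog] = ksdensity(X, Xi, 'kernel','normal','support','positive');

fprintf('width reused from X:             %g\n', hOrig);
fprintf('default width with ''positive'':  %g\n', hLog);
```

If you prefer to set the width yourself, base it on log(X) rather than X, since that is the scale on which the smoothing happens when "support" is "positive".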