MATLAB: How to use KSDENSITY for least-squares cross-validation in Statistics Toolbox 5.3 (R2006b)

Statistics and Machine Learning Toolbox

KSDENSITY lets me estimate a probability density function, but it does not automatically optimize the bandwidth of the kernel. In fact, the default bandwidth chosen by KSDENSITY is optimal for normal probability density functions, and it probably does not take any weights into account (see section 2.4.2 of [1]: Bowman, A. W., and A. Azzalini, Applied Smoothing Techniques for Data Analysis, New York: Oxford University Press, 1997).
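For reference, that default appears to follow a normal-reference rule of the Silverman type. A minimal sketch of what I understand it to compute (the exact robust scale estimate used internally is my assumption, not confirmed from the toolbox source; x stands for the sample):

n   = numel(x);
sig = median(abs(x - median(x))) / 0.6745;   % robust (MAD-based) scale, assumed
hd  = sig * (4/(3*n))^(1/5);                 % bandwidth optimal for a normal pdf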
I would like to be able to use a least-squares cross-validation algorithm in KSDENSITY that holds for a general probability density function with a Gaussian (normal) kernel. This algorithm should also take any weights into account.
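For concreteness, the least-squares (integrated squared error) criterion I have in mind is LSCV(h) = int(fhat_h^2) - (2/n)*sum_i fhat_{h,-i}(x_i), minimized over the bandwidth h. A rough sketch of evaluating it with KSDENSITY (x and h are placeholders for the sample and one candidate bandwidth):

% Least-squares cross-validation score for one candidate bandwidth h
n  = numel(x);
xg = linspace(min(x)-3*h, max(x)+3*h, 400);    % integration grid
fh = ksdensity(x, xg, 'width', h);
term1 = trapz(xg, fh.^2);                      % int fhat(t)^2 dt
loo = zeros(n,1);
for i = 1:n
    xi = x([1:i-1 i+1:n]);                     % leave x(i) out
    loo(i) = ksdensity(xi, x(i), 'width', h);  % fhat_{-i}(x(i))
end
lscv = term1 - 2*mean(loo);                    % smaller is better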
A third-party toolbox that does something similar is available at:

Best Answer

KSDENSITY has been used internally with cross-validation, both for testing and to illustrate one application of the CROSSVAL function. The script below demonstrates this. Note that it optimizes the log likelihood rather than a sum of squares.
In this example script, weights are not used, but they would be easy to incorporate by passing them into CROSSVAL along with the data whose density is being estimated. You could then compute a weighted sum of log likelihoods in place of the unweighted sum.
Your conjecture that the weights are not used in choosing the default width may well be true. Weights may represent frequencies, or they may represent something exogenous that would not be appropriate to include in the default width selection.
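As an example, a weighted variant could look like the following sketch, using the variables u and cp from the script below. Here w is an assumed weight vector, one entry per observation, and I am assuming KSDENSITY's 'weights' option is available in your release:

% Weighted log likelihood: weight the training fit and the test sum
wloglik = @(xtr,wtr,xte,wte) sum(wte .* ...
    reshape(log(ksdensity(xtr,xte,'width',u,'weights',wtr)), size(wte)));
v = sum(crossval(wloglik, Weight, w, 'partition', cp));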
load carsmall Weight
% histogram plus density with default bandwidth
a1 = subplot(2,1,1);
[ff,xx] = ecdf(Weight);
ecdfhist(ff,xx);
a = findobj(gca,'type','patch');
set(a(end),'facecolor',[.9 .9 1])
[f0,x0,u] = ksdensity(Weight);
line(x0,f0,'color','b')
% try other bandwidths
uu = linspace(u/6,1.2*u,11);
subplot(2,2,3);
v = zeros(size(uu));   % cross-validation score for each bandwidth
h = zeros(size(uu));   % line handles for the fitted curves
% using same partition each time reduces variation
cp = cvpartition(length(Weight),'kfold',10);
for j=1:length(uu)
% compute log likelihood for test data based on training data
loglik = @(xtr,xte) sum(log(ksdensity(xtr,xte,'width',uu(j))));
% sum across all train/test partitions
v(j) = sum(crossval(loglik,Weight,'partition',cp));
% plot the fit to the full dataset
[f,xi] = ksdensity(Weight,'width',uu(j));
h(j) = line(xi,f,'color',[.75 .75 .75]);
end
% find and highlight the one that appears best
[maxv,maxi] = max(v);
set(h(maxi),'linewidth',2,'color','r');
h0 = line(x0,f0,'color','b','linewidth',2);
title('Kernel smooth variation with bandwidth')
legend([h0 h(maxi)],'Default','Highest log-lik')
% show the cross-validation values (sum of log likelihoods)
subplot(2,2,4)
plot(uu,v,'b-',uu(maxi),v(maxi),'ro')
title('Cross-validated log likelihood vs. bandwidth')
% add it to the histogram display
[f,x] = ksdensity(Weight,'width',uu(maxi));
line(x,f,'color','r','parent',a1)
legend(a1,'Histogram','Default','Cross-validated')
title(a1,'Histogram and two kernel-smooth estimates')
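Finally, if you prefer the least-squares (integrated squared error) criterion from your question over the log likelihood, you can swap the scoring function inside the loop. A sketch (xg is an assumed integration grid; note that smaller scores are better here, so select the bandwidth with MIN rather than MAX):

% Least-squares criterion: estimate int(fhat^2) - 2*E[fhat(X)] per fold
xg = linspace(min(Weight)-1000, max(Weight)+1000, 400);
for j = 1:length(uu)
    lsq = @(xtr,xte) trapz(xg, ksdensity(xtr,xg,'width',uu(j)).^2) ...
                     - 2*mean(ksdensity(xtr,xte,'width',uu(j)));
    v(j) = mean(crossval(lsq, Weight, 'partition', cp));
end
[minv,mini] = min(v);   % least squares: minimize, not maximize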