MATLAB: SVM: How is the classification error with leave-one-out cross validation calculated

classificationclassification errorcross-validationleave-one-outsvm

I am trying to understand what matlab's leave-one-out cross validation of an SVM is doing by comparing it to a leave-one-out cross validation written myself. Unfortunately, I do not get the same results.
First I create some random data
rng(0);
X = rand(10,20);
Y = [ones(5,1); zeros(5,1)];
n_samples = size(X,1);
Then I calculate the classification error with leave-one-out cross validation
CVSVMModel = fitcsvm(X, Y,...
'KernelFunction','rbf',...
'BoxConstraint',1,...
'LeaveOut', 'on',...
'CacheSize', 'maximal');
error1 = kfoldLoss(CVSVMModel, 'lossfun', 'classiferror')
Now I try to do the same by hand. For each iteration, one sample is taken out of the training set and predicted afterwards.
error2 = 0;
for fold = 1:n_samples
idx = [1:(fold-1), (fold+1):n_samples];
SVMModel = fitcsvm(X(idx,:), Y(idx),...
'KernelFunction','rbf',...
'BoxConstraint',1,...
'CacheSize', 'maximal');
label = predict(SVMModel, X(fold,:));
error2 = error2 + (label~=Y(fold));
end
error2 = error2/n_samples
However, I do not get the same results:
error1 =
0.7000
error2 =
1
Can anyone tell me why?
What also worries me: Why does the second method perfectly misclassify every point (error2=1)? This can't be a coincidence.

Best Answer

fitcsvm passes class prior probabilities found from the entire data into each fold. Look at CVSVMModel.Trained{1}.Prior, CVSVMModel.Trained{2}.Prior etc - every time you should see [0.5 0.5]. When you cross-validate yourself, the priors are derived for each fold independently, and in each case you should have 5/9 for one class and 4/9 for the other. This explains the difference.
As to why the 2nd method errors 100% of the time, see my answer here. The short answer is - because the left-out label is always opposite to the majority class in the training set.
Related Question