MATLAB: SVM: How is the classification error with leave-one-out cross-validation calculated?

I am trying to understand what MATLAB's leave-one-out cross-validation of an SVM does by comparing it to a leave-one-out cross-validation that I wrote myself. Unfortunately, the two do not give the same results.

First, I create some random data:

rng(0);
X = rand(10,20);
Y = [ones(5,1); zeros(5,1)];
n_samples = size(X,1);

Then I calculate the classification error with the built-in leave-one-out cross-validation:

CVSVMModel = fitcsvm(X, Y,...
    'KernelFunction','rbf',...
    'BoxConstraint',1,...
    'LeaveOut', 'on',...
    'CacheSize', 'maximal');
error1 = kfoldLoss(CVSVMModel, 'lossfun', 'classiferror')

Now I try to do the same by hand. In each iteration, one sample is held out of the training set and then predicted with the model trained on the remaining samples:

error2 = 0;
for fold = 1:n_samples
    % Indices of all samples except the held-out one
    idx = [1:(fold-1), (fold+1):n_samples];
    SVMModel = fitcsvm(X(idx,:), Y(idx),...
        'KernelFunction','rbf',...
        'BoxConstraint',1,...
        'CacheSize', 'maximal');
    % Predict the held-out sample and count a misclassification
    label = predict(SVMModel, X(fold,:));
    error2 = error2 + (label~=Y(fold));
end
% Fraction of held-out samples that were misclassified
error2 = error2/n_samples

However, I do not get the same results:

error1 =
      0.7000
error2 =
       1

Can anyone tell me why?

What also worries me: Why does the second method perfectly misclassify every point (error2=1)? This can't be a coincidence.

Best Answer

fitcsvm passes the class prior probabilities computed from the entire data set to each fold. Look at CVSVMModel.Trained{1}.Prior, CVSVMModel.Trained{2}.Prior, etc.; each one should be [0.5 0.5]. When you cross-validate yourself, the priors are estimated independently for each fold from its 9 training samples, so in every fold you get 5/9 for one class and 4/9 for the other. This explains the difference.
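One way to check this explanation is to inspect the priors stored in the cross-validated model and to rerun the manual loop with uniform priors. The following is a minimal sketch, assuming the class prior is the only difference between the two procedures; the variable names foldPriors, nWrong, and error3 are introduced here for illustration, and 'Prior','uniform' is used because uniform priors coincide with the full-data priors [0.5 0.5] for this balanced data set.

```matlab
% Same data and built-in leave-one-out model as in the question
rng(0);
X = rand(10,20);
Y = [ones(5,1); zeros(5,1)];
n_samples = size(X,1);

CVSVMModel = fitcsvm(X, Y, ...
    'KernelFunction','rbf', ...
    'BoxConstraint',1, ...
    'LeaveOut','on');
error1 = kfoldLoss(CVSVMModel, 'lossfun', 'classiferror');

% Collect the prior stored in each fold's model, one row per fold
foldPriors = cell2mat(cellfun(@(m) m.Prior, CVSVMModel.Trained, ...
    'UniformOutput', false));
disp(foldPriors)

% Manual leave-one-out loop, now with uniform priors in every fold
% (assumption: this mirrors what the built-in cross-validation does)
nWrong = 0;
for fold = 1:n_samples
    idx = [1:(fold-1), (fold+1):n_samples];
    SVMModel = fitcsvm(X(idx,:), Y(idx), ...
        'KernelFunction','rbf', ...
        'BoxConstraint',1, ...
        'Prior','uniform');
    label = predict(SVMModel, X(fold,:));
    nWrong = nWrong + (label ~= Y(fold));
end
error3 = nWrong / n_samples;
fprintf('built-in: %.4f, manual with uniform prior: %.4f\n', error1, error3);
```

If the prior is indeed the whole story, error3 should agree with error1 rather than with error2.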

As to why the second method is wrong 100% of the time, see my answer here. The short answer: the left-out label is always the opposite of the majority class in the training set.
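The claim about the majority class can be verified from the labels alone, without fitting any SVM. This is a small sketch; the names majority and alwaysOpposite are hypothetical. It assumes, as the answer suggests, that on uninformative random features the classifier tends to fall back on the class favored by the empirical priors of the training fold.

```matlab
Y = [ones(5,1); zeros(5,1)];
n_samples = numel(Y);

majority = zeros(n_samples,1);
for fold = 1:n_samples
    % Training labels for this fold: everything except the held-out sample
    idx = [1:(fold-1), (fold+1):n_samples];
    % 9 labels remain (5 of one class, 4 of the other), so there is no tie
    majority(fold) = mode(Y(idx));
end

% True if the held-out label differs from the training majority in every fold
alwaysOpposite = all(majority ~= Y);
disp(alwaysOpposite)
```

Removing one sample from a perfectly balanced set always leaves its own class in the minority, so a classifier that predicts the training majority misclassifies every held-out point.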
