MATLAB: Leave-one-out cross-validation with svmtrain gives ‘impossible’ accuracy results

I am using svmtrain to perform leave-one-out cross-validation on my data, and I noticed that some of the resulting SVM models obtain 0% accuracy on a binary classification problem involving hundreds of examples.

Picking the wrong binary choice that many times in a row is essentially impossible by chance, so I figured there was something wrong with my SVM implementation. I therefore wrote a test program that generates a random feature matrix as training input and random binary values as training output. Even with this setup, some SVM models generated by svmtrain give 0% accuracy, although the output is totally random and uncorrelated with the input.

Can anyone explain what I am doing wrong? I have included the test program source below:

%clear workspace
clear;
clc;
pause on;
%seed random
rng('default');
%initialize variables
n_sets=1000;
n_pairs=20;
n_features=2;
%initialize classification accuracy
accuracy=zeros(1,n_sets);
for i=1:n_sets
    fprintf('\nSet #%i\n',i);
    %generate random feature matrix
    training_input=single(rand(n_pairs,n_features));
    %generate random classification matrix
    training_output=single(rand(n_pairs,1)>0.5);
    %initialize correct counter
    correct=0;
    %Perform leave one out cross validation
    for j=1:n_pairs
        %define inputs for SVM model
        model_training_input=training_input;
        model_training_output=training_output;
        %blind training on the jth row of the feature matrix
        model_training_output(j)=NaN;
        %generate SVM model from all of feature matrix other than jth row
        svm_model=svmtrain(model_training_input,model_training_output,'autoscale',false);
        %test model on the jth row
        prediction=svmclassify(svm_model,training_input(j,:));
        %check if prediction was correct
        if(prediction==training_output(j)), correct=correct+1; end
    end
    accuracy(i)=correct/n_pairs;
    fprintf('Accuracy = %s\n',num2str(accuracy(i)));
    if(accuracy(i)==0||accuracy(i)==1)
        fprintf('WTF\n');
        pause;
    end
end

Best Answer

You are missing my point about the majority class. Let me try again.

Suppose you generate 200 observations and assign labels at random. By chance, exactly 100 observations can end up in one class (say A) and 100 in the other (say B). The probability of this is substantial:

>> binopdf(100,200,.5)
ans =
    0.0563

This is your training set.
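
The test program in the question uses only 20 observations per set, where an exact tie is even more likely. As a rough sketch, the same probability can be computed with base MATLAB only (nchoosek instead of the toolbox function binopdf); the variable names n and p_balanced are just for illustration:

```matlab
% Probability that n fair-coin labels split exactly evenly (base MATLAB only).
n = 20;                               % n_pairs in the question's test program
p_balanced = nchoosek(n, n/2) / 2^n;  % same quantity as binopdf(n/2, n, 0.5)
fprintf('P(exactly %d of %d in each class) = %.4f\n', n/2, n, p_balanced);
```

This prints about 0.1762, so roughly one in six of the random 20-observation sets is exactly balanced.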

Now you remove one observation from the training set at a time. When you remove an observation of class A, you have 99 observations of class A and 100 observations of class B left. The SVM model cannot find a good decision boundary (because the classes are inseparable) and predicts everything into the majority class, that is, class B. The predicted label for the removed observation is incorrect.

Now you remove an observation of class B from the same set. Now you have 100 observations of class A and 99 of class B. The majority class is now A. Your SVM model predicts everything into A. Again the predicted label for the removed observation is incorrect.

Therefore the leave-one-out error for this training set (with the two class sizes equal) is going to be 100%, which is the 0% accuracy you are seeing.
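
The argument above can be checked without any SVM at all. The following is a minimal sketch that replaces the SVM with a rule that always predicts the majority class of the remaining labels; that rule is an assumption standing in for the degenerate model described above, not what svmtrain does internally, and the variable names are made up for the example:

```matlab
% Stand-in for a degenerate SVM: always predict the majority class of the
% training labels (an illustrative assumption, not svmtrain's algorithm).
labels = [zeros(10,1); ones(10,1)];  % exactly balanced: 10 of class 0, 10 of class 1
n = numel(labels);
correct = 0;
for j = 1:n
    train = labels([1:j-1, j+1:n]);          % leave observation j out
    prediction = double(mean(train) > 0.5);  % majority class of the other 19 labels
    correct = correct + (prediction == labels(j));
end
loo_accuracy = correct / n;
fprintf('Leave-one-out accuracy = %g\n', loo_accuracy);
```

Every held-out label belongs to the class that has just become the minority, so each prediction is wrong and the accuracy comes out as 0.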

If you use if(prediction==single(rand(1)>0.5)), you get the expected accuracy of about 50% because you are comparing with a fresh random label rather than with the label of the removed observation.

It's possible something else is going on in your data, but I see no evidence of that in your description.
