Solved – the criterion value on sequential feature selection for binary classification

I have a set of data represented by 16 features and a binary classification (true, false). I want to determine which features are important using forward and backward sequential feature selection, i.e. can the results improve by leaving out features (backwards) or by adding features (forward). For validation I use a 10-fold cross validation.

The Matlab help page on SequentialFS states that I should use the misclassification rate for that.

It also states:

sequentialfs divides the sum of the values returned by fun across all test sets by the total number of test observations. Accordingly, fun should not divide its output value by the number of test observations.

Therefore I have a function (using linear discriminant analysis as an example) as follows:

function misclass = PartLDA(XT,yT,Xt,yt)
    lda = fitcdiscr(XT,yT);          %perform LDA on the training data
    ldaClass = resubPredict(lda);    %prediction (for the training data)
    misclass=sum(ldaClass ~= yT);    %count the misclassified observations
end

I call the function from a different function as follows:

X = SampleList;       %these are the 16 features
y = SampleClass;      %and the classification (true / false)

c = cvpartition(y,'k',10);  %create 10-fold cross validation
opts = statset('display','iter'); %show iterations

direction='forward' %or 'backward'

[fs,history] = sequentialfs(@PartLDA,X,y,'cv',c,'options',opts,'direction',direction)

As an example of running the code, I get the following output:

SequentialFS using LDA on sample Sample5
Start forward sequential feature selection:
Initial columns included:  none
Columns that can not be included:  none
Step 1, added column 12, criterion value 1.65698
Step 2, added column 1, criterion value 1.23336
Step 3, added column 15, criterion value 1.07029
Step 4, added column 11, criterion value 0.997188 
Step 5, added column 16, criterion value 0.922212

My question is: What does the criterion value actually mean? It is quite clear that a smaller number seems to be the result of a smaller misclassification rate, however I am confused why the criterion value can be bigger than 1? Is that feasible? Because of the description of the sequentialfs quoted above, shouldn't the criterion value always be smaller than 1? Or do I have to divide the misclassification variable by the number of samples?

Best Answer

The criterion value is computed from the output of your function PartLDA. In your case it is meant to be the misclassification rate, but it can be any other measure that you are trying to minimise, e.g. RMSE in a linear regression model.
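The aggregation described in the documentation passage quoted in the question can be written out as a formula. The notation is an assumption made here for illustration and is not taken from the MATLAB help: K is the number of folds, f_k is the value fun returns on fold k, and n_k is the number of test observations in fold k.

```latex
\[
  \text{criterion} \;=\; \frac{\sum_{k=1}^{K} f_k}{\sum_{k=1}^{K} n_k}
\]
```

When f_k is a count of misclassified test observations, 0 <= f_k <= n_k holds in every fold, so the criterion is a rate between 0 and 1.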

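In your case, PartLDA returns the number of misclassified training samples: resubPredict predicts the data the model was fitted on, the result is compared with yT, and Xt and yt are never used. The criterion is therefore the sum of those training-set counts over the 10 folds, divided by the total number of test observations. Because each training set is about nine times as large as its test set, the value can exceed 1. To get a cross-validated misclassification rate between 0 and 1, predict on Xt and compare with yt; you do not need to divide by the number of samples yourself.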
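A minimal sketch of a criterion function that scores the held-out fold instead of the training data is shown here. The data X and y are hypothetical stand-ins generated for the example (replace them with SampleList and SampleClass), the name critfun is made up, and the sketch assumes y is a logical or numeric vector so that ~= can compare labels directly.

```matlab
% Hypothetical stand-in data; replace with SampleList / SampleClass.
rng(1);                                   % make the example reproducible
X = [randn(50,3); randn(50,3) + 1.5];     % 100 observations, 3 features
y = [false(50,1); true(50,1)];            % binary labels (logical)

% Criterion: fit LDA on the training fold, predict the test fold, and
% return the COUNT of misclassified test observations. No division here,
% because sequentialfs divides by the total number of test observations.
critfun = @(XT, yT, Xt, yt) sum(predict(fitcdiscr(XT, yT), Xt) ~= yt);

c = cvpartition(y, 'KFold', 10);          % 10-fold cross-validation
[fs, history] = sequentialfs(critfun, X, y, 'cv', c);

% history.Crit holds the criterion value after each step.
disp(history.Crit);
```

With a criterion of this form, the values reported at each step should be misclassification rates in the range 0 to 1.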
