Why does classification accuracy obtained using sequentialfs and cross-validation always outperform a 10-fold cross-validation using those selected features? Any help would be gratefully received!
Thanks in advance.
Barry
See the code below. Acc_fs (77%) is always higher than Acc (67%), and this holds across multiple tests: the accuracy reported by sequentialfs always exceeds the cross-validated accuracy computed afterwards with the selected features. Is this a bug in my implementation or an issue with sequentialfs.m?
%************** Perform feature selection ************
c = cvpartition(Labels,'k',num_folds);
opts = statset('display','iter');
fun = @(x_train,y_train,x_test,y_test)SVM_class_fun(x_train,y_train,x_test,y_test,kernel,rbf_sigma,boxconstraint);
[fs,history] = sequentialfs(fun,Data,Labels,'cv',c,'options',opts);
Acc_fs = 1 - history.Crit(end);

%******* Cross validated classification accuracy *******
Feature_select = find(fs==1); % Features selected
Vars_select = Variables(fs==1); % Variable names of features selected
indices = crossvalind('Kfold',Labels,num_folds);
Results = classperf(Labels, 'Positive', 1, 'Negative', 0); % Initialize
for i = 1:num_folds
    test = (indices == i);
    train = ~test;
    svmStruct = svmtrain(Data(train,Feature_select),Labels(train),'Kernel_Function','rbf','rbf_sigma',rbf_sigma,'boxconstraint',boxconstraint);
    class = svmclassify(svmStruct,Data(test,Feature_select));
    classperf(Results,class,test);
end
Acc = Results.CorrectRate; % Classification accuracy
end
The function SVM_class_fun returns the number of misclassified samples:
function MCE = SVM_class_fun(x_train,y_train,x_test,y_test,kernel,rbf_sigma,boxconstraint)
    svmStruct = svmtrain(x_train,y_train,'Kernel_Function','rbf','rbf_sigma',rbf_sigma,'boxconstraint',boxconstraint);
    y_fit = svmclassify(svmStruct,x_test);
    C = confusionmat(y_test,y_fit);
    N = sum(sum(C));
    MCE = N - sum(diag(C)); % No. misclassified sample
end
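For reference, here is a minimal sketch of how the count returned by SVM_class_fun relates to Acc_fs. It assumes, as the sequentialfs documentation describes, that the values returned by fun are summed over the folds and divided by the total number of test observations. All numbers and the names mce_per_fold and n_test_per_fold are made up for illustration and do not come from the data in this post.

```matlab
% Hypothetical labels and predictions for a single test fold (illustration only)
y_test = [1 0 1 1 0 0 1 0];
y_fit  = [1 0 0 1 0 1 1 0];

% Number of misclassified samples in this fold; for these vectors this equals
% N - sum(diag(C)) with C = confusionmat(y_test,y_fit)
MCE = sum(y_test ~= y_fit);

% Hypothetical per-fold misclassification counts and test-set sizes
mce_per_fold    = [4 6 5 3 7];
n_test_per_fold = [20 20 20 20 20];

% Assumed criterion: summed counts divided by the total number of test
% observations, i.e. an overall misclassification rate
crit = sum(mce_per_fold) / sum(n_test_per_fold);

% Accuracy in the same sense as Acc_fs = 1 - history.Crit(end)
acc = 1 - crit;
fprintf('MCE = %d, crit = %.2f, acc = %.2f\n', MCE, crit, acc);
```

Under that assumption, 1 - history.Crit(end) is a cross-validated accuracy on the partition c, so it is directly comparable in kind to Results.CorrectRate, although the two are computed on different fold assignments.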
Best Answer