Solved – Cross-validation accuracy interpretation (accuracy of 100%)

cross-validation, svm

Here's the setup:

I have 90% of the data for training and the other 10% for testing.

I am doing stratified cross-validation on the 90% training split. It is a 10-class dataset, and I am using LibSVM. When doing 10-fold cross-validation to tune the hyper-parameters (C in the C-SVM), I get accuracies of 100%. Basically something like this:

        Training with 0.03125   -  Cross Validation Accuracy = 68.5097%
        Training with 0.12500   -  Cross Validation Accuracy = 98.3%
        Training with 0.50000   -  Cross Validation Accuracy = 100%
        Training with 2.00000   -  Cross Validation Accuracy = 100%
        Training with 8.00000   -  Cross Validation Accuracy = 100%
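
For reference, here is a minimal sketch of that tuning loop in Python using scikit-learn's `SVC` (which wraps LibSVM); the placeholder data, the C grid, and the RBF kernel are assumptions for illustration, not taken from the question:

    # Sketch of the tuning loop above: stratified 10-fold CV over a grid of
    # C values, using scikit-learn's SVC (a LibSVM wrapper).
    import numpy as np
    from sklearn.model_selection import StratifiedKFold, cross_val_score
    from sklearn.svm import SVC

    # Placeholder 10-class data standing in for the 90% training split.
    rng = np.random.default_rng(0)
    X, y = rng.standard_normal((500, 20)), rng.integers(0, 10, 500)

    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
    for C in [2**-5, 2**-3, 2**-1, 2**1, 2**3]:
        scores = cross_val_score(SVC(C=C, kernel="rbf"), X, y, cv=cv)
        print(f"Training with {C:.5f} - Cross Validation Accuracy = {scores.mean():.4%}")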

Is it OK to have 100% accuracy in the cross-validation on the TRAINING data? In this case, should I choose C = 0.5 as the best hyper-parameter?

Or should I instead move away from parameters that give me 100% in the cross-validation, and why? If I don't take those with 100%, should I take the ones with 98%? 90%?

Thanks,

Best Answer

I wouldn't say that C > 0.5 is necessarily that big. NEVER make any model choices based on the test set, as this would give an optimistically biased performance estimate. The best approach is nested cross-validation: the outer cross-validation is used for performance estimation, and the hyper-parameters are tuned independently within each outer fold using an inner cross-validation (i.e. if you use 10-fold cross-validation, you perform 10 separate cross-validations to tune the hyper-parameters).
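
A minimal sketch of that nested scheme, assuming scikit-learn (using `GridSearchCV` as the inner tuning loop is my choice of tooling, not part of the original answer; the data and grid are placeholders):

    # Nested cross-validation: the inner GridSearchCV re-tunes C within each
    # outer training fold, so the outer score is an unbiased estimate.
    import numpy as np
    from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X, y = rng.standard_normal((500, 20)), rng.integers(0, 10, 500)  # placeholder data

    inner_cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)
    outer_cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=2)

    tuned_svm = GridSearchCV(SVC(kernel="rbf"),
                             {"C": [2**k for k in range(-5, 5, 2)]},
                             cv=inner_cv)
    outer_scores = cross_val_score(tuned_svm, X, y, cv=outer_cv)
    print(f"Nested CV accuracy: {outer_scores.mean():.2%} +/- {outer_scores.std():.2%}")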
