Solved – One-vs-all Linear SVM Cross-Validation – Parameter Selection

cross-validation svm

I'm performing one-vs-all classification (SVM) on a dataset. Since I'm using a linear SVM, the parameters I need to tune and select are the tolerance and C. I'm a bit confused about how to go about doing this with 10-fold cross-validation. My understanding is as follows:

  • Fold 1: For different combinations of (tolerance, C), choose the combination that gives the best accuracy (I'm using a one-vs-all classifier).

  • Fold 2: Repeat the same
    …..

  • Fold 10: Repeat the same.

So now, how do I choose the best combination of (tolerance, C), given that I have ten such combinations?

Best Answer

You may be misunderstanding how cross-validation is used to select hyper-parameters.

  1. Choose a candidate value for each hyper-parameter. In other words, pick one value for $C$ and one value for $\epsilon$. If you decide to test different kernels, choose the kernel here too. This is traditionally done using a grid search, but there are other methods which may be smarter.

  2. Run the whole cross-validation procedure with the selected parameters.

  3. Measure the classifier's performance across all cross-validation folds. The choice of performance measure is up to you (accuracy, AUC, precision/recall, etc.), as is how you combine the measurements from each fold (though you probably want the mean or median).

  4. Repeat steps 1-3, choosing different values for $C$ and $\epsilon$ each time. The number of repeats performed here need not be related to the number of folds in the cross-validation performed in steps 2-3.

  5. Finally, choose the pair of parameters that give you the highest average performance. These should be selected together; don't choose the best $C$ and the best $\epsilon$ separately.
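The steps above can be sketched with scikit-learn, whose `GridSearchCV` runs exactly this loop: for each candidate (C, tol) pair it performs the full 10-fold cross-validation, averages the scores, and keeps the pair with the best mean performance. This is only an illustration, using the built-in iris data as a stand-in for your dataset; note that `LinearSVC` does one-vs-rest multiclass classification by default.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVC

# Stand-in dataset for illustration; substitute your own X, y.
X, y = load_iris(return_X_y=True)

# Candidate values for the two hyper-parameters (step 1).
param_grid = {
    "C": [0.01, 0.1, 1, 10, 100],
    "tol": [1e-4, 1e-3, 1e-2],
}

# For every (C, tol) pair, run 10-fold cross-validation and
# average the accuracy across folds (steps 2-4).
search = GridSearchCV(
    LinearSVC(max_iter=10000),  # one-vs-rest by default
    param_grid,
    cv=10,
    scoring="accuracy",
)
search.fit(X, y)

# The pair with the highest mean accuracy, selected together (step 5).
print(search.best_params_)
print(search.best_score_)
```

The key point the code makes concrete: the cross-validation folds are reused for every candidate pair, and the winning (C, tol) combination is read off the averaged scores rather than chosen per fold.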

Note that if you're comparing several different models (e.g., a neural network, Naive Bayes, and this SVM), this procedure needs to be "nested" inside the outer cross-validation that is used to compare those models.
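One way to sketch that nesting, again with scikit-learn on the iris stand-in data: wrap the grid search (the inner loop) inside an outer `cross_val_score`, so the tuning happens afresh on each outer training split. The fold counts here are arbitrary choices for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)

# Inner loop: hyper-parameter selection by cross-validated grid search.
inner = GridSearchCV(
    LinearSVC(max_iter=10000),
    {"C": [0.1, 1, 10], "tol": [1e-4, 1e-3]},
    cv=5,
)

# Outer loop: scores the whole tune-then-fit procedure, giving an
# estimate you can fairly compare against other model families.
scores = cross_val_score(inner, X, y, cv=5)
print(scores.mean())
```

The outer score estimates how well the *procedure* (SVM plus its tuning) generalizes, which is what you want when comparing it against, say, a neural network or Naive Bayes tuned the same way.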