Solved – Optimal grid search for $C$ in SVM

hyperparameter, machine-learning, optimization, svm

Say we are training a radial (RBF) SVM with parameters $C$ and $\gamma$. We would typically tune these with a grid search and cross-validation. My professor hinted that one does not have to exhaust all values of $C$ in the grid to find the optimal solution, and I'm sure the popular packages don't search over all $C$ values either, for the sake of speed. So my question is: how is this done optimally, i.e., evaluating the fewest values of $C$ (for a given $\gamma$)?

I'm thinking that if one starts from the lowest value of $C$, one could stop once the performance no longer improves. Is this optimal?
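For reference, here is the kind of exhaustive search I mean: a minimal sketch in R using `tune.svm` from the e1071 package (the data set and grid values are just placeholders).

```r
library(e1071)

## toy two-class data; any binary classification problem would do
data(iris)
iris2 <- iris[iris$Species != "setosa", ]
iris2$Species <- droplevels(iris2$Species)

## naive approach: 5-fold CV over the full C x gamma grid,
## i.e. every (cost, gamma) combination is refit from scratch
set.seed(1)
tuned <- tune.svm(Species ~ ., data = iris2,
                  gamma = 2^(-8:0), cost = 2^(-2:8),
                  tunecontrol = tune.control(cross = 5))
tuned$best.parameters  # the chosen cost (C) and gamma
```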

Best Answer

> I'm sure the popular packages don't search over all $C$ values either for the sake of speed

Hastie et al. made this point moot: the regularization path for the ordinary SVM lets you explore every value of $C$ at roughly the cost of a single fit. See this paper: The Entire Regularization Path for the Support Vector Machine (Hastie, Rosset, Tibshirani & Zhu, JMLR 2004).
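If I recall the paper's formulation correctly, it writes the SVM in loss-plus-penalty form with regularization parameter $\lambda = 1/C$,

$$\min_{\beta_0,\, g}\; \sum_{i=1}^{n} \big[\,1 - y_i\,(\beta_0 + g(x_i))\,\big]_+ \;+\; \frac{\lambda}{2}\,\lVert g \rVert_{\mathcal{H}_K}^{2},$$

and shows that the dual coefficients $\alpha_i(\lambda)$ are piecewise linear in $\lambda$. The path therefore only needs to be computed at the breakpoints where the set of support vectors changes; everything in between follows by exact linear interpolation, which is why tracing all of it costs about as much as a single fit.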

So you can trace the whole path at once and pick the optimal cost, without having to rely on stopping heuristics like the one you propose.


Although R wasn't mentioned, you can experiment with the method in the R package svmpath, written by Hastie himself [CRAN link and manual]. Its description summarizes the idea (a sketch of its use follows the quote):

> The SVM has a regularization or cost parameter $C$, which controls the amount by which points overlap their soft margins. Typically either a default large value for $C$ is chosen (allowing minimal overlap), or else a few values are compared using a validation set. This algorithm computes the entire regularization path (i.e. for all possible values of $C$ for which the solution changes), with a cost a small (~3) multiple of the cost of fitting a single model.
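To make that concrete, here is a minimal sketch of the path approach in practice. The toy data are made up, and I am going from the svmpath documentation, in which the path is parameterized by $\lambda = 1/C$:

```r
library(svmpath)

## toy two-class data; svmpath expects labels coded as -1/+1
set.seed(1)
x <- matrix(rnorm(200 * 2), 200, 2)
y <- rep(c(-1, 1), each = 100)
x[y == 1, ] <- x[y == 1, ] + 2

## one call traces the solution for *every* C at a fixed gamma
## (param.kernel plays the role of gamma for the radial kernel)
fit <- svmpath(x, y, kernel.function = radial.kernel, param.kernel = 0.5)

## fit$lambda holds the breakpoints: the values of lambda = 1/C
## at which the active set (and hence the solution) changes
length(fit$lambda)

## predictions anywhere along the path, e.g. C = 4 <=> lambda = 0.25
pred <- predict(fit, newx = x, lambda = 0.25, type = "class")
```

You would then score each breakpoint (or any grid of $\lambda$ values along the path) on held-out data and keep the best $C = 1/\lambda$; the expensive part, the fitting, is done only once.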