Say we are training a radial (RBF-kernel) SVM with parameters $C$ and $\gamma$. We would typically tune these using grid search and cross-validation. My professor hinted that one does not have to exhaust all $C$ values in the grid to obtain the optimal solution, and I'm sure the popular packages don't search over all $C$ values either, for the sake of speed. So my question is: how is this done optimally, i.e. visiting the fewest $C$ values (for a given $\gamma$)?
I'm thinking that if one starts from the lowest value of $C$, one may stop once the performance no longer improves. Is this optimal?
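In code, the heuristic I have in mind looks roughly like this (a sketch using scikit-learn; the dataset, the fixed $\gamma$, and the $C$ grid are illustrative choices, not part of the question):

```python
# Sketch of the early-stopping heuristic: sweep C from small to large
# for a fixed gamma, and stop the first time cross-validated accuracy
# fails to improve. Not guaranteed to find the global optimum.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

gamma = 0.1                      # fixed kernel width
C_grid = np.logspace(-3, 3, 13)  # candidate costs, low to high

best_C, best_score = None, -np.inf
for C in C_grid:
    score = cross_val_score(
        SVC(kernel="rbf", C=C, gamma=gamma), X, y, cv=5
    ).mean()
    if score > best_score:
        best_C, best_score = C, score
    else:
        break  # stop once CV accuracy stops improving

print(best_C, best_score)
```

The worry, of course, is that CV accuracy as a function of $C$ need not be unimodal, so this loop can stop at a local plateau.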
Best Answer
Hastie et al. made this question largely moot: the regularization path for the ordinary SVM lets one explore all values of $C$ at roughly the cost of a single fit, because the solution is piecewise linear in the regularization parameter. See this paper: The Entire Regularization Path for the Support Vector Machine.
So you can compute the whole path at once and pick the optimal cost without having to rely on heuristics.
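scikit-learn does not ship an SVM-path solver, but the same "entire path in one call" idea is available there for the lasso, which makes a convenient illustration (the dataset and `n_alphas` below are arbitrary choices):

```python
# Path-following for the lasso: a single call returns the coefficients
# at every regularization level, rather than refitting per grid point.
# The SVM path of Hastie et al. exploits the same piecewise-linear
# structure in its regularization parameter.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import lasso_path

X, y = make_regression(n_samples=100, n_features=8, random_state=0)

# alphas: the 50 regularization levels visited along the path
# coefs:  shape (n_features, n_alphas) -- coefficients at each level
alphas, coefs, _ = lasso_path(X, y, n_alphas=50)

print(alphas.shape)  # one alpha per point on the path
print(coefs.shape)   # all 8 coefficients traced along the path
```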
While R wasn't mentioned, you can experiment with the method via the R package `svmpath`, written by Hastie himself. [CRAN link and manual].