Solved – Is selecting the parameter of a Gaussian kernel SVM still an open question?

machine-learning, svm

I have been learning about Gaussian kernel SVMs recently. I have to choose the parameter $\epsilon$ of the Gaussian kernel,
$$k(x,y)=e^{-\frac{\|x-y\|_2^2}{2\epsilon}}.$$
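For concreteness, this is how I compute the kernel matrix (a minimal sketch assuming NumPy and scikit-learn; note that scikit-learn's `rbf_kernel` uses $\gamma = 1/(2\epsilon)$ relative to this notation):

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def gaussian_kernel(X, Y, eps):
    """k(x, y) = exp(-||x - y||_2^2 / (2 * eps))."""
    # pairwise squared Euclidean distances between rows of X and Y
    d2 = (np.sum(X**2, axis=1)[:, None]
          + np.sum(Y**2, axis=1)[None, :]
          - 2.0 * X @ Y.T)
    return np.exp(-d2 / (2.0 * eps))

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
eps = 0.7
# scikit-learn parameterises the RBF kernel as exp(-gamma * ||x - y||^2),
# so gamma = 1 / (2 * eps) reproduces the kernel above
assert np.allclose(gaussian_kernel(X, X, eps),
                   rbf_kernel(X, X, gamma=1.0 / (2.0 * eps)))
```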

I tried to find an answer in the literature. Some papers analyse the parameters, for example 'A User's Guide to Support Vector Machines' (PDF), and I would really like to work through that analysis myself step by step, but I have other work to finish. I also found the paper 'Practical Selection of SVM Parameters and Noise Estimation for SVM Regression' (PDF). Is selecting $\epsilon$ still an open research question, or is there a settled method?

Best Answer

There isn't really any method that is superior to optimising a cross-validation-based estimate of some appropriate performance statistic. Bounds on the leave-one-out cross-validation error (such as the radius-margin bound or the span bound) are computationally efficient, but no more theoretically attractive than conventional cross-validation. The optimisation can be performed via simple grid search, or via standard optimisation methods such as gradient descent or the Nelder-Mead simplex method (which I use quite a lot). However, if there are many kernel parameters to be optimised, it is often rather easy to over-fit the model selection criterion and end up with a model that performs rather badly; in that case, I would recommend regularising the model selection criterion.
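As an illustration, here is a minimal sketch of both approaches on a toy problem, assuming scikit-learn and SciPy; the parameter ranges, fold count and starting point are arbitrary illustrative choices, not recommendations:

```python
import numpy as np
from scipy.optimize import minimize
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# 1) Grid search over (C, gamma) on a logarithmic scale,
#    scored by 5-fold cross-validation accuracy.
grid = GridSearchCV(
    SVC(kernel="rbf"),
    param_grid={"C": np.logspace(-2, 3, 6), "gamma": np.logspace(-4, 1, 6)},
    cv=5,
)
grid.fit(X, y)
print("grid search:", grid.best_params_)

# 2) Nelder-Mead simplex on the same criterion, searching in
#    log-parameter space so steps correspond to changes of scale.
def neg_cv_accuracy(log_params):
    C, gamma = np.exp(log_params)
    model = SVC(kernel="rbf", C=C, gamma=gamma)
    return -cross_val_score(model, X, y, cv=5).mean()

res = minimize(neg_cv_accuracy, x0=np.log([1.0, 0.1]), method="Nelder-Mead")
print("Nelder-Mead:", dict(zip(["C", "gamma"], np.exp(res.x))))
```

Note that with fixed folds the cross-validation accuracy is piecewise constant in the parameters, so the simplex can stall on a plateau; a smoother criterion (or averaging over several random fold splits) is often better behaved.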

Note that while the VC dimension of an SVM is bounded by the radius-margin bound, that guarantee is no longer valid if you optimise the kernel parameters so as to minimise the bound (equivalently, to maximise the margin-to-radius ratio). As a result, tuning the kernel by minimising the radius-margin bound does not itself fall within the VC framework.
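For reference, one common statement of the radius-margin result is the following (the exact constants vary between references, so take this form as indicative rather than definitive): if the training data lie in a ball of radius $R$ in the kernel-induced feature space and the canonical separating hyperplane has margin $\gamma = 1/\|w\|$, then the VC dimension $h$ of the margin-$\gamma$ hyperplanes satisfies
$$h \le \min\left(\left\lceil \frac{R^2}{\gamma^2} \right\rceil, n\right) + 1 = \min\left(\left\lceil R^2\|w\|^2 \right\rceil, n\right) + 1,$$
where $n$ is the feature-space dimension. The bound treats the feature space, and hence $R$, as fixed in advance; once the kernel parameters are tuned to shrink $R^2\|w\|^2$, $R$ and $\gamma$ depend on the chosen hypothesis class, which is why the guarantee lapses.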

There is a great need for more theoretical analysis of model selection for kernel machines; however, I suspect progress will be relatively slow in this area, as analysis of the kernel parameters is much less mathematically tractable than analysis of the kernel machine itself (which can be viewed as a linear model in a fixed, kernel-induced feature space).