I've tuned an SVM with a radial (RBF) kernel that has a training error of about 10%,
but a test error of about 38%, which surprises me.
I tried to understand what might cause this and noticed that the number of support vectors is high: about 50% of the observations are support vectors. Is this the cause of the problem? Does a high number of support vectors lead to a high-variance SVM?
Here is the summary of the SVM:
Parameters:
SVM-Type: C-classification
SVM-Kernel: radial
cost: 31.62
gamma: 1e-04
Number of Support Vectors: 117
( 58 59 )
Number of Classes: 2
Levels:
high low
Best Answer
YES, a large number of support vectors is often a sign of overfitting.
The problem appears to be that you have chosen optimal hyperparameters based on training set performance, rather than independent test set performance (or, alternatively, cross-validated estimates).
The problem
Never evaluate a set of hyperparameters based on training set performance, because you will inevitably end up with an immense overfit. Consider the RBF kernel: $$K(\mathbf{u},\mathbf{v}) = \exp(-\gamma \|\mathbf{u}-\mathbf{v}\|^2).$$ Now, take the following limit case: $$ \lim_{\gamma\rightarrow\infty} K(\mathbf{u},\mathbf{v}) = \left\{ \begin{matrix} 0, & \text{if } \mathbf{u}\neq\mathbf{v},\\ 1, & \text{otherwise.} \end{matrix} \right. $$ This means that, for $\gamma\rightarrow\infty$, the kernel matrix becomes the identity matrix. The resulting model can then always fit the training data perfectly, with each instance becoming a support vector, while learning absolutely nothing that generalizes.
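You can see this limit numerically. The sketch below (plain NumPy, with a hypothetical set of random 2-D points purely for illustration) computes the RBF kernel matrix for a moderate $\gamma$ and for a very large $\gamma$; the latter collapses to the identity matrix:

```python
import numpy as np

# Hypothetical toy data: 5 random points in 2-D, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))

def rbf_kernel_matrix(X, gamma):
    """Compute K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-gamma * sq_dists)

# Moderate gamma: off-diagonal entries are close to 1, so points "see" each other.
K_small = rbf_kernel_matrix(X, gamma=1e-4)

# Huge gamma: off-diagonal entries vanish and the kernel matrix is
# (numerically) the identity -- every point is similar only to itself.
K_large = rbf_kernel_matrix(X, gamma=1e6)

print(np.round(K_large, 3))
```

With the kernel matrix equal to the identity, the SVM can separate any labeling of the training set, which is exactly why training error alone looks deceptively good.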
The solution
If you want to optimize hyperparameters, you must always estimate the generalization performance of a given set of hyperparameters based on unseen data. The most common approach is via cross-validation, e.g. $k$-fold or leave-one-out.
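Your summary output looks like R's e1071, but the same tuning idea is easy to sketch in scikit-learn; the data below is synthetic (`make_classification`), so substitute your own training set. Each candidate $(C, \gamma)$ pair is scored on held-out folds, never on the data it was fitted to:

```python
# A minimal sketch of CV-based hyperparameter tuning with scikit-learn.
# The dataset is synthetic and the grid values are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

param_grid = {
    "C": [0.1, 1, 10, 100],
    "gamma": [1e-4, 1e-3, 1e-2, 1e-1],
}

# 5-fold CV: every candidate is evaluated on folds it was not trained on.
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X_train, y_train)

print("best params:  ", search.best_params_)
print("CV accuracy:  ", round(search.best_score_, 3))
print("test accuracy:", round(search.score(X_test, y_test), 3))
```

The point of the final line is the sanity check: the cross-validated score and the independent test score should be in the same ballpark, unlike the 10% vs. 38% gap you observed.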
Note that, even when you tune hyperparameters via cross-validation, it is still possible to overfit to the cross-validation estimate itself. You can reduce this to some extent by using several repetitions and/or more folds, but the inherent issue remains.
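One standard way to measure how much the tuning procedure itself overfits is nested cross-validation: an inner loop picks the hyperparameters, and an outer loop scores the whole pipeline on data the inner loop never saw. A sketch with scikit-learn, again on synthetic data with an illustrative grid:

```python
# Nested cross-validation sketch: the inner GridSearchCV tunes (C, gamma),
# the outer cross_val_score evaluates the tuning procedure as a whole.
# Dataset and grid values are assumptions for illustration.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=20, random_state=0)

inner = GridSearchCV(
    SVC(kernel="rbf"),
    {"C": [1, 10, 100], "gamma": [1e-4, 1e-3, 1e-2]},
    cv=5,  # inner folds: choose C and gamma
)
outer_scores = cross_val_score(inner, X, y, cv=5)  # outer folds: evaluate

print("nested CV accuracy: %.3f +/- %.3f"
      % (outer_scores.mean(), outer_scores.std()))
```

If the nested estimate is noticeably lower than the plain cross-validated score, that gap is roughly the optimism introduced by the tuning itself.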