Solved – Kernel SVM on sparse data

Tags: classification, kernel trick, r, sparse, svm

I have a sparse dataset in which many of the columns (features) contain mostly zero values. The class labels are multiple discrete categories (10 classes, to be precise). I'm wondering whether this sparsity could cause trouble when classifying the dataset by training an SVM with a kernel (say RBF, polynomial, or linear). And which kernels should cause trouble, and which should not?

Empirically, I fit an SVM with an RBF kernel in R (package e1071); it throws a lot of warnings and the prediction accuracy is very poor. I'm not sure whether the problem lies with the SVM/RBF-kernel combination or with something else.

Best Answer

The point of using kernels is to map data that is not separable in the input space into a higher-dimensional feature space, where it becomes easier to separate. When you already have many features, there is usually no need for a nonlinear kernel, and a linear kernel is a good default. You can find more about this in A Practical Guide to Support Vector Classification (appendix C).
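As a starting point, a sketch of that advice in R using e1071 (the package mentioned in the question). The toy data here is made up for illustration; note that e1071's `svm()` also accepts sparse matrices from the SparseM or Matrix packages, so you need not densify a large dataset:

```r
## Sketch: linear-kernel SVM on mostly-zero features (toy data, 10 classes)
library(e1071)

set.seed(1)
x <- matrix(rbinom(200 * 50, 1, 0.05), nrow = 200)  # ~95% zeros per column
y <- factor(sample(1:10, 200, replace = TRUE))       # 10 discrete classes

## With many features, a linear kernel is often sufficient
fit  <- svm(x, y, kernel = "linear", cost = 1)
pred <- predict(fit, x)
mean(pred == y)  # training accuracy
```

Sparsity itself is not a problem for the linear kernel; dot products between sparse vectors are cheap and well defined.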

If you do want to use a kernel, you will need to find good values for its associated hyperparameter(s). For the RBF kernel you must tune $\gamma$: if it is too large your model will overfit, and if it is too small it will underfit. You can optimize both SVM hyperparameters ($\gamma$ and $C$) automatically with packages like Optunity (which has an R interface).
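If you prefer to stay within e1071, its built-in `tune.svm()` does a cross-validated grid search over $\gamma$ and $C$ (called `cost` in e1071); Optunity offers more sophisticated solvers for the same job. A self-contained sketch on made-up data:

```r
## Sketch: grid search over gamma and cost with e1071's tune.svm()
library(e1071)

set.seed(1)
x <- matrix(rbinom(200 * 50, 1, 0.05), nrow = 200)  # toy sparse-style data
y <- factor(sample(1:10, 200, replace = TRUE))

tuned <- tune.svm(x, y, kernel = "radial",
                  gamma = 2^(-8:-2),  # too large -> overfit; too small -> underfit
                  cost  = 2^(0:6))
tuned$best.parameters  # chosen (gamma, cost) pair
```

The grid bounds here are arbitrary starting values; in practice you would widen or refine them around the best point found.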
