Solved – Output of scikit-learn SVM in multiclass classification always gives the same label

libsvm, multi-class, optunity, scikit-learn, svm

I am currently using scikit-learn with the following code:

clf = svm.SVC(C=1.0, tol=1e-10, cache_size=600, kernel='rbf', gamma=0.0, 
              class_weight='auto')

and then fit and predict on a dataset with 7 different labels. I get a weird output: no matter which cross-validation technique I use, the predicted label on the validation set is always label 7.

I have tried some other parameters, including the full defaults (svm.SVC()), but as long as the kernel is rbf rather than poly or linear it just does not work, while it works fine for poly and linear.

Besides, I have already tried predicting on the training data instead of the validation data, and there the model fits perfectly.

Has anyone seen this kind of problem before and know what is going on here?

I have never looked at my class distribution in detail, but I know roughly 30% of the labels are 7 and 14% are 4.
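(For anyone who wants to check their own class balance before debugging further, a quick sketch with collections.Counter; the label vector below is a made-up stand-in matching the rough percentages mentioned above:)

```python
from collections import Counter

# Hypothetical stand-in labels: ~30% class 7, ~14% class 4, rest spread out.
y = [7] * 30 + [4] * 14 + [1] * 14 + [2] * 14 + [3] * 14 + [5] * 7 + [6] * 7

counts = Counter(y)
total = sum(counts.values())
for label, n in sorted(counts.items()):
    print(f"label {label}: {n} ({100 * n / total:.0f}%)")
```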

I even tried a manual one-vs-rest implementation, and it still did not help.

Best Answer

A likely cause is that you are not tuning your model. You need to find good values for $C$ and $\gamma$. In your case the defaults turn out to be bad, which leads to trivial models that always predict a certain class. This is particularly common when one class has many more instances than the others. What is your class distribution?
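To make the tuning concrete, here is a minimal sketch using scikit-learn's GridSearchCV over log-spaced grids for $C$ and $\gamma$. The data is synthetic (make_classification standing in for your 7-class set), and note that current scikit-learn spells the weighting option class_weight='balanced' rather than 'auto':

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic 7-class data, just a stand-in for the asker's dataset.
X, y = make_classification(n_samples=500, n_features=10, n_informative=6,
                           n_classes=7, random_state=0)

# Log-spaced grids are the usual starting point for C and gamma.
param_grid = {
    "C": np.logspace(-2, 3, 6),
    "gamma": np.logspace(-4, 1, 6),
}
search = GridSearchCV(SVC(kernel="rbf", class_weight="balanced"),
                      param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

With a grid like this, an RBF SVM that collapses to the majority class under the defaults will usually recover once $C$ and $\gamma$ land in a sensible range.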

scikit-learn has limited hyperparameter search facilities, but you can use it together with a dedicated tuning library like Optunity. An example of tuning a scikit-learn SVC with Optunity is available here.

Disclaimer: I am the lead developer of Optunity.
