Solved – Using predict_proba with sklearn’s multiclass SVC

classification, multi-class, roc

I'm using Python's sklearn for multi-class classification (SVC).
When using the predict method, I get very high scores on my dataset.
However, I want to plot ROC curves for each of my classes. That is, I would like to reduce the problem to an in_class/out_of_class problem for each of the classes.
For that I resorted to the predict_proba method of the SVC.
However, I find no correlation between the probabilities given and the predictions.
For instance, for a 5-class classification problem, I may get a prediction of the 1st class but a probability vector of [ 0.1, 0.2, 0.5, 0.3, 0.0 ], in which the 1st class does not have the highest probability.

Does anyone know how the SVM uses its decision function to make a prediction, or how the predict_proba works on a multi-class problem?

Thanks

Best Answer

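SVC's predict just uses its decision function, which is based on the signed distance from the separating hyperplane. In the multi-class case, SVC trains one-vs-one classifiers, and predict picks the class that wins the most pairwise votes.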
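To make the voting idea concrete, here is a minimal sketch that rebuilds the predictions from the pairwise decision values. The synthetic dataset and the variable names (votes, vote_pred, agreement) are assumptions for illustration. The sign convention (a positive value in a pairwise column is a vote for the first class of the pair) reflects my understanding of how sklearn lays out its one-vs-one output, so treat it as something to check against your installed version rather than a guarantee.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic 5-class data (an assumption; substitute your own X, y).
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=10,
    n_classes=5, n_clusters_per_class=1, random_state=0,
)

# Ask for the raw pairwise (one-vs-one) decision values.
clf = SVC(kernel="rbf", decision_function_shape="ovo").fit(X, y)
dec = clf.decision_function(X)  # one column per class pair: 10 for 5 classes

n_classes = len(clf.classes_)
votes = np.zeros((X.shape[0], n_classes), dtype=int)
k = 0
for i in range(n_classes):
    for j in range(i + 1, n_classes):
        # Column k compares class i with class j (pairs ordered 0-1, 0-2, ...).
        # Assumed convention: positive value votes for i, otherwise for j.
        votes[dec[:, k] > 0, i] += 1
        votes[dec[:, k] <= 0, j] += 1
        k += 1

# np.argmax returns the first class among tied vote counts.
vote_pred = clf.classes_[np.argmax(votes, axis=1)]
agreement = float(np.mean(vote_pred == clf.predict(X)))
print(f"agreement between vote-based labels and predict: {agreement:.3f}")
```

If the agreement is close to 1, the labels from predict are explained by the pairwise votes alone, with no probabilities involved.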

According to the sklearn documentation, SVC's predict_proba behaves as follows:

"The probability model is created using cross validation, so the results can be slightly different than those obtained by predict. Also, it will produce meaningless results on very small datasets."
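You can check how often the two disagree on your own data. The sketch below is only an illustration: the synthetic dataset and the names proba_pred and n_disagree are assumptions, and the number of disagreements will depend on the data and on the sklearn version.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic 5-class data (an assumption; substitute your own X, y).
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=10,
    n_classes=5, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

# probability=True fits an extra, cross-validated probability model on top of the SVM.
clf = SVC(kernel="rbf", probability=True, random_state=0).fit(X_train, y_train)

pred = clf.predict(X_test)         # label from the decision function
proba = clf.predict_proba(X_test)  # one column per entry of clf.classes_
proba_pred = clf.classes_[np.argmax(proba, axis=1)]

# Count the samples where the most probable class is not the predicted label.
n_disagree = int(np.sum(pred != proba_pred))
print(f"{n_disagree} of {len(pred)} test samples disagree")
```

A nonzero count here is not a bug. The labels and the probabilities come from two different models fitted on the same data.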

Much more detail is available in the relevant section of the sklearn documentation. You will have to read the Wu et al. (2004) paper mentioned in that section to figure out exactly how they did it. I am not familiar with it.
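For the per-class ROC curves asked about in the question, one option is to skip predict_proba and rank the samples by the decision function instead. This is a rough sketch under assumptions: the data is synthetic, the one-vs-rest reduction is done with label_binarize, and the "ovr"-shaped scores that sklearn returns are derived from the pairwise classifiers rather than from true per-class margins. The columns of predict_proba could be passed to roc_curve in the same way.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize
from sklearn.svm import SVC

# Synthetic 5-class data (an assumption; substitute your own X, y).
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=10,
    n_classes=5, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

clf = SVC(kernel="rbf", decision_function_shape="ovr").fit(X_train, y_train)
scores = clf.decision_function(X_test)  # shape (n_samples, n_classes)

# One indicator column per class: 1 = in_class, 0 = out_of_class.
y_bin = label_binarize(y_test, classes=clf.classes_)

roc_auc = {}
for idx, cls in enumerate(clf.classes_):
    # fpr and tpr are the points of the ROC curve for this class.
    fpr, tpr, _ = roc_curve(y_bin[:, idx], scores[:, idx])
    roc_auc[cls] = auc(fpr, tpr)
    print(f"class {cls}: AUC = {roc_auc[cls]:.3f}")
```

The fpr and tpr arrays from each iteration can be handed to any plotting library to draw one curve per class.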