For a probabilistic multi-class classifier we can get the probability of membership of a new point $x$ in each class $y_i$; with 3 classes, suppose we get $P(y_a|x) > P(y_b|x) > P(y_c|x)$, so the most probable class of $x$ is $y_a$. Now suppose we have a multi-class SVM where we can get a score for the membership of $x$ in each class (according to distances from the hyperplanes); with 3 classes, suppose we get $Score(y_a|x), Score(y_b|x), Score(y_c|x)$. How do I determine the first, second, and third most likely class of $x$ in this case (without converting these scores to probabilities)? Usually I get positive and negative values, for instance $Score_1 = -8622, Score_2 = 5233, Score_3 = -665$.
Solved – SVM confidence according to distance from hyperplane
Tags: classification, probability, svm, uncertainty, unsupervised learning
Best Answer
It's actually possible to get probabilities out of a Support Vector Machine, which might be more useful and interpretable than an arbitrary "score" value. There are a few approaches for doing this: one reasonable place to start is Platt (1999).
Most SVM packages/libraries implement something like this (for example, the -b 1 option causes LibSVM to produce probabilities). If you're going to roll your own, you should be aware that there are some potential numerical issues, summarized in this note by Lin, Lin, and Weng (2007). They also provide some pseudocode, which might be helpful too.
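As a minimal sketch of what that looks like in practice, assuming scikit-learn (whose `SVC` wraps LibSVM; `probability=True` plays the role of LibSVM's `-b 1` and fits a Platt-style sigmoid via internal cross-validation), on a synthetic 3-class problem:

```python
# Sketch: getting class probabilities from an SVM via Platt-style scaling.
# scikit-learn's SVC wraps LibSVM; probability=True is the analogue of
# LibSVM's -b 1 option. The dataset here is synthetic, for illustration only.
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=4, n_informative=3,
                           n_redundant=0, n_classes=3, random_state=0)

clf = SVC(kernel="rbf", probability=True, random_state=0).fit(X, y)

# One row per query point, one column per class; each row sums to 1.
probs = clf.predict_proba(X[:1])
print(probs)
```

Sorting a row of `probs` in descending order then gives exactly the $P(y_a|x) > P(y_b|x) > P(y_c|x)$ ordering from the question.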
Edit in response to your comment: It's somewhat unclear to me why you'd prefer a score to a probability, especially since you can get the probability with minimal extra effort. All that said, most of the probability calculations seem to be derived from the distance between the point and the hyperplane. If you look at Section 2 of the Platt paper, he walks through the motivation for fitting a sigmoid to the raw SVM output.
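For reference, the sigmoid Platt proposes maps the SVM decision value $f(x)$ (which is proportional to the signed distance from the hyperplane) to a probability:

$$P(y = 1 \mid x) = \frac{1}{1 + \exp\bigl(A\,f(x) + B\bigr)}$$

where $A$ and $B$ are scalar parameters fit by maximum likelihood on held-out data.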
The rest of the method section describes how to fit the $A$ and $B$ parameters of that sigmoid. In the introduction (Sections 1.0 and 1.1), Platt reviews a few other approaches by Vapnik, Wahba, and Hastie & Tibshirani. These methods also use something like the distance to the hyperplane, manipulated in various ways. All of this suggests that the distance to the hyperplane contains useful information, so I suppose you could use the raw distance as some (non-linear) measure of confidence.