It's actually possible to get probabilities out of a Support Vector Machine, which might be more useful and interpretable than an arbitrary "score" value. There are a few approaches for doing this: one reasonable place to start is Platt (1999).
Most SVM packages/libraries implement something like this (for example, the -b 1 option causes LIBSVM to produce probabilities). If you're going to roll your own, you should be aware that there are some potential numerical issues, summarized in this note by Lin, Lin, and Weng (2007). They also provide some pseudocode, which might be helpful too.
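To give a flavor of those numerical issues: the method fits a sigmoid of the form 1/(1+exp(A*f+B)) to the raw SVM output f, and evaluating that sigmoid or its log-loss naively can overflow in exp or lose all precision in log(1 - p) once p rounds to 1. Below is a minimal Python sketch of the kind of rearrangement the Lin, Lin, and Weng note describes, as I understand it. The function names platt_prob and stable_nll are my own, not taken from the note or from any library.

```python
import math

def platt_prob(f, A, B):
    """P(y=1 | f) = 1 / (1 + exp(A*f + B)), evaluated without overflow."""
    z = A * f + B
    if z >= 0:
        # Rewrite as exp(-z) / (1 + exp(-z)) so exp never sees a large positive argument.
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))

def stable_nll(f, t, A, B):
    """Negative log-likelihood -[t*log(p) + (1-t)*log(1-p)] for a target t in [0, 1].

    Algebraically this equals log(1 + exp(z)) + (t - 1) * z with z = A*f + B,
    split into two branches so that exp is only called on non-positive values.
    """
    z = A * f + B
    if z >= 0:
        return t * z + math.log1p(math.exp(-z))
    return (t - 1.0) * z + math.log1p(math.exp(z))
```

With this form, an extreme input such as z = 1000 gives a finite loss, whereas computing math.exp(1000) directly raises OverflowError.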
Edit in response to your comment:
It's somewhat unclear to me why you'd prefer a score to a probability, especially since you can get the probability with minimal extra effort. All that said, most of the probability calculations seem like they're derived from the distance between the point and the hyperplane. If you look at Section 2 of the Platt paper, he walks through the motivation and says:
The class conditional densities between the margins are apparently exponential. Bayes' rule on two exponentials suggests using a parametric form of a sigmoid:
$$ P(y=1 | f) = \frac{1}{1+\exp(Af+B)}$$
This sigmoid model is equivalent to assuming that the output of the SVM is proportional to the log-likelihood of a positive training example. [MK: $f$ was defined elsewhere to be the raw SVM output].
The rest of the method section describes how to fit the $A$ and $B$ parameters of that sigmoid. In the introduction (Sections 1.0 and 1.1), Platt reviews a few other approaches by Vapnik, Wahba, and Hastie & Tibshirani. These methods also use something like the distance to the hyperplane, manipulated in various ways. These all seem to suggest that the distance to the hyperplane contains some useful information, so I guess you could use the raw distance as some (non-linear) measure of confidence.
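As a rough illustration of what fitting $A$ and $B$ involves, here is a minimal Python sketch. It minimizes the cross-entropy between the sigmoid outputs and Platt's smoothed targets, $(N_+ + 1)/(N_+ + 2)$ for positives and $1/(N_- + 2)$ for negatives. Note that it uses plain gradient descent for brevity, which is not what Platt (a model-trust/Levenberg-Marquardt style procedure) or Lin, Lin, and Weng (Newton's method with backtracking) actually use. The names fit_platt and sigmoid_prob, the learning rate, the iteration count, and the toy scores are all my own assumptions.

```python
import math

def sigmoid_prob(f, A, B):
    """P(y=1 | f) = 1 / (1 + exp(A*f + B)), written to avoid overflow in exp."""
    z = A * f + B
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))

def fit_platt(scores, labels, lr=0.1, n_iter=5000):
    """Fit A and B by gradient descent (a simplification, see the text above).

    scores: raw SVM outputs f_i; labels: 1 for positive, anything else for negative.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    # Platt's smoothed targets instead of hard 0/1 labels.
    t_pos = (n_pos + 1.0) / (n_pos + 2.0)
    t_neg = 1.0 / (n_neg + 2.0)
    targets = [t_pos if y == 1 else t_neg for y in labels]

    # Start from the class prior: A = 0, B = log((N- + 1) / (N+ + 1)).
    A = 0.0
    B = math.log((n_neg + 1.0) / (n_pos + 1.0))
    n = len(scores)
    for _ in range(n_iter):
        grad_A = grad_B = 0.0
        for f, t in zip(scores, targets):
            d = t - sigmoid_prob(f, A, B)  # derivative of the loss w.r.t. z = A*f + B
            grad_A += d * f
            grad_B += d
        A -= lr * grad_A / n
        B -= lr * grad_B / n
    return A, B

# Toy decision values (made up for illustration), deliberately not perfectly separable.
scores = [-2.0, -1.5, -1.0, -0.3, 0.2, 0.4, 1.0, 1.7, 2.2]
labels = [0, 0, 0, 1, 0, 1, 1, 1, 1]
A, B = fit_platt(scores, labels)
```

The fitted $A$ should come out negative, so that larger (more positive) SVM outputs map to probabilities closer to 1. In practice the scores used for fitting should come from held-out data or cross-validation rather than the SVM's own training set, which is a point Platt discusses.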
When it comes to feature ranking, the F-score tends to behave very similarly to the t-test and the Kruskal-Wallis test. So one solution is to use multi-group alternatives to the t-test, e.g., ANOVA, or their non-parametric versions.
If I understand your question, you can pass the -b flag with option 1 when building the model, and then do the same when creating the prediction vector.
Here is an output_file example that has 3 classes:
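For orientation: with -b 1, LIBSVM's svm-predict writes a first line starting with the word labels that lists the class labels, followed by one line per test instance holding the predicted label and then one probability per class, in the header's order. The Python sketch below parses that layout. The sample text is made up for illustration (it is not the original example), and parse_libsvm_probabilities is a hypothetical helper name of my own.

```python
def parse_libsvm_probabilities(text):
    """Parse svm-predict output produced with -b 1.

    Returns a list of (predicted_label, {class_label: probability}) tuples.
    Labels are kept as strings, exactly as they appear in the file.
    """
    rows = [line.split() for line in text.splitlines() if line.strip()]
    header, body = rows[0], rows[1:]
    if header[0] != "labels":
        raise ValueError("expected a header line starting with 'labels'")
    class_labels = header[1:]

    results = []
    for row in body:
        predicted = row[0]
        probs = [float(p) for p in row[1:]]
        if len(probs) != len(class_labels):
            raise ValueError("probability count does not match the header")
        results.append((predicted, dict(zip(class_labels, probs))))
    return results

# Made-up sample for three classes; the numbers are illustrative only.
sample = """labels 1 2 3
1 0.80 0.15 0.05
3 0.10 0.20 0.70
"""
parsed = parse_libsvm_probabilities(sample)
```

Each row's probabilities should sum to roughly 1, and the predicted label is normally the class with the largest probability.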