Unfortunately, because of the filter-tree / elimination implementation in ECT, getting a measure of confidence is not straightforward. If you can sacrifice some speed, using --oaa with logistic loss and the -r (--raw_predictions) option gives you raw scores that you can convert to a normalized measure of relative "confidence". Say you have a file like this in "ect.dat":
1 ex1| a
2 ex2| a b
3 ex3| c d e
2 ex4| b a
1 ex5| f g
We run the one-against-all:
vw --oaa 3 ect.dat -f oaa.model --loss_function logistic
Then run prediction with raw scores output:
vw -t -i oaa.model ect.dat -p oaa.predict -r oaa.rawp
You get predictions in oaa.predict:
1.000000 ex1
2.000000 ex2
3.000000 ex3
2.000000 ex4
1.000000 ex5
and raw scores for each class in oaa.rawp:
1:0.0345831 2:-0.0888872 3:-0.533179 ex1
1:-0.241225 2:0.170322 3:-0.749773 ex2
1:-0.426383 2:-0.502638 3:0.154067 ex3
1:-0.241225 2:0.170322 3:-0.749773 ex4
1:0.307398 2:-0.387151 3:-0.502747 ex5
You can map each score into (0, 1) with the logistic function 1/(1 + exp(-score)) and then normalize in various ways to get something like these:
1:0.62144216 2:0.5328338 3:0.20096953 ex1
1:0.57251362 2:0.71125717 3:0.1433303 ex2
1:0.37941591 2:0.29294807 3:0.66095287 ex3
1:0.57251362 2:0.71125717 3:0.1433303 ex4
1:0.72177734 2:0.37525053 3:0.2704246 ex5
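For instance, here is a minimal Python sketch of one such mapping, using the sigmoid followed by sum-normalization (file names taken from the commands above; the exact values depend on which normalization you pick, so they will not match the table exactly):

import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Each line of oaa.rawp looks like "1:raw 2:raw 3:raw tag";
# this assumes every example carries a tag, as in ect.dat above.
with open("oaa.rawp") as f:
    for line in f:
        parts = line.split()
        tag = parts[-1]
        scores = {label: sigmoid(float(raw))
                  for label, raw in (tok.split(":") for tok in parts[:-1])}
        total = sum(scores.values())  # sum-normalization, one of several options
        print(" ".join(f"{label}:{s / total:.8f}"
                       for label, s in sorted(scores.items())), tag)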
Once you have scored a sufficiently large data set, you can plot threshold (in steps of 0.1, for instance) against the percentage correct when scoring with that threshold, to get an idea of which threshold will give you, say, 95% correct for class 1, and so on.
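As a sketch of that threshold sweep, assuming you have collected per-example pairs of (normalized score for class 1, true label); the data here is made up for illustration:

# Hypothetical scored data: (normalized score for class 1, true label)
scored = [(0.62, 1), (0.14, 2), (0.38, 3), (0.71, 1), (0.27, 2)]

for i in range(1, 10):
    threshold = i / 10.0
    flagged = [(s, y) for s, y in scored if s >= threshold]
    if flagged:
        correct = sum(1 for _, y in flagged if y == 1)
        print(f"threshold {threshold:.1f}: "
              f"{100.0 * correct / len(flagged):.0f}% correct on {len(flagged)} predictions")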
This discussion might be useful.
There are 2 columns of floating-point numbers because you specified 2 topics in your LDA model (the number immediately after --lda).
The first column is numeric and, because of the feature hashing that Vowpal Wabbit does, covers the full hash space independent of input size: indices 0 through 262143 by default. The --help text for --readable_model says "Output human-readable final regressor with numeric features", so that is by design, even though it might not pass all tests for "human-readable" (see UX.SE for more discussion on that topic). You can change the number of rows with the -b option (for example, "-b 16: We expect to see at most 2^16 unique words."). The default is -b 18, giving 2^18 = 262144 hash slots whose indices run from 0 to 262143.
If you convert terms to numbers using an external dictionary, so your input file has integers in place of words, VW will conveniently use those integers as the hash value directly, without requiring --audit or --invert_hash.
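A rough Python sketch of that indexing behavior (illustrative only: VW really uses murmurhash, and namespaces add an offset that this ignores; Python's hash() is just a stand-in):

BITS = 18              # vw's default; change with -b
NUM_SLOTS = 2 ** BITS  # 262144 slots, indices 0 through 262143

def feature_index(token: str) -> int:
    if token.isdigit():
        # An all-digit feature name is used as the index directly
        return int(token) % NUM_SLOTS
    # Otherwise the name is hashed into the table
    return hash(token) % NUM_SLOTS

print(feature_index("12345"))  # -> 12345
print(feature_index("word"))   # -> some index in 0 .. 262143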
Best Answer
I've come to understand this in the following terms:
There are two modes of operation for classification:
1) Binary classification
2) Multi-class classification
In Vowpal Wabbit, multi-class classification is implemented as a learning reduction on top of binary classification.
I view the binary classification output value (ranging between 0.0 and 1.0) as a confidence level for the label assigned index 1.
E.g.: if male = 1 and female = 0, any value >= 0.5 represents higher confidence that the item being classified is male rather than female.
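A minimal sketch of that reading of the output, assuming the score has already been mapped into [0.0, 1.0]:

def classify(score: float) -> str:
    # Read the output as confidence that the label is 1 (male, in this example)
    return "male" if score >= 0.5 else "female"

print(classify(0.83))  # -> male, with fairly high confidence
print(classify(0.42))  # -> female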
I was using Vowpal Wabbit incorrectly when I tried to fit a multi-class problem into this style of operation, and I believe the results I got were meaningless. Instead, you can use the learning reduction called "one-against-all", exposed via the --oaa option in the vw tool. This will make predictions that map to the labels you specified.
E.g.: if the classes are red=1, blue=2, green=3 (vw's --oaa expects labels numbered 1 through k), the predictions will no longer be arbitrary floating-point values but will map exactly to 1, 2 and 3.
HTH