Classification with Vowpal Wabbit – Understanding Prediction Output


I've trained a classifier with Vowpal Wabbit to decide whether a person is male or female based on their name alone, using the labels Male=0 and Female=1. When I ran Vowpal Wabbit in prediction mode, the output values ranged between 0.0 and 1.0. I interpreted these as p <= 0.5 ==> Male and p > 0.5 ==> Female, and with this rule the prediction accuracy was 85%.
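To make the thresholding rule concrete, here is a minimal Python sketch. It assumes the question's encoding (Male=0, Female=1) and a predictions file written with vw's -p option containing one prediction per line. The helper names are hypothetical and not part of Vowpal Wabbit.

```python
# Hypothetical helpers (not part of Vowpal Wabbit) for turning binary
# predictions into labels, using the question's encoding Male=0, Female=1.
MALE, FEMALE = 0, 1


def score_to_label(score: float, threshold: float = 0.5) -> int:
    """Map a raw prediction to a class: <= threshold -> Male, > threshold -> Female."""
    return FEMALE if score > threshold else MALE


def accuracy(scores, true_labels, threshold: float = 0.5) -> float:
    """Fraction of examples whose thresholded score equals the true label."""
    if len(scores) != len(true_labels) or not scores:
        raise ValueError("need equally long, non-empty lists of scores and labels")
    correct = sum(score_to_label(s, threshold) == y for s, y in zip(scores, true_labels))
    return correct / len(scores)


def read_predictions(path: str):
    """Read a predictions file written with vw's -p option.

    Assumed format: one prediction per line, the first token is the score
    (an optional example tag may follow it).
    """
    with open(path) as f:
        return [float(line.split()[0]) for line in f if line.strip()]
```

For example, accuracy([0.12, 0.87, 0.49, 0.51], [0, 1, 1, 1]) gives 0.75, because 0.49 falls on the Male side of the threshold.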

I'm now trying to train a classifier for another problem that has 200 labels instead of just two. The labels go from 0 to 199.

The prediction output has values between 0.0 and 199.0. When I get a value such as 99.56, how should I interpret it? Does it map to label 100?

Best Answer

I've come to understand this in the following terms:

There are two modes of operation for classification:

1) Binary classification
2) Multi-class classification

In Vowpal Wabbit, multi-class classification is implemented as a learning reduction to binary classification.
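To illustrate the reduction idea, here is a toy Python sketch of one such reduction, one-against-all: one binary learner per class, each trained to separate its class from all the others, with the prediction being the class whose learner scores highest. The perceptron learner, the class names and the feature format are assumptions for illustration only; this is not how vw implements it internally.

```python
# Toy one-against-all (OAA) reduction: k binary learners, predict the argmax.
# Illustrative only; NOT vw's internal implementation of --oaa.
from typing import Dict

Features = Dict[str, float]


class BinaryPerceptron:
    """Binary linear learner: score > 0 means 'this class', < 0 means 'not this class'."""

    def __init__(self) -> None:
        self.w: Dict[str, float] = {}
        self.bias = 0.0

    def score(self, x: Features) -> float:
        return self.bias + sum(self.w.get(f, 0.0) * v for f, v in x.items())

    def update(self, x: Features, y: int) -> None:
        # y is +1 or -1; standard perceptron update when the margin is not positive
        if y * self.score(x) <= 0:
            for f, v in x.items():
                self.w[f] = self.w.get(f, 0.0) + y * v
            self.bias += y


class OneAgainstAll:
    """Reduce a k-class problem with labels 1..k to k binary problems."""

    def __init__(self, k: int) -> None:
        self.k = k
        self.learners = [BinaryPerceptron() for _ in range(k)]

    def learn(self, x: Features, label: int) -> None:
        if not 1 <= label <= self.k:
            raise ValueError(f"label must be in 1..{self.k}")
        # Learner c sees +1 for examples of class c and -1 for everything else
        for c, learner in enumerate(self.learners, start=1):
            learner.update(x, 1 if c == label else -1)

    def predict(self, x: Features) -> int:
        scores = [learner.score(x) for learner in self.learners]
        # The class whose binary learner is most confident wins (ties go to the lowest label)
        return 1 + max(range(self.k), key=lambda i: scores[i])
```

Training on three one-hot examples ({"red": 1.0} as class 1, {"blue": 1.0} as class 2, {"green": 1.0} as class 3) for a few passes is enough for predict to return exactly 1, 2 and 3: the output is always one of the class labels, never a value in between.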

I view the binary classification output value (ranging between 0.0 and 1.0) as a confidence level for the label encoded as 1.

E.g., if male = 1 and female = 0, any value above 0.5 indicates higher confidence that the item being classified is male rather than female.

I was using Vowpal Wabbit incorrectly when I tried to fit a multi-class problem into this style of operation, and I believe the results I got are meaningless. Instead, you can use the learning reduction called "One-Against-All", exposed via the --oaa option of the vw tool, which takes the number of classes as its argument (e.g. --oaa 200). Its predictions map directly to the class labels you specified.

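E.g., if the classes are red, blue and green, encode them as 1, 2 and 3: vw's multi-class labels must be integers from 1 to k, where k is the number of classes passed to --oaa, so labels running from 0 to 199 would need to be shifted to 1 to 200. The predictions will then no longer be floating point values but will be exactly 1, 2 or 3.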
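A minimal Python sketch of preparing data for --oaa and reading predictions back, assuming vw's multi-class labels run from 1 to k (so 0-based labels are shifted by one), the plain "label | features" text input format, and a -p predictions file whose first token on each line is the predicted label. All function names are hypothetical, and the sketch only emits unweighted features.

```python
# Hypothetical helpers (not part of Vowpal Wabbit) for --oaa data preparation.
# Assumption: vw multi-class labels are integers 1..k, so 0-based labels shift by +1.
from typing import List


def to_vw_label(zero_based: int, k: int) -> int:
    """Map a 0-based class index (0..k-1) to a vw multi-class label (1..k)."""
    if not 0 <= zero_based < k:
        raise ValueError(f"class index must be in 0..{k - 1}")
    return zero_based + 1


def from_vw_label(vw_label: int, k: int) -> int:
    """Map a vw multi-class label (1..k) back to a 0-based class index."""
    if not 1 <= vw_label <= k:
        raise ValueError(f"vw label must be in 1..{k}")
    return vw_label - 1


def vw_line(zero_based: int, k: int, features: List[str]) -> str:
    """Build one example in vw's text format: '<label> | feat1 feat2 ...'."""
    # Spaces, '|' and ':' are delimiters in vw's input format, so reject them
    # in feature names (this sketch does not emit weighted 'name:value' features).
    for f in features:
        if any(ch in f for ch in " |:"):
            raise ValueError(f"invalid feature name: {f!r}")
    return f"{to_vw_label(zero_based, k)} | {' '.join(features)}"


def parse_oaa_prediction(line: str, k: int) -> int:
    """Turn one line of an --oaa predictions file into a 0-based class index."""
    # The first token is the predicted label; parse via float in case it
    # is written as '3.000000' rather than '3'. An optional tag may follow.
    return from_vw_label(int(float(line.split()[0])), k)
```

With the question's 200 classes, the data written this way could be trained with something like vw --oaa 200 -d train.vw -f oaa.model and scored with vw -t -i oaa.model -d test.vw -p preds.txt (file names are placeholders); parse_oaa_prediction then maps each line of preds.txt back to the original 0 to 199 range.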

HTH