Unfortunately, because of the filter-tree / elimination implementation in ECT, getting a measure of confidence is not straightforward. If you can sacrifice some speed, using --oaa with logistic loss and the -r (--raw_predictions) option gives you raw scores that you can convert to a normalized measure of relative "confidence". Say you have a file like this in "ect.dat":
1 ex1| a
2 ex2| a b
3 ex3| c d e
2 ex4| b a
1 ex5| f g
We run the one-against-all:
vw --oaa 3 ect.dat -f oaa.model --loss_function logistic
Then run prediction with raw scores output:
vw -t -i oaa.model ect.dat -p oaa.predict -r oaa.rawp
You get predictions in oaa.predict:
1.000000 ex1
2.000000 ex2
3.000000 ex3
2.000000 ex4
1.000000 ex5
and raw scores for each class in oaa.rawp:
1:0.0345831 2:-0.0888872 3:-0.533179 ex1
1:-0.241225 2:0.170322 3:-0.749773 ex2
1:-0.426383 2:-0.502638 3:0.154067 ex3
1:-0.241225 2:0.170322 3:-0.749773 ex4
1:0.307398 2:-0.387151 3:-0.502747 ex5
You can map each score into (0, 1) with the logistic function 1/(1 + exp(-score)) and then normalize in various ways to get something like these:
1:0.62144216 2:0.5328338 3:0.20096953 ex1
1:0.57251362 2:0.71125717 3:0.1433303 ex2
1:0.37941591 2:0.29294807 3:0.66095287 ex3
1:0.57251362 2:0.71125717 3:0.1433303 ex4
1:0.72177734 2:0.37525053 3:0.2704246 ex5
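For instance, here is a minimal Python sketch of one such mapping, using the sigmoid followed by sum-normalization (file names taken from the commands above; the exact values depend on which normalization you pick, so they will not match the table exactly):

import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Each line of oaa.rawp looks like "1:raw 2:raw 3:raw tag";
# this assumes every example carries a tag, as in ect.dat above.
with open("oaa.rawp") as f:
    for line in f:
        parts = line.split()
        tag = parts[-1]
        scores = {label: sigmoid(float(raw))
                  for label, raw in (tok.split(":") for tok in parts[:-1])}
        total = sum(scores.values())  # sum-normalization, one of several options
        print(" ".join(f"{label}:{s / total:.8f}"
                       for label, s in sorted(scores.items())), tag)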
Once you have scored a sufficiently large data set, you can plot threshold (in steps of 0.1, for instance) against the percentage correct when scoring with that threshold, to get an idea of which threshold will give you, say, 95% correct for class 1, and so on.
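As a sketch of that threshold sweep, assuming you have collected per-example pairs of (normalized score for class 1, true label); the data here is made up for illustration:

# Hypothetical scored data: (normalized score for class 1, true label)
scored = [(0.62, 1), (0.14, 2), (0.38, 3), (0.71, 1), (0.27, 2)]

for i in range(1, 10):
    threshold = i / 10.0
    flagged = [(s, y) for s, y in scored if s >= threshold]
    if flagged:
        correct = sum(1 for _, y in flagged if y == 1)
        print(f"threshold {threshold:.1f}: "
              f"{100.0 * correct / len(flagged):.0f}% correct on {len(flagged)} predictions")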
This discussion might be useful.
There are 2 columns of floating-point numbers because you specified 2 topics in your LDA model (the number immediately after --lda).
The first column is numeric and, because of the feature hashing that Vowpal Wabbit does, covers the full hash space independent of input size: indices 0 through 262143 by default. The --help text for --readable_model says "Output human-readable final regressor with numeric features", so that is by design, even though it might not pass all tests for "human-readable" (see UX.SE for more discussion on that topic). You can change the number of rows with the -b option (for example, "-b 16: We expect to see at most 2^16 unique words."). The default is -b 18, giving 2^18 = 262144 hash slots whose indices run from 0 to 262143.
If you convert terms to numbers using an external dictionary, so your input file has integers in place of words, VW will conveniently use those integers as the hash value directly, without requiring --audit or --invert_hash.
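A rough Python sketch of that indexing behavior (illustrative only: VW really uses murmurhash, and namespaces add an offset that this ignores; Python's hash() is just a stand-in):

BITS = 18              # vw's default; change with -b
NUM_SLOTS = 2 ** BITS  # 262144 slots, indices 0 through 262143

def feature_index(token: str) -> int:
    if token.isdigit():
        # An all-digit feature name is used as the index directly
        return int(token) % NUM_SLOTS
    # Otherwise the name is hashed into the table
    return hash(token) % NUM_SLOTS

print(feature_index("12345"))  # -> 12345
print(feature_index("word"))   # -> some index in 0 .. 262143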
Best Answer
I've come to understand this in the following terms:
There are two modes of operation for classification:
1) Binary classification
2) Multi-class classification
In Vowpal Wabbit, multi-class classification is implemented as a learning reduction on top of binary classification.
I view the binary classification output value (ranging between 0.0 and 1.0) as a confidence level for the label assigned index 1.
E.g.: if male = 1 and female = 0, any value >= 0.5 represents higher confidence that the item being classified is male rather than female.
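A minimal sketch of that reading of the output, assuming the score has already been mapped into [0.0, 1.0]:

def classify(score: float) -> str:
    # Read the output as confidence that the label is 1 (male, in this example)
    return "male" if score >= 0.5 else "female"

print(classify(0.83))  # -> male, with fairly high confidence
print(classify(0.42))  # -> female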
I was using Vowpal Wabbit incorrectly when I tried to fit a multi-class problem into this style of operation, and I believe the results I got were meaningless. Instead, you can use the learning reduction called "one-against-all", exposed via the --oaa option in the vw tool. This will make predictions that map to the labels you specified.
E.g.: if the classes are red=1, blue=2, green=3 (vw's --oaa expects labels numbered 1 through k), the predictions will no longer be arbitrary floating-point values but will map exactly to 1, 2 and 3.
HTH