Solved – Weka: how to score an arff file using a NaiveBayes model

naive-bayes, weka

I trained a model with

java -Xmx6g -cp /usr/share/java/weka.jar weka.classifiers.bayes.NaiveBayes -t train.arff -d foo.nb

and I want to score a test file with it.

I tried

java -Xmx6g -cp /usr/share/java/weka.jar weka.classifiers.bayes.NaiveBayes -T test.arff -l foo.nb -p N

which produces no files but writes some text to stdout, parts of which can be construed to be scores (and is unbelievably slow):

 inst#     actual  predicted error prediction ()
     1        1:0        1:0       0.549 
     2        1:0        1:0       0.55 
     3        1:0        1:0       0.531 
     4        1:0        1:0       0.515 
     5        2:1        1:0   +   0.552 
     6        1:0        1:0       0.532 
     7        1:0        2:1   +   0.519 

If I read this correctly, the last column is the score and 3rd is the prediction based on it. Why does score 0.55 correspond to 0 (instance#2) but a smaller score 0.519 correspond to 1 (instance#7)?

Where is the output documented?

Is there a way to produce the csv score file?

Thanks!

Best Answer

First, I recommend reading about Naive Bayes.

Why does score 0.55 correspond to 0 (instance#2) but a smaller score 0.519 correspond to 1 (instance#7)?

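NaiveBayes computes a posterior probability for every class and predicts the class with the highest one; it does not check whether a single score is close to one. For example, if the class A probability is 0.1 and the class B probability is 0.12, then class B is the prediction. Here "score" means the posterior probability of a class given the instance's feature values, and the last column of the output is the probability of the predicted class, not of one fixed class. For instance#2 the value 0.55 is the probability of class 0, while for instance#7 the value 0.519 is the probability of class 1. In the above Stack Overflow example, the posterior probabilities (scores) were 1/20 and 1/60; the higher one is chosen.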
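The rows become easier to compare once every score refers to the same class. The following is only a sketch: it assumes two classes labelled 0 and 1, as in the output quoted in the question, and the helper name to_class1_score is made up for illustration (it is not part of Weka).

```shell
# Hypothetical helper (not part of Weka): convert plain-text prediction
# rows into "instance,predicted_label,P(class 1)".
to_class1_score() {
  awk '$1 ~ /^[0-9]+$/ {
    split($3, pred, ":")          # predicted column is "index:label", e.g. "2:1"
    p = $NF                       # last field: probability of the predicted class
    score = (pred[2] == "1") ? p : 1 - p
    printf "%d,%s,%.3f\n", $1, pred[2], score
  }'
}

# Two rows copied from the output in the question
scores=$(to_class1_score <<'EOF'
 inst#     actual  predicted error prediction ()
     2        1:0        1:0       0.55
     7        1:0        2:1   +   0.519
EOF
)
printf '%s\n' "$scores"
```

Expressed as the probability of class 1, instance 2 gets 0.450 and instance 7 gets 0.519, so the two predictions no longer look contradictory.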

Where is the output documented?

You can look at the source file of the class, but not everything in Weka is well documented. This output is fairly simple, so I do not think you will find dedicated documentation for it.

Is there a way to produce the csv score file?

Use the following to get CSV prediction output.

java_command="java -Xmx6g -cp /usr/share/java/weka.jar"
$java_command weka.classifiers.Evaluation weka.classifiers.bayes.NaiveBayes -l irisNaiveBayes.model -T test.arff -classifications weka.classifiers.evaluation.output.prediction.CSV

We use the Evaluation class instead of invoking NaiveBayes directly. The first argument to Evaluation is the classifier to use, here NaiveBayes. The -classifications switch selects the prediction output format, here CSV.

Another example, with the well-known iris data set:

java_command="java -Xmx6g -cp /usr/share/java/weka.jar"
$java_command weka.classifiers.bayes.NaiveBayes -t iris.arff -d irisNaiveBayes.model

$java_command weka.classifiers.Evaluation weka.classifiers.bayes.NaiveBayes -l irisNaiveBayes.model -T iris.arff -classifications weka.classifiers.evaluation.output.prediction.CSV

This command gives the following output.

=== Predictions on test data ===

inst#,actual,predicted,error,prediction
1,1:Iris-setosa,1:Iris-setosa,,1
2,1:Iris-setosa,1:Iris-setosa,,1
3,1:Iris-setosa,1:Iris-setosa,,1
4,1:Iris-setosa,1:Iris-setosa,,1
5,1:Iris-setosa,1:Iris-setosa,,1
6,1:Iris-setosa,1:Iris-setosa,,1
7,1:Iris-setosa,1:Iris-setosa,,1
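The predictions are still written to stdout, preceded by a banner line, so getting an actual CSV file takes one more step. A minimal sketch, assuming the output looks like the listing above; the helper name weka_csv_only and the file name scores.csv are hypothetical:

```shell
# Hypothetical helper: keep only the CSV header and the data rows,
# dropping the "=== Predictions ... ===" banner and blank lines.
weka_csv_only() {
  # The header starts with "inst#"; data rows start with an instance number and a comma.
  grep -E '^(inst#|[0-9]+),'
}

# Sample lines copied from the output above, standing in for a real Weka run
sample='=== Predictions on test data ===

inst#,actual,predicted,error,prediction
1,1:Iris-setosa,1:Iris-setosa,,1
2,1:Iris-setosa,1:Iris-setosa,,1'

printf '%s\n' "$sample" | weka_csv_only > scores.csv
cat scores.csv
```

With Weka installed, the sample would be replaced by piping the Evaluation command shown earlier into weka_csv_only.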