Solved – Prediction using Naive Bayes from the klaR package fails

machine learning, naive bayes, r

I am trying to replicate, in R, an example from chapter 6 of Tom Mitchell's book Machine Learning (1997).

There are 14 training examples (shown below) of the target concept PlayTennis, where each day is described by the attributes Outlook, Temperature, Humidity, and Windy.

Training examples:

Outlook,Temperature,Humidity,Windy,Play
overcast,cool,normal,true,yes
overcast,hot,high,false,yes
overcast,hot,normal,false,yes
overcast,mild,high,true,yes
rainy,cool,normal,false,yes
rainy,mild,high,false,yes
rainy,mild,normal,false,yes
sunny,cool,normal,false,yes
sunny,mild,normal,true,yes
rainy,cool,normal,true,no
rainy,mild,high,true,no
sunny,hot,high,false,no
sunny,hot,high,true,no
sunny,mild,high,false,no

Here's my code:

library("klaR")
library("caret")

data = read.csv("example.csv")

x = data[,-5]
y = data$Play

model = train(x,y,'nb',trControl=trainControl(method='cv',number=10))

Outlook <- "sunny"
Temperature <- "cool"
Humidity <- "high"
Windy <- "true"

instance <- data.frame(Outlook,Temperature,Humidity,Windy)

predict(model$finalModel,instance)
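
For reference, the priors and conditional probability tables the model learned can be inspected directly (assuming caret's finalModel is a klaR NaiveBayes object, which stores them in the apriori and tables components):

model$finalModel$apriori   # class priors
model$finalModel$tables    # per-attribute conditional probability tables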

The example tries to predict the outcome for

Outlook=sunny, Temperature=cool, Humidity=high and Wind=strong (the book's Wind=strong corresponds to Windy=true in my data).

The problem is that I am getting a different prediction from the one in the book.

Here are the probabilities I get from my code:

no          yes
0.001078835 0.9989212

Here are the book's probabilities:

no     yes
0.0206 0.0053

My code classifies the unseen instance as yes, while the book's classifier classifies it as no.

Shouldn't both give the same answer since we are using the same naive Bayes classifier?

EDIT:

I replicated the example with scikit-learn's MultinomialNB classifier and got the following probabilities:

no    yes
0.769  0.231

which are close to the book's probabilities after normalization.

The book's probabilities, normalized:

no     yes
0.795  0.205
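
Normalization here just divides each raw score by the sum of the two; a one-line check in R:

book <- c(no = 0.0206, yes = 0.0053)
book / sum(book)   # no ≈ 0.795, yes ≈ 0.205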

Best Answer

The problem is small enough that you can work it out by hand. For your example you have
$$
\begin{align*}
P(outlook=sunny \mid play=yes) &= \frac{2}{9}\\
P(temp=cool \mid play=yes) &= \frac{3}{9}\\
P(humidity=high \mid play=yes) &= \frac{3}{9}\\
P(windy=true \mid play=yes) &= \frac{3}{9}\\
P(play=yes) &= \frac{9}{14}.
\end{align*}
$$
Putting it all together you have
$$
\begin{align*}
P(play=yes \mid sunny, cool, high, true) &\varpropto \frac{2}{9} \left(\frac{3}{9}\right)^3 \frac{9}{14}\\
&\approx 0.0053,
\end{align*}
$$
which agrees with Mitchell. The same calculation for the no class uses $P(sunny \mid no) = \frac{3}{5}$, $P(cool \mid no) = \frac{1}{5}$, $P(high \mid no) = \frac{4}{5}$, $P(true \mid no) = \frac{3}{5}$, and $P(play=no) = \frac{5}{14}$, giving
$$
P(play=no \mid sunny, cool, high, true) \varpropto \frac{3}{5} \cdot \frac{1}{5} \cdot \frac{4}{5} \cdot \frac{3}{5} \cdot \frac{5}{14} \approx 0.0206,
$$
which is the book's other number.

I don't use R, so I can't speak as to why the output is different. Obviously the package you're using is normalizing, but this shouldn't change the classification. If I had to guess, I'd say it is the cross-validation.
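
One way to test that guess (a sketch, not something I have verified; it assumes example.csv holds the table above) is to fit klaR's NaiveBayes directly, skipping caret's resampling wrapper and forcing every column to be a factor so all attributes are treated as categorical:

library(klaR)

# Read the data with every column forced to a factor, so klaR treats
# all four attributes (and the label) as categorical
train_df <- read.csv("example.csv", colClasses = "factor")

# Fit naive Bayes directly, with no cross-validation wrapper
fit <- NaiveBayes(Play ~ ., data = train_df)

# Build the test instance with the same factor levels as the training data
instance <- data.frame(
  Outlook     = factor("sunny", levels = levels(train_df$Outlook)),
  Temperature = factor("cool",  levels = levels(train_df$Temperature)),
  Humidity    = factor("high",  levels = levels(train_df$Humidity)),
  Windy       = factor("true",  levels = levels(train_df$Windy))
)

predict(fit, instance)$posterior

If this direct fit reproduces the roughly 0.795/0.205 split, the discrepancy comes from the caret wrapper (or from how the columns were typed on the way in) rather than from naive Bayes itself.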