Naive Bayes feature probabilities: should I double count words?

Tags: classification, conditional probability, naive bayes

I'm prototyping my own Naive Bayes bag-of-words model, and I have a question about calculating the feature probabilities.

Let's say I've got two classes; I'll just use spam and not-spam, since that's what everyone uses. And let's take the word "viagra" as an example. I have 10 emails in my training set, 5 spam and 5 non-spam. "viagra" appears in all 5 spam documents: once in each of four of them, and 3 times in the fifth (this is what my question is about), so that's 7 appearances in spam total. In the non-spam training set, it appears 1 time.

If I want to estimate p(viagra | spam), is it simply:

p(viagra | spam) = 5 spam documents contain viagra / 5 spam documents total = 1

In other words, does the fact that one document mentioned viagra 3 times instead of once really not matter?


Edit:
Here's a blog post where the author uses the approach I just laid out:
http://ebiquity.umbc.edu/blogger/2010/12/07/naive-bayes-classifier-in-50-lines/

And here's a blog post where the author says:
p(viagra | spam) = 7 viagra spam mentions / 8 total mentions
http://www.nils-haldenwang.de/computer-science/machine-learning/how-to-apply-naive-bayes-classifiers-to-document-classification-problems

And then one of the answers below says it should be:
p(viagra | spam) = 7 viagra mentions in spam / total term count in spam

Can anyone link to a source that gives an opinion on this?
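
To show how far apart the three candidates are, here's a quick Python sketch using my toy numbers. Note that the total term count in spam isn't specified in my example, so the value used for it below is a made-up placeholder:

    # Counts from the toy example above. NOTE: the total number of tokens
    # across all spam documents is never stated in the example, so
    # total_spam_tokens is a HYPOTHETICAL placeholder, used only to
    # produce a number for the third formula.
    spam_docs_total = 5        # spam documents in the training set
    spam_docs_with_word = 5    # spam documents containing "viagra"
    word_count_spam = 7        # "viagra" occurrences in spam (4*1 + 1*3)
    word_count_ham = 1         # "viagra" occurrences in non-spam
    total_spam_tokens = 40     # HYPOTHETICAL placeholder value

    # First blog post (and my formula): fraction of spam docs with the word
    p_doc_freq = spam_docs_with_word / spam_docs_total                      # 1.0

    # Second blog post: spam mentions over mentions in both classes
    p_mention_share = word_count_spam / (word_count_spam + word_count_ham)  # 0.875

    # Answer below: spam mentions over all tokens in spam
    p_term_freq = word_count_spam / total_spam_tokens                       # 0.175

    print(p_doc_freq, p_mention_share, p_term_freq)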

Best Answer

In other words, does the fact that one document mentioned viagra 3 times instead of once really not matter?

It does matter. The multinomial Naive Bayes model takes each occurrence of a token into account, whereas the Bernoulli Naive Bayes model does not (i.e., for the latter model, 3 occurrences of "viagra" are the same as 1 occurrence of "viagra"). In your example, the document-count estimate (5/5) is the Bernoulli estimate, while "7 viagra mentions in spam / total term count in spam" is the multinomial estimate; the 7/8 formula estimates the fraction of all "viagra" occurrences that fall in spam, which is not p(viagra | spam) under either model.
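
To make this concrete, here is a minimal sketch (mine, not taken from {1}) of the two estimators. The five toy documents are invented so the counts match the question ("viagra" once in four spam documents and 3 times in the fifth); Laplace smoothing is omitted for clarity:

    from collections import Counter

    # Invented spam documents matching the question's counts:
    # "viagra" appears 7 times across 5 documents.
    spam_docs = [
        ["buy", "viagra", "now"],
        ["cheap", "viagra", "offer"],
        ["viagra", "deal", "today"],
        ["order", "viagra", "online"],
        ["viagra", "viagra", "viagra", "sale"],
    ]

    def bernoulli_estimate(docs, word):
        # Bernoulli NB: fraction of documents containing the word at
        # least once; repeated occurrences within a doc change nothing.
        return sum(word in doc for doc in docs) / len(docs)

    def multinomial_estimate(docs, word):
        # Multinomial NB: the word's share of all tokens in the class;
        # every individual occurrence counts.
        counts = Counter(tok for doc in docs for tok in doc)
        return counts[word] / sum(counts.values())

    print(bernoulli_estimate(spam_docs, "viagra"))    # 5/5  = 1.0
    print(multinomial_estimate(spam_docs, "viagra"))  # 7/16 = 0.4375

If the fifth document mentioned "viagra" only once, the multinomial estimate would drop to 5/14 while the Bernoulli estimate would stay at 1.0, which is exactly the difference the question asks about.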

Here are two illustrations as well as a comparison table from {1}:

[Illustration from {1}: the multinomial Naive Bayes model]

[Illustration from {1}: the Bernoulli Naive Bayes model]

[Comparison table from {1}: multinomial vs. Bernoulli Naive Bayes]

{1} neatly introduces Naive Bayes for text classification, as well as the Multinomial Naive Bayes model and the Bernoulli Naive Bayes model.


References:

{1} Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. Introduction to Information Retrieval. Cambridge University Press, 2008. Chapter 13: Text classification and Naive Bayes.
