Solved – Calculating feature probabilities for Naive Bayes

bayesian, classification, probability

I'm reading "Building Machine Learning Systems with Python" by Willi Richert and Luis Pedro Coelho, and I've reached the chapter on sentiment analysis. It works through a complete example of classifying a tweet with the Naive Bayes method.

Problem

How do I calculate the probability of the features $F_1$ and $F_2$, that is, the evidence $P(F_1,F_2)$?

Data

Assuming that the data set is as follows (content of the tweet / class):

  1. Awesome (positive)
  2. Awesome (positive)
  3. Awesome crazy (positive)
  4. Crazy (positive)
  5. Crazy (negative)
  6. Crazy (negative)

And introducing the following variables:

  • $C = \{pos, neg \}$ Class of the tweet
  • $F_1$ = the number of occurrences of "awesome" in the tweet
  • $F_2$ = the number of occurrences of "crazy" in the tweet

Target

The target is to calculate (or estimate)

$$
P(C|F_1,F_2) = \frac {P(C) \cdot P(F_1,F_2|C)}{P(F_1,F_2)}
$$

That can be also expressed as:

$$
posterior = \frac {prior \cdot likelihood} {evidence}
$$

Steps

Prior

$P(C)$ is the prior probability of class $C$ before seeing the data. Its values are as follows:
$$
P(C = "pos") = \frac {4}{6} = 0.67
$$
$$
P(C = "neg") = \frac {2}{6} = 0.33
$$
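
As a quick check, the priors can be reproduced by counting class labels. This is a minimal sketch, assuming the six tweets are stored as a list of (text, label) pairs; the names `tweets` and `priors` are illustrative and do not come from the book.

```python
from collections import Counter
from fractions import Fraction

# Training data from the question: (tweet text, class label).
tweets = [
    ("awesome", "pos"),
    ("awesome", "pos"),
    ("awesome crazy", "pos"),
    ("crazy", "pos"),
    ("crazy", "neg"),
    ("crazy", "neg"),
]

# P(C) is estimated as the fraction of tweets carrying each label.
class_counts = Counter(label for _, label in tweets)
priors = {label: Fraction(n, len(tweets)) for label, n in class_counts.items()}

for label, p in priors.items():
    print(label, p, round(float(p), 2))
```

Exact fractions are used so that rounding does not hide small differences in the later steps.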

Likelihood

Since the features are assumed to be independent given the class (the "naive" assumption), we can calculate $P(F_1,F_2|C)$ using the formula:

$$
P(F_1,F_2|C) = P(F_1|C) \cdot P(F_2|C)
$$

So the target is now:

$$
P(C|F_1,F_2) = \frac {P(C) \cdot P(F_1|C) \cdot P(F_2|C)} {P(F_1,F_2)}
$$

In this particular problem:
$$
P(C="pos"|F_1,F_2) = \frac {P(C="pos") \cdot P(F_1|C="pos") \cdot P(F_2|C="pos")}{P(F_1,F_2)}
$$
$$
P(C="neg"|F_1,F_2) = \frac {P(C="neg") \cdot P(F_1|C="neg") \cdot P(F_2|C="neg")}{P(F_1,F_2)}
$$

We can now calculate likelihoods:
$$
P(F_1=1|C="pos") = \frac{3}{4} = 0.75
$$

$$
P(F_2=1|C="pos") = \frac{2}{4} = 0.5
$$

$$
P(F_1=1|C="neg") = \frac{0}{2} = 0
$$

$$
P(F_2=1|C="neg") = \frac{2}{2} = 1
$$
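
These four likelihoods can be reproduced by counting, within each class, the tweets that contain the word. The sketch below assumes each feature is treated as present/absent (1 or 0), which is what the values above imply; the helper name `likelihood` is hypothetical.

```python
from fractions import Fraction

# Training data from the question: (tweet text, class label).
tweets = [
    ("awesome", "pos"),
    ("awesome", "pos"),
    ("awesome crazy", "pos"),
    ("crazy", "pos"),
    ("crazy", "neg"),
    ("crazy", "neg"),
]

def likelihood(word, label):
    """Estimate P(word present | class) as a fraction of that class's tweets."""
    in_class = [text.split() for text, lab in tweets if lab == label]
    hits = sum(word in words for words in in_class)
    return Fraction(hits, len(in_class))

likelihoods = {
    (word, label): likelihood(word, label)
    for word in ("awesome", "crazy")
    for label in ("pos", "neg")
}

for key, p in likelihoods.items():
    print(key, p)
```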

Evidence

The book says that the evidence can be obtained by calculating the fraction of all training data instances that have a particular feature value.

The formula is as follows:

$$
P(F_1,F_2) = P(F_1,F_2|C="pos") \cdot P(C="pos") + P(F_1,F_2|C="neg") \cdot P(C="neg")
$$

Which leads to the following results:
$$
P(F_1=1,F_2=1) = \frac {1}{3} \cdot \frac{4}{6} + 0 \cdot \frac{2}{6} = 0.22
$$

$$
P(F_1=1,F_2=0) = \frac {2}{3} \cdot \frac{4}{6} + 0 \cdot \frac{2}{6} = 0.44
$$

$$
P(F_1=0,F_2=1) = 0 \cdot \frac{4}{6} + 1 \cdot \frac{2}{6} = 0.33
$$

$$
P(F_1=0,F_2=0) = 0
$$

Question:
How are the four values above obtained?

Best Answer

It seems you've found an erratum in the book. I did the calculations by hand and my results were quite different. Since an erratum has already been reported for this same example on the publisher's site (a wrong value for $P(F_2=1|C="pos")$), these strange values in the final result aren't very surprising.

I'll write down the numbers I found. I'll assume you know how I arrived at them: by substituting the terms into your last formula, with each joint likelihood factored as $P(F_1|C) \cdot P(F_2|C)$. If something is not clear, just tell me and I'll edit the answer to add clarifications. Here are the numbers:

$$ P(F_1=1,F_2=1) = \frac {3}{8} \cdot \frac{4}{6} + 0 \cdot \frac{2}{6} = 0.25 $$

$$ P(F_1=1,F_2=0) = \frac {3}{8} \cdot \frac{4}{6} + 0 \cdot \frac{2}{6} = 0.25 $$

$$ P(F_1=0,F_2=1) = \frac{1}{8} \cdot \frac{4}{6} + 1 \cdot \frac{2}{6} = 0.42 $$

$$ P(F_1=0,F_2=0) = \frac{1}{8} \cdot \frac{4}{6} + 0 \cdot \frac{2}{6} = 0.08 $$
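
For anyone who wants to verify these by machine, here is a minimal Python sketch. It assumes the priors and the per-feature likelihoods from the question, and applies the naive factorization followed by a sum over both classes. The names `bernoulli` and `evidence` are just illustrative.

```python
from fractions import Fraction
from itertools import product

# Priors and P(F_i = 1 | C), taken from the question.
prior = {"pos": Fraction(4, 6), "neg": Fraction(2, 6)}
p_f1 = {"pos": Fraction(3, 4), "neg": Fraction(0)}
p_f2 = {"pos": Fraction(2, 4), "neg": Fraction(1)}

def bernoulli(p_one, value):
    """Return P(F = value) for a binary feature with P(F = 1) = p_one."""
    return p_one if value == 1 else 1 - p_one

def evidence(f1, f2):
    """P(F1, F2) = sum over classes of P(C) * P(F1|C) * P(F2|C)."""
    return sum(
        prior[c] * bernoulli(p_f1[c], f1) * bernoulli(p_f2[c], f2)
        for c in prior
    )

evidences = {(f1, f2): evidence(f1, f2) for f1, f2 in product((1, 0), repeat=2)}

for fv, p in evidences.items():
    print(fv, p, round(float(p), 2))
```

The four values come out as 1/4, 1/4, 5/12 and 1/12, which sum to 1, as a probability distribution over the four feature combinations should.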

It's hard to tell exactly what the authors did wrong to arrive at the values given in the book, but I suspect they didn't apply the "naïve" independence assumption. The probability $P(F_1=0,F_2=0)$ would indeed be zero without it, since no tweet in the training data lacks both words. I haven't checked whether this hypothesis is right, though. It's also possible that the results are wrong simply because incorrect values were used in previous steps, such as the one mentioned in the linked erratum.
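
One way to probe that hypothesis is to skip the naive factorization and count the joint feature values directly in the training data. This is only a sketch under the assumption that each feature is binary (word present or absent); the helper `features` is hypothetical.

```python
from collections import Counter
from fractions import Fraction

# Tweet texts from the question; class labels are not needed here.
tweets = ["awesome", "awesome", "awesome crazy", "crazy", "crazy", "crazy"]

def features(text):
    """Map a tweet to (F1, F2): presence of 'awesome' and of 'crazy'."""
    words = text.split()
    return (int("awesome" in words), int("crazy" in words))

# Fraction of all training instances having each (F1, F2) combination.
counts = Counter(features(t) for t in tweets)
empirical = {
    fv: Fraction(counts[fv], len(tweets))
    for fv in [(1, 1), (1, 0), (0, 1), (0, 0)]
}

for fv, p in empirical.items():
    print(fv, p, round(float(p), 2))
```

Counted this way, $P(F_1=0,F_2=0)$ is indeed zero, but the other three fractions (1/6, 1/3 and 1/2) do not match the book's figures either, so this alone does not seem to explain them.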

I hope the mystery is clarified. Drop a comment if you need some more assistance.
