Solved – Small Probabilities in Naive Bayes

Tags: classification, machine learning, naive bayes

I am trying to implement Naive Bayes, but I am encountering a problem. I have 5000 word features, so every sample is a binary vector of length 5000. The true labels are 1 or 0. Because the feature vectors are very sparse, many of the conditional probabilities P(feature=1 | label=1) and P(feature=0 | label=1) are very small (~0.03). When I calculate the numerator, i.e.

P(features | label=1) * P(label=1)

the product underflows: because of the conditional independence assumption of Naive Bayes, I multiply about 2000 such small terms, the result rounds to 0, and I get a wrong answer. What should be done?
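The underflow is easy to reproduce. A minimal sketch using the numbers from the question (a per-term probability of about 0.03 and roughly 2000 factors); the comparison with the log-space sum shows why the fix below works:

```python
import math

p = 0.03   # typical small conditional probability from the question
n = 2000   # number of factors in the naive Bayes product

# Multiplying directly: the running product falls below the smallest
# representable double long before all 2000 factors are applied.
product = 1.0
for _ in range(n):
    product *= p
print(product)  # underflows to 0.0

# Summing logs instead: the same quantity, but as a finite number.
log_product = n * math.log(p)
print(log_product)  # roughly -7013, perfectly representable
```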

Best Answer

The two most commonly used techniques to prevent underflow with a naive Bayes classifier are:

  1. Working in the log space
  2. Using the log-sum-exp trick

More details: Example of how the log-sum-exp trick works in Naive Bayes
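As a sketch of how the two techniques fit together (the function names, toy class count, and parameters below are illustrative, not from the linked answer): sum log-probabilities per class instead of multiplying, then normalize with log-sum-exp.

```python
import math

def log_sum_exp(log_terms):
    """Compute log(sum(exp(t) for t in log_terms)) without overflow/underflow
    by factoring out the maximum term."""
    m = max(log_terms)
    return m + math.log(sum(math.exp(t - m) for t in log_terms))

def log_posteriors(x, log_prior, log_p1, log_p0):
    """Log class posteriors for a binary feature vector x.
    log_p1[c][j] = log P(feature_j = 1 | class c),
    log_p0[c][j] = log P(feature_j = 0 | class c)."""
    joint = []
    for c in range(len(log_prior)):
        s = log_prior[c]                      # work entirely in log space
        for j, xj in enumerate(x):
            s += log_p1[c][j] if xj else log_p0[c][j]
        joint.append(s)                       # log P(x, class c)
    norm = log_sum_exp(joint)                 # log P(x), stably
    return [s - norm for s in joint]          # log P(class c | x)

# Toy usage: 2 classes, 3 features; class 0 rarely fires features (~0.03).
lp1 = [[math.log(0.03)] * 3, [math.log(0.5)] * 3]
lp0 = [[math.log(0.97)] * 3, [math.log(0.5)] * 3]
prior = [math.log(0.5)] * 2
post = log_posteriors([1, 0, 1], prior, lp1, lp0)
```

The posteriors come back in log space; exponentiate only at the end (they then sum to 1), so no intermediate quantity ever underflows.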
