I am trying to implement naive Bayes, but I am running into a problem. I have 5000 word features, so every sample is a binary vector of length 5000, and the true labels are 1 or 0. Because the feature vectors are very sparse, the values of P(feature = 1 | label = 1) are very small (~0.03). When I calculate the numerator, i.e.
P(features | label=1) * P(label=1)
the probability values are so small that, under the conditional independence assumption of naive Bayes, multiplying ~2000 of them together underflows to 0, giving a wrong result. What should be done?
Best Answer
The two most commonly used techniques to prevent underflows with a naive Bayes classifier are:

1. Working in log space: replace the product of probabilities with a sum of log-probabilities, since log(ab) = log(a) + log(b). The class that maximizes the log joint probability is the same class that maximizes the joint probability, so for classification you never need to exponentiate back.
2. Using the log-sum-exp trick when you need the normalized posterior P(label | features): subtract the maximum log joint score from each class's score before exponentiating, so the exponentials stay in a representable range.
More details: Example of how the log-sum-exp trick works in Naive Bayes
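A minimal NumPy sketch of both techniques, using made-up numbers matching the question (2000 factors of ~0.03; the log joint scores at the end are hypothetical placeholders):

```python
import numpy as np

# Multiplying 2000 probabilities of ~0.03 directly underflows float64 to exactly 0.0:
p = np.full(2000, 0.03)
print(np.prod(p))        # -> 0.0

# 1) Work in log space: the product of probabilities becomes a sum of logs,
#    which stays finite (about -7013 here).
log_p = np.log(p).sum()
print(log_p)

# 2) Log-sum-exp trick: normalize log joint scores across classes without
#    exponentiating the hugely negative raw values.
def log_sum_exp(a):
    m = np.max(a)                      # shift by the max so the largest exp is 1
    return m + np.log(np.sum(np.exp(a - m)))

# Hypothetical unnormalized log joints log P(features, label) for labels 0 and 1:
log_joint = np.array([-7013.2, -7010.9])
posterior = np.exp(log_joint - log_sum_exp(log_joint))  # P(label | features)
print(posterior)          # sums to 1; no underflow
```

Exponentiating `log_joint` directly would give 0/0 when normalizing; shifting by the maximum first makes the ratio exact.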