Solved – Why do we need Laplace smoothing in Naive Bayes when the logarithm might resolve the problem

laplace-smoothing, logarithm, naive-bayes

In the Naive Bayes algorithm, we use $$P(c)\,P(x_1|c)\,P(x_2|c)\cdots P(x_n|c)\space\space (*)$$ to decide the class of a sample $\textbf{x} =(x_1,\dots,x_n)$. It is possible that for a class $c$, a feature $x_i$ and a value $\alpha$, there is no sample in the training set belonging to class $c$ with $x_i=\alpha$. Hence, $P(x_i=\alpha|c)$ is estimated as zero from the training set, and the value of $(*)$ is then zero as well, since it is a product of these terms. To avoid this problem, Laplace smoothing is used.
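As an illustration, here is a minimal sketch of add-one Laplace smoothing applied to the estimate of $P(x_i=\alpha|c)$; the feature values and counts are invented for the example.

```python
from collections import Counter

def conditional_prob(value, feature_values_in_class, n_distinct_values, alpha=1.0):
    """Estimate P(x_i = value | c) from the feature values observed in class c.

    With alpha = 0 this is the plain maximum-likelihood estimate, which is 0
    for any value never seen in the class; with alpha = 1 it is Laplace
    (add-one) smoothing, which keeps every estimate strictly positive.
    """
    counts = Counter(feature_values_in_class)
    n = len(feature_values_in_class)
    return (counts[value] + alpha) / (n + alpha * n_distinct_values)

# Hypothetical training data: values of feature x_i for the samples of class c.
observed = ["red", "red", "blue"]  # "green" never occurs in this class
print(conditional_prob("green", observed, n_distinct_values=3, alpha=0.0))  # 0.0
print(conditional_prob("green", observed, n_distinct_values=3, alpha=1.0))  # 1/6 ≈ 0.167
```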

The question: Why don't we just take the logarithm of $(*)$ to obtain the following expression?
$$\log P(c)+ \log P(x_1|c)+ \log P(x_2|c)+\dots+\log P(x_n|c)$$
Now, if one of the probabilities is zero, we could simply drop the corresponding term from the sum above, since the expression is a sum of terms rather than a product.

Best Answer

Taking logarithms does not change the underlying problem: we use logs precisely because the sum of the logs behaves the same as the product of the original probabilities (the logarithm is monotone), not because it gives different results. But $\log 0 = -\infty$, and $x + (-\infty) = -\infty$, so after taking logs a single zero probability still wipes out the whole score, exactly as it did in the product. Simply dropping such a term would amount to treating that zero probability as if it were $1$.
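A quick sketch, with made-up probability values, of how the zero carries over into log space:

```python
import math

probs = [0.5, 0.2, 0.0, 0.3]  # one conditional probability estimated as zero

# Product form (*): a single zero collapses the whole product.
product = math.prod(probs)

# Log form: math.log(0) is undefined, so we use its limit, -infinity.
log_sum = sum(math.log(p) if p > 0 else float("-inf") for p in probs)

print(product)  # 0.0
print(log_sum)  # -inf: the -infinity term dominates the sum, just like the zero in the product
```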

The whole idea of Laplace smoothing is that you adjust your data so that the zeros become small, more or less arbitrary, positive values. You impose the prior assumption that the observed zero counts do not reflect truly impossible events, so the estimates are corrected using this a priori knowledge.
