Let's say you've trained your Naive Bayes Classifier on 2 classes, "Ham" and "Spam" (i.e. it classifies emails). For the sake of simplicity, we'll assume prior probabilities to be 50/50.
Now let's say you have an email $(w_1, w_2,\dots,w_n)$ which your classifier rates very highly as "Ham", say $$P(Ham|w_1,w_2,\dots,w_n) = .90$$ and $$P(Spam|w_1,w_2,\dots,w_n) = .10$$
So far so good.
Now let's say you have another email $(w_1, w_2, \dots,w_n,w_{n+1})$ which is exactly the same as the above email except that there's one word in it that isn't included in the vocabulary. Therefore, since this word's count in the training data is 0 for both classes, its estimated likelihood is $$P(w_{n+1}|Ham) = P(w_{n+1}|Spam) = 0$$
Suddenly, $$P(Ham|w_1,w_2,\dots,w_n,w_{n+1}) \propto P(Ham|w_1,w_2,\dots,w_n) \cdot P(w_{n+1}|Ham) = 0$$ and $$P(Spam|w_1,w_2,\dots,w_n,w_{n+1}) \propto P(Spam|w_1,w_2,\dots,w_n) \cdot P(w_{n+1}|Spam) = 0$$
Despite the first email being strongly classified as Ham, this second email may be classified differently, or not at all, because that last word has a probability of zero and wipes out the entire product.
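A minimal sketch of this zero-frequency problem, using made-up per-word probabilities (the numbers below are hypothetical, not estimated from any corpus):

```python
# Hypothetical per-word likelihoods for an email's seen words.
ham_word_probs = [0.9, 0.8, 0.95]    # pretend P(w_i | Ham)
spam_word_probs = [0.1, 0.2, 0.05]   # pretend P(w_i | Spam)

def product(probs):
    """Multiply the per-word probabilities, as Naive Bayes does."""
    result = 1.0
    for p in probs:
        result *= p
    return result

print(product(ham_word_probs))            # strongly favours "Ham"
print(product(spam_word_probs))           # much smaller

# Append one unseen word with probability 0: both scores collapse to 0,
# so the classifier can no longer distinguish the classes.
print(product(ham_word_probs + [0.0]))
print(product(spam_word_probs + [0.0]))
```

A single zero factor dominates no matter how confident the other factors were, which is exactly why the second email loses its strong "Ham" score.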
Laplace smoothing solves this by giving the last word a small non-zero probability for both classes, so that the posterior probabilities don't suddenly drop to zero.
A value of 1 is added to every feature count (not just to features with a count/frequency of 0). To be more specific, consider the case of text classification. Suppose that $Y_k$, $k=1,2,\cdots,K$ denote the labels of the $K$ classes, $X_j$ denotes the $j^{th}$ word, and $V$ denotes the total number of distinct words (the vocabulary size) across all of the $n=\sum_{k=1}^{K}n_{k}$ documents, where $n_{k}$ denotes the number of documents labelled $Y_{k}$. Then the Laplace estimate of the probability of the word $X_{j}$ in the class $Y_{k}$ is given by
$$P(X_{j}|Y_{k}) = \dfrac{Count(X_{j},Y_{k})+1}{\sum_{j=1}^{V}\left(Count(X_{j},Y_{k})+1\right)} = \dfrac{Count(X_{j},Y_{k})+1}{\sum_{j=1}^{V}Count(X_{j},Y_{k})+V}.$$
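The estimate above can be sketched in a few lines. The function name, the toy vocabulary, and the example documents below are all illustrative assumptions, not part of any particular library:

```python
from collections import Counter

def laplace_estimates(docs_in_class, vocabulary):
    """Add-one (Laplace) estimates P(X_j | Y_k) for one class Y_k.

    docs_in_class: list of token lists, all labelled with class Y_k.
    vocabulary:    set of the V distinct words across ALL classes.
    """
    counts = Counter(tok for doc in docs_in_class for tok in doc)
    # Denominator: sum of counts over the vocabulary, plus 1 per word (i.e. + V).
    total = sum(counts[w] for w in vocabulary) + len(vocabulary)
    return {w: (counts[w] + 1) / total for w in vocabulary}

# Hypothetical example: four-word vocabulary, two "Ham" documents.
vocab = {"offer", "meeting", "viagra", "report"}
ham_docs = [["meeting", "report"], ["report", "meeting", "meeting"]]
probs = laplace_estimates(ham_docs, vocab)

print(probs["viagra"])   # never seen in Ham, yet non-zero: (0 + 1) / (5 + 4)
print(sum(probs.values()))  # the smoothed estimates still sum to 1
```

Note that the unseen word "viagra" gets probability $1/9$ rather than $0$, while the smoothed estimates over the vocabulary still form a valid distribution.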
Best Answer
We use logarithms not because summing them gives different results than multiplying the non-logs, but precisely because they behave the same. However, $\log 0 = -\infty$ and $x + (-\infty) = -\infty$, so after taking logs you will still end up with the zero problem.
The whole idea of Laplace smoothing is that you adjust your data so that zeros become some more or less arbitrary small values. You impose your assumption that the observed zeros are in fact impossible and wrong, so they are corrected using your a priori knowledge.