Solved – Laplace smoothing and Dirichlet prior

Tags: bayesian, dirichlet-distribution, laplace-smoothing, smoothing

In the Wikipedia article on Laplace smoothing (or additive smoothing), it is said that, from a Bayesian point of view,

this corresponds to the expected value of the posterior distribution, using a symmetric Dirichlet distribution with parameter $\alpha$ as a prior.

I'm puzzled about how that is actually true. Could someone help me understand how those two things are equivalent?

Thanks!

Best Answer

Sure. This is essentially the observation that the Dirichlet distribution is a conjugate prior for the multinomial distribution, which means the posterior has the same functional form as the prior. The article mentions it, but I'll just emphasize that this follows from the multinomial sampling model. So, getting down to it...

The observation is about the posterior, so let's introduce some data, $x$, which are counts of $K$ distinct items. We observe $N = \sum_{i=1}^K x_i$ samples total. We'll assume $x$ is drawn from a distribution with unknown parameter $\pi$ on the $K$-simplex, on which we'll place a symmetric $\mathrm{Dir}(\alpha)$ prior.

The posterior of $\pi$ given $\alpha$ and data $x$ is proportional to the likelihood times the prior:

$$p(\pi | x, \alpha) \propto p(x | \pi)\, p(\pi|\alpha).$$

The likelihood, $p(x|\pi)$, is the multinomial distribution. Now let's write out the pdf's:

$$p(x|\pi) = \frac{N!}{x_1!\cdots x_K!} \pi_1^{x_1} \cdots \pi_K^{x_K}$$

and

$$p(\pi|\alpha) = \frac{1}{\mathrm{B}(\alpha)} \prod_{i=1}^K \pi_i^{\alpha - 1}$$

where $\mathrm{B}(\alpha) = \frac{\Gamma(\alpha)^K}{\Gamma(K\alpha)}$. Multiplying, we find that,

$$ p(\pi|\alpha,x) \propto p(x | \pi)\, p(\pi|\alpha) \propto \prod_{i=1}^K \pi_i^{x_i + \alpha - 1}.$$
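The conjugacy claim can be checked numerically: the likelihood times the prior should equal the $\mathrm{Dir}(x + \alpha)$ density up to a constant, so their ratio should not depend on $\pi$. A quick sketch with SciPy (the specific counts and $\alpha$ below are illustrative):

```python
import numpy as np
from scipy.stats import dirichlet, multinomial

rng = np.random.default_rng(0)
K, alpha = 3, 2.0
x = np.array([4, 1, 2])  # illustrative counts for K = 3 categories
N = x.sum()

prior_params = np.full(K, alpha)        # symmetric Dir(alpha) prior
posterior_params = prior_params + x     # claimed posterior: Dir(x + alpha)

# Evaluate likelihood * prior and the claimed posterior density at several
# random points on the simplex; the ratios should all be the same constant.
pis = rng.dirichlet(np.ones(K), size=5)
ratios = [
    multinomial.pmf(x, N, pi) * dirichlet.pdf(pi, prior_params)
    / dirichlet.pdf(pi, posterior_params)
    for pi in pis
]
print(np.allclose(ratios, ratios[0]))  # True: same shape up to normalization
```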

In other words, the posterior is also Dirichlet, namely $\mathrm{Dir}(x_1 + \alpha, \ldots, x_K + \alpha)$. The question was about the posterior mean. A $\mathrm{Dir}(a_1, \ldots, a_K)$ distribution has mean $E[\pi_i] = a_i / \sum_j a_j$, so here,

$$E[\pi_i | \alpha, x] = \frac{x_i + \alpha}{N + K\alpha},$$

which is exactly the additive-smoothing estimate; taking $\alpha = 1$ recovers Laplace's add-one rule, $\frac{x_i + 1}{N + K}$.
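As a final sanity check, here is a small NumPy sketch (counts are illustrative) showing that the additive-smoothing estimate and the posterior mean of $\mathrm{Dir}(x + \alpha)$ coincide:

```python
import numpy as np

# Illustrative counts for K = 3 categories, with a symmetric prior parameter.
x = np.array([5.0, 0.0, 2.0])
alpha = 1.0
K, N = len(x), x.sum()

# Additive (Laplace) smoothing estimate.
smoothed = (x + alpha) / (N + K * alpha)

# Posterior mean of Dir(x + alpha): component i of a Dirichlet with
# parameters a has mean a_i / sum(a).
posterior_params = x + alpha
posterior_mean = posterior_params / posterior_params.sum()

assert np.allclose(smoothed, posterior_mean)
print(smoothed)  # [0.6 0.1 0.3]
```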

Hope this helps!