[Math] How to derive the Dirichlet-multinomial

gamma function, probability distributions

Common knowledge

The multinomial:

$$p(z|\theta) = \frac{n!}{\prod_i z_i!} \prod_i \theta_i^{z_i}$$

And the Dirichlet:

$$p(\theta|\alpha) = \frac{1}{B(\alpha)}\prod_i \theta_i^{\alpha_i-1}$$

with $B(\alpha) = \frac{\prod_i \Gamma(\alpha_i)} {\Gamma(\sum_i \alpha_i)}$, the multivariate Beta function.
Let us neglect for the moment the factors that do not depend on $\theta_i$:

$$p(z|\theta) \propto \prod_i \theta_i^{z_i}$$

$$p(\theta|\alpha) \propto \prod_i \theta_i^{\alpha_i-1}$$

The product:

$$p(z|\theta) p(\theta|\alpha) \propto \prod_i \theta_i^{z_i+\alpha_i-1}$$

And because we know it is a Dirichlet distribution, it isn't surprising that the constant is a Beta function again:

$$p(z|\theta) p(\theta|\alpha) = \frac{1}{B(z+\alpha)} \prod_i \theta_i^{z_i+\alpha_i-1}$$

Derive the normalization constant

But how do we derive this factor $\frac{1}{B(z+\alpha)}$?

Attempt

The multinomial can be written using $\Gamma(x+1)=x!$ and $n=\sum_i z_i$ (from the definition of the multinomial) as:

$$p(z|\theta) = \frac{n!}{\prod_i z_i!} \prod_i \theta_i^{z_i} = \frac{\Gamma(\sum_i z_i+1)}{\prod_i \Gamma(z_i+1)} \prod_i \theta_i^{z_i}$$

Now, if we use $\Gamma(1+x)=x\Gamma(x)$ we can write it as:

$$p(z|\theta) = \frac{\sum_i z_i\Gamma(\sum_i z_i)}{\prod_i z_i \prod_i \Gamma(z_i)} \prod_i \theta_i^{z_i} = \frac{\sum_i z_i}{\prod_i z_i} \frac{1}{B(z)} \prod_i \theta_i^{z_i}$$

The product with the Dirichlet distribution:

$$p(z|\theta)p(\theta|\alpha) = \frac{\sum_i z_i}{\prod_i z_i} \frac{1}{B(z)} \frac{1}{B(\alpha)}\prod_i \theta_i^{z_i+\alpha_i-1}$$

I don't even need to go on here. This doesn't look very symmetric, so I did something wrong here, but what?

Best Answer

There is confusion in the literature. There are at least two takes.

Categorical distribution

Often, what is called the Dirichlet-multinomial is actually not a compound of a Dirichlet and a multinomial, but a compound of a Dirichlet and a categorical distribution:

$$p(z|\theta) = \prod_i \theta_i^{z_i}$$

This describes only one categorical variable, not a set of them. For a die, this notation assigns the vector $[1,0,0,0,0,0]$ to the face with one pip, $[0,1,0,0,0,0]$ to the face with two pips, etc. Naturally, this means that $\sum_i z_i = 1$.

This gets rid of the $\frac{n!}{\prod_i z_i!}$ factor and leads to the much shorter:

$$p(z|\theta)p(\theta|\alpha) = \frac{1}{B(\alpha)}\prod_i \theta_i^{z_i+\alpha_i-1}$$

To arrive at the Dirichlet-multinomial from here, you have to integrate over $\theta$: $$ \int p(z|\theta) p(\theta|\alpha) d\theta = \frac{1}{B(\alpha)} \int \prod_i \theta_i^{z_i+\alpha_i-1} d\theta$$

Now, the Dirichlet didn't come from nowhere... The factor $B(\alpha)$ is a normalization factor:

$$p(\theta|\alpha) = \frac{1}{B(\alpha)} \prod_i \theta_i^{\alpha_i-1} = \frac{\prod_i \theta_i^{\alpha_i-1}}{\int_{\Delta^n} \prod_i \theta_i^{\alpha_i-1} d\theta}$$

with $\int_{\Delta^n}$ denoting integration over the simplex, i.e. over all $\theta$ with $\theta_i \geq 0$ and $\sum_i \theta_i = 1$. (Feel free to improve my notation.)

In other words, the multivariate Beta function is actually this integral directly from the definition:

$$\int_{\Delta^n} \prod_i \theta_i^{\alpha_i-1} d\theta = B(\alpha)$$
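As a sanity check, consider the special case of two categories (my own illustration, not part of the question). The constraint leaves a single free variable, $\theta_2 = 1 - \theta_1$, and the integral reduces to the ordinary Beta function:

$$\int_0^1 \theta_1^{\alpha_1-1} (1-\theta_1)^{\alpha_2-1} d\theta_1 = \frac{\Gamma(\alpha_1)\Gamma(\alpha_2)}{\Gamma(\alpha_1+\alpha_2)} = B(\alpha_1, \alpha_2)$$

which agrees with the definition of $B(\alpha)$ given in the question.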

The same identity holds with $\alpha$ replaced by $\alpha + z$, and hence the integral:

$$\int_{\Delta^n} \prod_i \theta_i^{z_i+\alpha_i-1} d\theta = B(\alpha+z)$$

Hence: $$ \int p(z|\theta) p(\theta|\alpha) d\theta = \frac{B(\alpha+z)}{B(\alpha)}$$

Or, to end up with something that looks similar to the Wikipedia definition:

$$ \int p(z|\theta) p(\theta|\alpha) d\theta = \frac{\prod_i \Gamma(\alpha_i+z_i)}{\Gamma(\sum_i (\alpha_i + z_i))} \frac{\Gamma(\sum_i \alpha_i)}{\prod_i \Gamma(\alpha_i)}$$

Collecting terms:

$$ \int p(z|\theta) p(\theta|\alpha) d\theta = \frac{\Gamma(\sum_i \alpha_i)}{\Gamma(\sum_i \alpha_i + \sum_i z_i)} \prod_i \frac{\Gamma(\alpha_i + z_i)}{\Gamma(\alpha_i)}$$

Note, however, that $i$ here runs over the entries of our categorical variable $z$ represented as a vector! This is very different from the Wikipedia definition!
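For a one-hot $z$ with $z_j = 1$, the expression above should collapse to $\alpha_j / \sum_i \alpha_i$, because $\Gamma(\alpha_j + 1) / \Gamma(\alpha_j) = \alpha_j$. A minimal numerical sketch of that check follows; the function names and the values of `alpha` are my own illustrative choices, not something from the question.

```python
from math import lgamma, exp

def log_multivariate_beta(alpha):
    """log B(alpha) = sum_i log Gamma(alpha_i) - log Gamma(sum_i alpha_i)."""
    return sum(lgamma(a) for a in alpha) - lgamma(sum(alpha))

def categorical_marginal(z, alpha):
    """B(alpha + z) / B(alpha) for a one-hot vector z."""
    shifted = [a + zi for a, zi in zip(alpha, z)]
    return exp(log_multivariate_beta(shifted) - log_multivariate_beta(alpha))

# Illustrative (assumed) concentration parameters for three categories.
alpha = [0.5, 1.0, 2.5]
one_hot = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

marginals = [categorical_marginal(z, alpha) for z in one_hot]
expected = [a / sum(alpha) for a in alpha]  # alpha_j / sum_i alpha_i

for m, e in zip(marginals, expected):
    print(f"{m:.6f}  vs  {e:.6f}")
```

Working with `lgamma` instead of the Gamma function itself avoids overflow for large parameters.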

Multinomial distribution

In the case of an actual multinomial distribution, the counts of $z$ (let's write them $n(z)$) are the object of interest, not $z$ itself.

$$p(z|\theta) = \frac{(\sum_k n(z_k))!}{\prod_k (n(z_k)!)} \prod_k \theta_k^{n(z_k)}$$

We now run over $k$ unique variables, not over a vectorized categorical variable.

Of course, we can again multiply with a Dirichlet distribution, and the derivation runs along the same lines as before. The result:

$$ \int p(z|\theta) p(\theta|\alpha) d\theta = \frac{(\sum_k n(z_k))!}{\prod_k (n(z_k)!)} \frac{\Gamma(\sum_k \alpha_k)}{\Gamma(\sum_k \alpha_k + \sum_k n(z_k))} \prod_k \frac{\Gamma(\alpha_k + n(z_k))}{\Gamma(\alpha_k)}$$

The term $\sum_k n(z_k)$ can be simplified to $N$. This is the actual Dirichlet-multinomial. It's not pretty, but this is what it is.
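One way to convince yourself that this expression is a properly normalized distribution over count vectors is to sum it over all count vectors with a fixed total $N$. The sketch below does that by brute force; `dirichlet_multinomial_pmf`, `alpha`, and `N = 4` are hypothetical names and values chosen for illustration only.

```python
from math import lgamma, exp
from itertools import product

def dirichlet_multinomial_pmf(counts, alpha):
    """Multinomial coefficient times the Gamma-function ratio from the formula above."""
    n = sum(counts)
    a0 = sum(alpha)
    # log of N! / prod_k n_k!
    log_coeff = lgamma(n + 1) - sum(lgamma(c + 1) for c in counts)
    # log of Gamma(a0) / Gamma(a0 + N) * prod_k Gamma(alpha_k + n_k) / Gamma(alpha_k)
    log_ratio = lgamma(a0) - lgamma(a0 + n)
    log_ratio += sum(lgamma(a + c) - lgamma(a) for a, c in zip(alpha, counts))
    return exp(log_coeff + log_ratio)

# Illustrative (assumed) parameters: three categories, four draws.
alpha = [0.5, 1.0, 2.5]
N = 4

# Enumerate every count vector (n_1, n_2, n_3) with n_1 + n_2 + n_3 = N.
count_vectors = [c for c in product(range(N + 1), repeat=len(alpha)) if sum(c) == N]
total = sum(dirichlet_multinomial_pmf(c, alpha) for c in count_vectors)
print(len(count_vectors), total)
```

The brute-force enumeration is only practical for small $N$ and few categories; it is meant as a check, not as an efficient implementation.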

N categorical distributions

The third option, and this is what the Wikipedia page means, is the distribution of a sequence of categorical variables. Recall that the multinomial assigns probabilities to the numbers of extracted balls (in an experiment drawing n balls out of a bag with k ball types). A sequence of categorical variables instead assigns a probability to one particular sequence, and therefore has a form without the normalization factor:

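$$p(z|\theta) = \prod_i \theta_{z_i} = \prod_k \theta_k^{n(z_i=k)}$$

Here $i$ runs over the variables in the sequence, $k$ runs over the categories, and $n(z_i=k)$ is the number of variables in the sequence that take the value $k$. We can now follow the derivation as with the single categorical variable.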

$$ \int p(z|\theta) p(\theta|\alpha) d\theta = \frac{\Gamma(\sum_k \alpha_k)}{\Gamma(\sum_k \alpha_k + \sum_k n(z_i=k))} \prod_k \frac{\Gamma(\alpha_k + n(z_i=k))}{\Gamma(\alpha_k)}$$

Note for example that we have $\alpha_k$; the parameter $\alpha$ is now considered the same for each cluster $k$.
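The difference from the multinomial case can be made concrete: the probability of a sequence depends only on its counts, so two orderings of the same draws get the same probability, and summing over all sequences (rather than over count vectors) gives one. A small sketch under assumed values follows; categories are encoded as integers `0..K-1`, and `sequence_marginal` is a hypothetical helper name.

```python
from math import lgamma, exp
from collections import Counter
from itertools import product

def sequence_marginal(sequence, alpha):
    """Marginal probability of one specific sequence of category labels 0..K-1."""
    counts = Counter(sequence)  # n(z_i = k); missing categories count as 0
    n = len(sequence)
    a0 = sum(alpha)
    log_p = lgamma(a0) - lgamma(a0 + n)
    for k, a in enumerate(alpha):
        log_p += lgamma(a + counts[k]) - lgamma(a)
    return exp(log_p)

# Illustrative (assumed) parameters: three categories, sequences of length four.
alpha = [0.5, 1.0, 2.5]

p_a = sequence_marginal([0, 2, 2, 1], alpha)
p_b = sequence_marginal([2, 1, 0, 2], alpha)  # same counts, different order

# Sum over all 3**4 sequences of length 4.
total = sum(sequence_marginal(s, alpha) for s in product(range(3), repeat=4))
print(p_a, p_b, total)
```

Multiplying `p_a` by the number of distinct orderings with those counts, here $4!/(1!\,1!\,2!) = 12$, gives the Dirichlet-multinomial probability of the count vector $(1, 1, 2)$.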
