There is confusion in the literature. There are at least two takes.
Categorical distribution
Often, the Dirichlet-multinomial is actually not a compound Dirichlet and a multinomial, but a compound Dirichlet and categorical distribution:
$$p(z|\theta) = \prod_i \theta_i^{z_i}$$
This means that this is about only one categorical variable, not a set. The notation of above would for dice assign the vector $[1,0,0,0,0,0]$ to the face with one pip, $[0,1,0,0,0,0]$ to the face with two pips, etc. Naturally, this means that $\sum_i z_i = 1$.
This gets rid off the $\frac{n!}{\prod_i z_i!}$ factor and leads to the much shorter:
$$p(z|\theta)p(\theta|\alpha) = \frac{1}{B(\alpha)}\prod_i \theta_i^{z_i+\alpha_i-1}$$
To subsequently derive at the Dirichlet-multinomial, you'll have to integrate over:
$$ \int p(z|\theta) p(\theta|\alpha) d\theta = \frac{1}{B(\alpha)} \int \prod_i \theta_i^{z_i+\alpha_i-1} d\theta$$
Now, the Dirichlet didn't come from nowhere... The factor $B(\alpha)$ is a normalization factor:
$$p(\theta|\alpha) = \frac{1}{B(\alpha)} \prod_i \theta_i^{\alpha_i-1} = \frac{\prod_i \theta_i^{\alpha_i-1}}{\int_{\Delta^n} \prod_i \theta_i^{\alpha_i-1} d\theta}$$
with $\int_{\Delta^n}$ corresponding to the condition $\sum_i \theta_i = 1$. (Feel free to improve my notation.)
In other words, the multivariate Beta function is actually this integral directly from the definition:
$$\int_{\Delta^n} \prod_i \theta_i^{\alpha_i-1} d\theta = B(\alpha)$$
And hence the integral:
$$\int \prod_i \theta_i^{z_i+\alpha_i-1} d\theta = B(\alpha+z)$$
Hence:
$$ \int p(z|\theta) p(\theta|\alpha) d\theta = \frac{B(\alpha+z)}{B(\alpha)}$$
Or to end up with something similar looking to the Wikipedia definition:
$$ \int p(z|\theta) p(\theta|\alpha) d\theta = \frac{\prod_i \Gamma(\alpha_i+z_i)}{\Gamma(\sum_i (\alpha_i + z_i))} \frac{\Gamma(\sum_i \alpha_i)}{\prod_i \Gamma(\alpha_i)}$$
Collecting terms:
$$ \int p(z|\theta) p(\theta|\alpha) d\theta = \frac{\Gamma(\sum_i \alpha_i)}{\Gamma(\sum_i \alpha_i + \sum_i z_i)} \prod_i \frac{\Gamma(\alpha_i + z_i)}{\Gamma(\alpha_i)}$$
Note, however that we run $i$ here over the entries in our categorical variable $z$ represented as a vector! This is very different from the Wikipedia definition!
Multinomial distribution
In case of an actual multinomial distribution, counts of $z$ - let's write them $n(z)$ - are actually the topic of consideration, not $z$ itself.
$$p(z|\theta) = \frac{(\sum_k n(z_k))!}{\prod_k (n(z_k)!)} \prod_k \theta_k^{n(z_k)}$$
We now run over $k$ unique variables, not over a vectorized categorical variable.
Of course, we can now again multiply with a Dirichlet distribution and the derivation is along the lines as described before. The result:
$$ \int p(z|\theta) p(\theta|\alpha) d\theta = \frac{(\sum_k n(z_k))!}{\prod_k (n(z_k)!)} \frac{\Gamma(\sum_k \alpha_k)}{\Gamma(\sum_k \alpha_k + \sum_k n(z_k))} \prod_k \frac{\Gamma(\alpha_k + n(z_k))}{\Gamma(\alpha_k)}$$
The term $\sum_k n(z_k)$ can be simplified to $N$. This is the actual Dirichlet-multinomial. It's not pretty, but this is what it is.
N categorical distributions
The third option, and this is meant at the Wikipedia page is the distribution of a sequence of categorical variables. Recall that the multinomial assigns probabilities to the number of extract balls (in an experiment getting n balls out of a bag with k ball types). A sequence of categorical variables assigns a probability to a sequence and has a form without the normalization factor:
$$p(z|\theta) = \prod_k \theta_k^{z_k}$$
Here $k$ runs over the categories. We can now follow the derivation as with the single categorical variable.
$$ \int p(z|\theta) p(\theta|\alpha) d\theta = \frac{\Gamma(\sum_k \alpha_k)}{\Gamma(\sum_k \alpha_k + \sum_k n(z_i=k))} \prod_k \frac{\Gamma(\alpha_k + n(z_i=k))}{\Gamma(\alpha_k)}$$
Note for example that we have $\alpha_k$; the parameter $\alpha$ is now considered the same for each cluster $k$.
Best Answer
You can use the
gamma
andgammaln
in MATLAB to directly compute the expression you have. I would recommendgammaln
since you will have very big numbers, and the logarithmic form avoids computing the ratio of big numbers.Also, if you want to sample from Dirichlet, you can generate bunch of gamma random variables using
gamrnd
(in statistics toolbox) and then normalize.