Dirichlet Distribution – Finding the Marginal Distribution of K-variate Dirichlet

dirichlet-distribution, machine-learning, self-study

I've already seen https://math.stackexchange.com/questions/1064995/marginal-of-dirichlet-distribution-is-beta-integral, but need to extend this to the $K$-variate case.

We have $\mathbf{x} = \begin{bmatrix}
x_1 \\
x_2 \\
\vdots \\
x_K\end{bmatrix}$ following a Dirichlet distribution with parameter vector $\mathbf{a} = \begin{bmatrix}
a_1 \\
a_2 \\
\vdots \\
a_K\end{bmatrix}$ such that $\sum_{k=1}^{K}x_k = 1$ and $x_k \in [0, 1]$ for $k = 1, \dots, K$ with density function
$$p(\mathbf{x}) = \dfrac{\Gamma(\sum_{i=1}^{K}a_i)}{\prod_{i=1}^{K}\Gamma(a_i)}x_1^{a_1-1}\cdots x_{K-1}^{a_{K-1}-1}\left(1-\sum_{\ell = 1}^{K-1}x_\ell\right)^{a_{K}-1}\text{.}$$
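As a quick numerical sanity check on this density (a minimal sketch; the parameter vector and the evaluation point below are arbitrary choices for illustration, not taken from the question), the hand-written formula can be compared against `scipy.stats.dirichlet.pdf` for $K = 3$:

```python
import numpy as np
from scipy.stats import dirichlet
from scipy.special import gamma

a = np.array([2.0, 3.0, 4.0])   # arbitrary parameter vector (K = 3)
x = np.array([0.2, 0.3, 0.5])   # a point on the simplex (sums to 1)

# The density as written above: x_K enters only through 1 - (x_1 + ... + x_{K-1}).
norm = gamma(a.sum()) / np.prod(gamma(a))
hand = norm * x[0]**(a[0] - 1) * x[1]**(a[1] - 1) * (1 - x[0] - x[1])**(a[2] - 1)

print(hand, dirichlet.pdf(x, a))  # the two values should agree
```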
Given $j \in \{1, \dots, K-1\}$, we have
$$p(x_j)=x_j^{a_j - 1}\underbrace{\int_{0}^{1}\cdots\int_{0}^{1}}_{K-2\text{ times }}\left(\prod_{p \neq j}x_p^{a_p - 1}\right)\left(1-\sum_{\ell = 1}^{K-1}x_\ell\right)^{a_{K}-1}\text{ d}\mathbf{x}_{-j}\tag{1}$$
where $\mathbf{x}_{-j}$ is $\mathbf{x}$ without $x_j$.

How does one evaluate the integral given in $(1)$?

Edit: I know that the integral is wrong, since it's not integrating over the simplex. But I'm not sure how the limits would be formed. No reference I've found has shown how to integrate this to find the marginals. Please DO NOT use the method found at http://www.mas.ncl.ac.uk/~nmf16/teaching/mas3301/week6.pdf; I've already seen this. I would like to see this done using integration.

Also, given that $x_K$ doesn't appear anywhere in the density above… I'm concerned that the way in which the $x_i$ are ordered matters. How would one obtain the density for $x_K$ in this case since $x_K$ doesn't explicitly appear in $p(\mathbf{x})$? Would one just find the PDF of $1-\sum_{\ell=1}^{K-1}x_{\ell}$?
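For what it's worth, a quick Monte Carlo sketch (with an arbitrary $K = 3$ parameter vector, chosen only for illustration) is at least consistent with the guess that $x_K = 1-\sum_{\ell=1}^{K-1}x_\ell$ is Beta-distributed with parameters $a_K$ and $\sum_{i\ne K}a_i$:

```python
import numpy as np

rng = np.random.default_rng(1)
a = np.array([2.0, 3.0, 4.0])          # arbitrary K = 3 parameter vector
x = rng.dirichlet(a, size=200_000)     # each row sums to 1

xK = 1 - x[:, :-1].sum(axis=1)         # the "determined" coordinate x_K
b1, b2 = a[-1], a[:-1].sum()           # conjectured Beta(a_K, sum_{i != K} a_i)

# Compare empirical mean and variance of x_K with the Beta(b1, b2) moments.
print(xK.mean(), b1 / (b1 + b2))
print(xK.var(),  b1 * b2 / ((b1 + b2)**2 * (b1 + b2 + 1)))
```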

Best Answer

The marginal distribution of $x_j$ is,

$$ p(x_j) = \frac{1}{B({\bf a})} \int_0^{1 - x_j} \int_0^{1 - x_j - x_1} \cdots \int_0^{1 - \sum_{k =1}^{K-2} x_k} \prod_{p=1}^{K-1} x_p^{a_p - 1} \left( 1 - \sum_{l=1}^{K-1} x_l \right)^{a_K - 1} d x_{K-1} d x_{K-2} \dots d x_1, $$ where $\bf a$ is the vector of all $a_j$ values, $B({\bf a})$ is the multivariate Beta function, and the integration variables do not include $d x_j.$ We can marginalize out $x_{K-1}$ by doing the innermost integral,

$$ \int_0^{1 - \sum_{k =1}^{K-2} x_k} x_{K-1}^{a_{K-1} -1} \left( 1 - \sum_{l=1}^{K-1} x_l \right)^{a_K - 1} d x_{K-1}. $$ Substitute $x_{K-1} = z \left( 1 - \sum_{k=1}^{K-2} x_k \right).$ Then the above integral becomes,

$$ \begin{split} & \left( 1 - \sum_{k=1}^{K-2} x_k \right)^{a_{K-1}-1} \int_0^1 z^{a_{K-1}-1} \left( [1 - z] \left[1 - \sum_{k=1}^{K-2} x_k \right] \right)^{a_K -1} \left( 1 - \sum_{k=1}^{K-2} x_k \right) dz \\ = & \left( 1 - \sum_{k=1}^{K-2} x_k \right)^{a_K + a_{K-1} -1} \int_0^1 z^{a_{K-1} -1} (1-z)^{a_K -1} dz \\ = & \left( 1 - \sum_{k=1}^{K-2} x_k \right)^{a_K + a_{K-1} -1} B(a_{K-1}, a_K). \end{split} $$
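This single step can be checked numerically (a sketch with arbitrary stand-in values for $a_{K-1}$, $a_K$, and the partial sum $\sum_{k=1}^{K-2} x_k$; none of these numbers come from the derivation itself):

```python
from scipy.integrate import quad
from scipy.special import beta

a_km1, a_k = 2.5, 4.0   # stand-ins for a_{K-1} and a_K (arbitrary)
s = 0.3                 # stand-in for the partial sum x_1 + ... + x_{K-2}

# Left-hand side: the innermost integral over x_{K-1} in [0, 1 - s].
lhs, _ = quad(lambda x: x**(a_km1 - 1) * (1 - s - x)**(a_k - 1), 0, 1 - s)

# Right-hand side: (1 - s)^(a_{K-1} + a_K - 1) * B(a_{K-1}, a_K).
rhs = (1 - s)**(a_km1 + a_k - 1) * beta(a_km1, a_k)

print(lhs, rhs)         # should agree up to quadrature error
```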

Plugging this into $p(x_j)$ we have,

$$ p(x_j) = \frac{B(a_{K-1}, a_K)}{B({\bf a})} \int_0^{1 - x_j} \int_0^{1 - x_j - x_1} \cdots \int_0^{1 - \sum_{k =1}^{K-3} x_k} \prod_{p=1}^{K-2} x_p^{a_p - 1} \left( 1 - \sum_{k=1}^{K-2} x_k \right)^{a_K + a_{K-1} -1} d x_{K-2} d x_{K-3} \dots d x_1. $$

Compare this to the original expression. It is very similar, except that the role of the "determined" value $x_K$ is now played by $x_K + x_{K-1}.$ We can now marginalize out $x_{K-2}$ by direct analogy with how we marginalized out $x_{K-1}$ (i.e. replacing $a_K$ with $a_K + a_{K-1},$ etc.):

$$ p(x_j) = \frac{B(a_{K-1}, a_K) B(a_{K-2}, a_{K-1} + a_K)}{B({\bf a})} \int_0^{1 - x_j} \int_0^{1 - x_j - x_1} \cdots \int_0^{1 - \sum_{k =1}^{K-4} x_k} \prod_{p=1}^{K-3} x_p^{a_p - 1} \left( 1 - \sum_{k=1}^{K-3} x_k \right)^{a_K + a_{K-1} + a_{K-2} -1} d x_{K-3} \, d x_{K-4} \dots d x_1. $$

Note, however, that $B(a_{K-1},a_{K}) B(a_{K-2}, a_{K-1} + a_K) = B(a_{K-2}, a_{K-1}, a_K).$
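To see this, expand both sides in terms of Gamma functions, using $B(b_1, \dots, b_n) = \prod_{i=1}^n \Gamma(b_i) / \Gamma\left(\sum_{i=1}^n b_i\right)$:

$$ B(a_{K-1}, a_K)\, B(a_{K-2}, a_{K-1} + a_K) = \frac{\Gamma(a_{K-1})\Gamma(a_K)}{\Gamma(a_{K-1}+a_K)} \cdot \frac{\Gamma(a_{K-2})\Gamma(a_{K-1}+a_K)}{\Gamma(a_{K-2}+a_{K-1}+a_K)} = \frac{\Gamma(a_{K-2})\Gamma(a_{K-1})\Gamma(a_K)}{\Gamma(a_{K-2}+a_{K-1}+a_K)} = B(a_{K-2}, a_{K-1}, a_K). $$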

As we can see, each iteration of this procedure removes the last factor from the product inside the integral, removes the last term from the sum inside the integral, adds the corresponding $a$ coefficient to the exponent on that sum, and adds the same coefficient to the argument list of the multivariate Beta function outside the integral. We need only apply this pattern to all remaining integration variables, from the inside out. We get,

$$ p(x_j) = \frac{B({\bf a}_{-j})}{B({\bf a})} x_j^{a_j -1} (1 - x_j)^{\sum_{i \ne j} a_i -1}, $$ where ${\bf a}_{-j}$ is the vector of all $a_k$ with $k \ne j.$ Note that $\frac{B({\bf a}_{-j})}{B({\bf a})}$ is just $\frac{1}{B(a_j, \sum_{i \ne j} a_i)}.$ Therefore,

$$ p(x_j) = \text{Beta}(x_j; a_j, \sum_{i \ne j} a_i). $$
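As a final sanity check (a sketch only; the parameter vector, the index $j$, and the sample size below are arbitrary choices), Dirichlet samples can be tested against the claimed Beta marginal:

```python
import numpy as np
from scipy.stats import beta, kstest

rng = np.random.default_rng(0)
a = np.array([1.5, 2.0, 3.0, 0.7])        # arbitrary K = 4 parameter vector
samples = rng.dirichlet(a, size=100_000)  # each row is a draw from Dirichlet(a)

j = 1                                     # which marginal to check (0-indexed)
# Kolmogorov-Smirnov test of x_j against Beta(a_j, sum_{i != j} a_i).
stat, pvalue = kstest(samples[:, j], beta(a[j], a.sum() - a[j]).cdf)
print(stat, pvalue)                       # a large p-value is consistent with the result
```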

This is just a general version of the argument in the link provided by @marmle. (I even stole the idea of the integration substitution from it.)

EDIT: It's not clear from the notation, but in all of the integrals above, the integration variables do not include $x_j.$