Convex combination of Dirichlet random variables

gamma distribution, probability, probability distributions, probability theory

For positive integer $k$, let $(X_1,\ldots,X_k)\sim\mathrm{Dir}(\alpha_1,\ldots,\alpha_k)$ be a probability distribution over $k$ items drawn from a $k$-component Dirichlet distribution and $p=(p_1,\ldots,p_k)$ be another fixed distribution. What is the pdf of the random variable $Q=\sum_{i=1}^k p_i X_i$?

If the $X_i$ were independent gamma random variables with parameters $\alpha_i$, i.e., $X_i\sim\mathrm{Gamma}(\alpha_i,\beta)$, then this would be easy: by linearity, $Q\sim\mathrm{Gamma}(\sum_{i=1}^k p_i\alpha_i, \beta)$, following the notation in this lecture note. Dirichlet random variables can be obtained by normalizing independent gamma random variables, and the marginal of each component is a beta random variable. One way to see this is to observe that if $Y_i\sim\mathrm{Gamma}(\alpha_i,1)$ for $i\in\{1,2\}$ are independent, then $\frac{Y_1}{Y_1+Y_2}\sim\mathrm{Beta}(\alpha_1,\alpha_2)$; the independence of $Y_1$ and $Y_2$ is required, as noted in this post.
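As a quick numerical sanity check of the normalization construction, here is a minimal NumPy sketch. The parameter values, sample size, and seed are made up for illustration; it only compares sample moments against the known Dirichlet and beta values.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = np.array([2.0, 3.0, 5.0])  # hypothetical Dirichlet parameters

# Independent Y_i ~ Gamma(alpha_i, 1), one row per draw.
y = rng.gamma(shape=alpha, scale=1.0, size=(100_000, alpha.size))

# Normalizing each row gives a point on the simplex, distributed as Dir(alpha).
x = y / y.sum(axis=1, keepdims=True)

# Component means should be close to alpha_i / alpha_0.
a0 = alpha.sum()
print(x.mean(axis=0), alpha / a0)

# The first component should behave like Beta(alpha_1, alpha_0 - alpha_1).
beta_var = alpha[0] * (a0 - alpha[0]) / (a0**2 * (a0 + 1.0))
print(x[:, 0].var(), beta_var)
```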

I suspect that $Q$ is a beta random variable, since it looks like a convex combination of gamma random variables up to normalization. The obstacle that prevents me from showing this is that for $X_i\sim\mathrm{Gamma}(\alpha_i,\beta)$, $\sum_{i=1}^k p_i X_i$ is no longer independent of $X_1+\cdots+X_k$ (unless $p$ has some special form). Have convex combinations of Dirichlet components been studied before? Any comments will be appreciated.

Best Answer

There is no well-known named distribution for a weighted sum of the components of a Dirichlet random vector. However, as partially checked in this old answer, the beta distribution can be a good approximation for it (you do not need to normalize the vector $p$, as it is already normalized).
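To make the beta approximation concrete, here is a minimal sketch that matches the first two moments of $Q$ with a beta distribution on $[0,1]$. This is an illustrative method-of-moments fit, not necessarily the procedure used in the linked answer. The helper names `q_moments` and `matched_beta` and the example values of $\alpha$ and $p$ are hypothetical. The mean and variance follow from the standard Dirichlet moments: with $m_i=\alpha_i/\alpha_0$ and $\alpha_0=\sum_i\alpha_i$, $\mathbb{E}[Q]=\sum_i p_i m_i$ and $\mathrm{Var}(Q)=\bigl(\sum_i p_i^2 m_i-(\sum_i p_i m_i)^2\bigr)/(\alpha_0+1)$.

```python
import numpy as np

def q_moments(alpha, p):
    """Exact mean and variance of Q = sum_i p_i X_i for X ~ Dir(alpha)."""
    alpha = np.asarray(alpha, dtype=float)
    p = np.asarray(p, dtype=float)
    a0 = alpha.sum()
    m = alpha / a0                           # E[X_i]
    mean = p @ m
    var = (p**2 @ m - mean**2) / (a0 + 1.0)  # from Dirichlet variances and covariances
    return mean, var

def matched_beta(mean, var):
    """Parameters (a, b) of the Beta on [0, 1] with the given mean and variance."""
    s = mean * (1.0 - mean) / var - 1.0      # s = a + b
    return mean * s, (1.0 - mean) * s

alpha = [2.0, 3.0, 5.0]  # hypothetical example values
p = [0.2, 0.3, 0.5]
mean, var = q_moments(alpha, p)
a, b = matched_beta(mean, var)

# Monte Carlo check of the exact moments.
rng = np.random.default_rng(0)
q = rng.dirichlet(alpha, size=200_000) @ np.asarray(p)
print("exact    :", mean, var)
print("simulated:", q.mean(), q.var())
print("matched Beta parameters:", a, b)
```

Note that $Q$ is supported on $[\min_i p_i, \max_i p_i]$ rather than on all of $[0,1]$, so a fit of this kind can only be approximate; how good it is depends on $\alpha$ and $p$.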

This 2023 paper derives a novel integral representation for the density of a weighted sum of Dirichlet distributed random variables (Appendix A.1, page 15); you can use it if you want the exact distribution. This paper also presents various non-asymptotic Gaussian-based bounds for probabilities of linear transformations of a Dirichlet random vector.

Regarding your results: note that, in the shape-rate parametrization, for $c>0$ and independent $X$ and $Y$ with

$$X \sim \text{Gamma} (\alpha_1, \lambda), Y \sim \text{Gamma} (\alpha_2, \lambda),$$

we have

$$ cX \sim \text{Gamma} \left ( \alpha_1, \frac{\lambda}{c} \right ), $$

$$X +Y \sim \text{Gamma} \left (\alpha_1+\alpha_2, \lambda \right).$$

Hence, in general there are no $\alpha'$ and $\lambda'$ such that $$p_1X+p_2Y \sim \text{Gamma} \left (\alpha', \lambda' \right), $$ unless $p_1=p_2=p$, in which case $$pX+pY \sim \text{Gamma} \left (\alpha_1+\alpha_2, \frac{\lambda}{p} \right).$$
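One way to check the non-gamma claim without simulation is through cumulants. A $\text{Gamma}(\alpha,\lambda)$ variable has cumulants $\kappa_n=(n-1)!\,\alpha/\lambda^n$, so every gamma distribution satisfies $\kappa_1\kappa_3=2\kappa_2^2$. Cumulants of independent summands add, and scaling by $c$ multiplies $\kappa_n$ by $c^n$. The sketch below evaluates the gap $\kappa_1\kappa_3-2\kappa_2^2$ for the weighted sum; the function name `gamma_gap` and the example values are hypothetical.

```python
import numpy as np

def gamma_gap(alpha, p, lam=1.0):
    """kappa1*kappa3 - 2*kappa2**2 for S = sum_i p_i X_i with independent
    X_i ~ Gamma(alpha_i, rate=lam). The gap is zero for any gamma distribution,
    so a nonzero value means S is not gamma."""
    alpha = np.asarray(alpha, dtype=float)
    c = np.asarray(p, dtype=float) / lam   # p_i X_i ~ Gamma(alpha_i, rate lam / p_i)
    k1 = np.sum(alpha * c)                 # mean
    k2 = np.sum(alpha * c**2)              # variance
    k3 = 2.0 * np.sum(alpha * c**3)        # third cumulant
    return k1 * k3 - 2.0 * k2**2

alpha = [2.0, 3.0]                         # hypothetical shape parameters
print(gamma_gap(alpha, [0.5, 0.5]))        # equal weights: gap vanishes
print(gamma_gap(alpha, [0.2, 0.8]))        # unequal weights: gap is positive
```

By the Cauchy-Schwarz inequality the gap is nonnegative and vanishes only when all weights are equal, which agrees with the $p_1=p_2$ exception above.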