Distributions – Distribution of Sum of N Random Variables with Mixture of Two Exponential Distributions

Tags: distributions, exponential-family, finite-mixture-model, mixture-distribution, sum

Suppose that the random variable $Y$ follows a mixture of two exponential distributions, that is

\begin{equation}
f_Y(y) = \sum_{i=1}^{2}\pi_i f(y| \lambda_i)
\end{equation}

where the $\pi_i$ are mixing weights satisfying $\sum_{i=1}^{2}\pi_i = 1$, and $f(y|\lambda_i) = \lambda_i e^{-\lambda_i y}$ for $i=1, 2$. Let $\{Y_1, Y_2, \cdots, Y_n\}$ be a random sample from this mixture distribution. Now consider the new random variable $Z$, which is defined to be
\begin{equation}
Z= \sum_{j=1}^{n}Y_j
\end{equation}

Is it possible to find the density function of $Z$?

Best Answer

This derivation is directly feasible by considering the latent variable representation of a mixture random variable. Each $Y_i$ is associated with a latent Bernoulli variable $\xi_i\sim\mathfrak B(\pi_2)$, in the sense that $$Y_i|\xi_i=k\sim\mathfrak Exp(\lambda_{k+1})\qquad k=0,1$$ Therefore, $$Z=\sum_{i=1}^n Y_i$$ can be conditioned upon the vector of latent variables $$\boldsymbol\xi=(\xi_1,\ldots,\xi_n)$$ and written as $$ Z|\boldsymbol\xi \sim\sum_{i=1}^n Y_i\big|\boldsymbol\xi \sim\Big(\underbrace{\sum_{i;\,\xi_i=0} Y_i}_{\substack{\text{sum of iid}\\ \mathfrak Exp(\lambda_1)}}+\underbrace{\sum_{i;\,\xi_i=1} Y_i}_{\substack{\text{sum of iid}\\ \mathfrak Exp(\lambda_2)}}\Big)\Big|\boldsymbol\xi $$

This means that, conditional on $$\zeta=\sum_{i=1}^n\xi_i\sim\mathfrak B(n,\pi_2)$$ $Z$ is distributed as the sum of a Gamma $\mathfrak G(n-\zeta,\lambda_1)$ and of a Gamma $\mathfrak G(\zeta,\lambda_2)$ variate, i.e., $$ Z|\zeta\sim Z_1+Z_2\qquad Z_1\sim \mathfrak G(n-\zeta,\lambda_1),\ \ Z_2\sim\mathfrak G(\zeta,\lambda_2)\tag{1} $$

The distribution of the sum (1) is itself a signed mixture of Gamma distributions with at most $n$ terms and rates either $\lambda_1$ or $\lambda_2$, as shown in the earlier X validated post of @whuber.¹ Integrating out $\zeta$ leads to a mixture of $n+1$ such terms, where the weight of the $k$-th term is the Binomial probability $${n\choose k}\pi_2^k\pi_1^{n-k}$$ In conclusion, the distribution of $Z$ can be represented as a signed mixture of Gamma distributions with $O(n^2)$ terms.
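The latent-variable argument leading to (1) is easy to check by simulation: drawing $Z$ directly as a sum of mixture variables must match drawing $\zeta\sim\mathfrak B(n,\pi_2)$ first and then the two Gamma variates. A minimal sketch, with parameter values $\lambda_1=1$, $\lambda_2=3$, $\pi_2=0.6$, $n=5$ assumed purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
lam1, lam2, pi2, n = 1.0, 3.0, 0.6, 5
N = 200_000  # Monte Carlo sample size

# Direct route: each Y_i drawn from the exponential picked by its latent xi_i
xi = rng.random((N, n)) < pi2                  # Bernoulli(pi2) indicators
scale = np.where(xi, 1.0 / lam2, 1.0 / lam1)   # numpy parameterises by scale = 1/rate
Z_direct = rng.exponential(scale).sum(axis=1)

# Conditional route (1): zeta ~ Binomial(n, pi2), then two independent Gammas
# (numpy's gamma returns 0.0 when the shape parameter is 0, matching the
#  degenerate boundary cases zeta = 0 and zeta = n)
zeta = rng.binomial(n, pi2, size=N)
Z_cond = rng.gamma(n - zeta, 1.0 / lam1) + rng.gamma(zeta, 1.0 / lam2)

# Both empirical means should be close to E[Z] = n*(pi1/lam1 + pi2/lam2) = 3.0
m = n * ((1 - pi2) / lam1 + pi2 / lam2)
```

Both routes produce samples with matching moments, which is consistent with the conditional Gamma representation.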

A more direct approach is to consider the $n$-fold convolution representation of the density of $Z$:

$$f_Z(z) = \int_{\mathbb R^{n-1}} \prod_{i=1}^{n-1} f_Y(y_i) f_Y(z-y_1-\cdots-y_{n-1})\,\text dy_1\cdots\,\text dy_{n-1}$$

and to expand the product of the $n$ sums $f_Y(y_i)=\pi_1 \mathfrak e(y_i|\lambda_1)+\pi_2 \mathfrak e(y_i|\lambda_2)$ into $2^n$ terms; regrouping identical convolution integrals again results in a sum of $n+1$ terms,

$$f_Z(z) =\sum_{k=0}^n {n\choose k}\pi_1^k\pi_2^{n-k}\int_{\mathbb R^{n-1}} \underbrace{\prod_{i=1}^k \mathfrak e(y_i|\lambda_1)}_{\substack{\text{leading to}\\ \mathfrak G(k,\lambda_1)}}\,\underbrace{\prod_{i=k+1}^n \mathfrak e(y_i|\lambda_2)}_{\substack{\text{leading to}\\ \mathfrak G(n-k,\lambda_2)}}\,\text dy_1\cdots\,\text dy_{n-1}$$

where $y_n=z-y_1-\cdots-y_{n-1}$.
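The $n$-fold convolution can also be checked numerically on a grid, by repeatedly convolving the mixture density with itself; a sketch with assumed values $\lambda_1=1$, $\lambda_2=3$, $\pi_2=0.6$, $n=4$ (the half-weight at $y=0$ makes the discrete convolution a trapezoid-rule quadrature):

```python
import numpy as np

lam1, lam2, pi2, n = 1.0, 3.0, 0.6, 4
pi1 = 1.0 - pi2
dy, M = 0.01, 5000            # grid step and size (z from 0 to 50)
y = dy * np.arange(M)

# mixture density f_Y on the grid, endpoint half-weighted for trapezoid accuracy
fY = pi1 * lam1 * np.exp(-lam1 * y) + pi2 * lam2 * np.exp(-lam2 * y)
fY[0] *= 0.5

# n-fold convolution: density of Z = Y_1 + ... + Y_n
g = fY.copy()
for _ in range(n - 1):
    g = np.convolve(g, fY)[:M] * dy

total = g.sum() * dy          # should be ~1 (total probability mass)
mean = (y * g).sum() * dy     # should be ~ n*(pi1/lam1 + pi2/lam2) = 2.4
```

The grid density integrates to one and reproduces the mean of $Z$, as expected.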

The most compact representation for the density is thus $$\sum_{k=0}^{n}\binom{n}{k}\pi_2^k \pi_1^{n-k}\dfrac{\lambda_1^{n-k}\lambda_2^{k}}{\Gamma(n)}e^{-\lambda_1z} z^{n-1}\; _1F_1(k, n, (\lambda_1-\lambda_2)z)$$
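This closed form can be evaluated with SciPy's confluent hypergeometric function `scipy.special.hyp1f1` and checked by quadrature; a sketch assuming the illustrative values $\lambda_1=1$, $\lambda_2=3$, $\pi_2=0.6$, $n=5$:

```python
import math
import numpy as np
from scipy.special import hyp1f1
from scipy.integrate import quad

lam1, lam2, pi2, n = 1.0, 3.0, 0.6, 5
pi1 = 1.0 - pi2

def f_Z(z):
    """Density of Z as the (n+1)-term confluent-hypergeometric mixture above."""
    total = 0.0
    for k in range(n + 1):
        w = math.comb(n, k) * pi2**k * pi1**(n - k)   # Binomial mixing weight
        total += (w * lam1**(n - k) * lam2**k / math.gamma(n)
                  * math.exp(-lam1 * z) * z**(n - 1)
                  * hyp1f1(k, n, (lam1 - lam2) * z))
    return total

mass, _ = quad(f_Z, 0, np.inf)                    # should be ~1
mean, _ = quad(lambda z: z * f_Z(z), 0, np.inf)   # should be ~ n*E[Y] = 3.0
```

The density integrates to one and yields $\mathbb E[Z]=n(\pi_1/\lambda_1+\pi_2/\lambda_2)$, consistent with the derivations above. Note that the $k=n$ term simplifies, since $_1F_1(n,n,x)=e^x$, to the pure $\mathfrak G(n,\lambda_2)$ component.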


¹Or equivalently a distribution with a more complex density involving a confluent hypergeometric function $_1F_1$ as shown in the earlier CV post of @Carl.
