Transforming a sum of n exponential random variables to a Poisson distribution

Tags: exponential distribution, gamma distribution, poisson distribution, random variables

Let $X_1, \ldots, X_n$ be i.i.d. exponential random variables with rate $\lambda$, and let

$$S = X_1 + \cdots + X_n.$$

By computing the MGF of $S$, we get that $S \sim \operatorname{Gamma}(n,\lambda)$.
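(For reference, the MGF computation is one line: each $X_i$ has $M_{X_i}(u) = \lambda/(\lambda - u)$ for $u < \lambda$ in the rate-$\lambda$ parametrization, so by independence

$$M_S(u) = \prod_{i=1}^n M_{X_i}(u) = \left(\frac{\lambda}{\lambda - u}\right)^{\!n}, \qquad u < \lambda,$$

which is the MGF of $\operatorname{Gamma}(n, \lambda)$.)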

The problem I am stuck on is how to show the following:

For a given positive $t$, $N = \max\{n : n \geq 1,\ S_n \leq t\}$, where $S_n = X_1 + \cdots + X_n$ is the $n$-th partial sum, has a Poisson distribution.

How should I start?

Best Answer

Here's an unconventional but, for me, more insightful way to do this. If $X_1, X_2, X_3, \cdots \stackrel{\text{iid}}{\sim} \text{Exp}(\lambda)$, denote by $S_k = X_1 + \cdots + X_k$, $k \geq 1$, the sequence of partial sums. I claim that for any fixed $n \geq 1$, the conditional distribution of $(S_1, \cdots, S_n)$ given $S_{n+1} = s$ is equal in distribution to the order statistics vector $(U_{(1)}, \cdots, U_{(n)})$ of a sample $U_1, \cdots, U_{n} \stackrel{\text{iid}}{\sim} \text{Unif}(0, s)$.
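Before the proof, here is a quick numerical sanity check of the claim (a minimal sketch, assuming NumPy and SciPy; the values of $\lambda$ and $n$ are arbitrary). Conditionally on $S_{n+1} = s$, the claim says $S_k/s$ is the $k$-th order statistic of $n$ iid $\text{Unif}(0,1)$ variables, i.e. $\text{Beta}(k, n+1-k)$; since that conditional law doesn't depend on $s$, the unconditional ratio $S_k/S_{n+1}$ must also be $\text{Beta}(k, n+1-k)$, which is easy to test:

```python
import numpy as np
from scipy import stats

# If (S_1, ..., S_n) | S_{n+1} = s  ~  uniform order statistics on (0, s),
# then S_k / S_{n+1} ~ Beta(k, n + 1 - k), the k-th order statistic of
# n iid Unif(0, 1) variables.
rng = np.random.default_rng(0)
lam, n, reps = 2.0, 5, 100_000                          # arbitrary choices

x = rng.exponential(scale=1 / lam, size=(reps, n + 1))  # iid Exp(lam) draws
s = np.cumsum(x, axis=1)                                # partial sums S_1, ..., S_{n+1}
ratios = s[:, :n] / s[:, [n]]                           # S_k / S_{n+1} for k = 1..n

for k in range(1, n + 1):
    ks = stats.kstest(ratios[:, k - 1], stats.beta(k, n + 1 - k).cdf)
    print(f"k={k}: KS statistic {ks.statistic:.4f}, p-value {ks.pvalue:.3f}")
```

Large p-values across $k$ are consistent with the claim.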

To prove this, define a linear transformation $T(x_1, \cdots, x_{n+1}) = (x_1, x_1+x_2, \cdots, x_1+\cdots+x_{n+1})$ for all $\mathbf{x} \in (0, \infty)^{n+1}$. This has range $\{\mathbf{s} \in \Bbb R^{n+1} : 0 < s_1 < \cdots < s_{n+1}\}$ and is invertible on its range, with inverse given by $T^{-1}(s_1, \cdots, s_{n+1}) = (s_1, s_2 - s_1, \cdots, s_{n+1} - s_n)$. Clearly $\det(T) = 1$, as the matrix of $T$ is lower triangular with $1$'s along the diagonal. By the change of density formula,

$$\begin{align}f_{S_1, \cdots, S_{n+1}}(s_1, s_2, \cdots, s_{n+1}) & = f_{X_1, \cdots, X_{n+1}}(s_1, s_2 - s_1, \cdots, s_{n+1} - s_n) \\ &= \lambda e^{-\lambda s_1} \lambda e^{-\lambda (s_2 - s_1)} \cdots \lambda e^{-\lambda (s_{n+1} - s_n)} \mathbf{1}_{(0 < s_1 < \cdots < s_{n+1})} \\ &= \lambda^{n+1}e^{-\lambda s_{n+1}} \mathbf{1}_{(0 < s_1 < \cdots < s_{n+1})} \end{align}$$

$S_{n+1}$ is distributed as $\text{Gamma}(n+1, \lambda)$, hence $f_{S_{n+1}}(s) = \lambda^{n+1} s^n e^{-\lambda s}/n!$, so the conditional pdf of $(S_1, \cdots, S_n)$ given $S_{n+1} = s$ is

$$\begin{align}f_{S_1 \cdots S_n | S_{n+1}}(s_1, \cdots, s_n | s) = \frac{f_{S_1, \cdots, S_{n+1}}(s_1, \cdots, s_n, s)}{f_{S_{n+1}}(s)} &= \frac{\lambda^{n+1} e^{-\lambda s}}{\lambda^{n+1}s^ne^{-\lambda s}/n!} \mathbf{1}_{(0<s_1<\cdots<s_n<s)} \\ &= n! \left(\frac{1}{s}\right)^n \mathbf{1}_{(0<s_1<\cdots<s_n<s)}\end{align}$$

which is precisely the pdf of $(U_{(1)}, \cdots, U_{(n)})$, as promised: the joint pdf of the order statistics of $n$ iid $\text{Unif}(0, s)$ variables is $n!\,(1/s)^n$ on the ordered region $0 < s_1 < \cdots < s_n < s$ (there are $n!$ orderings, each contributing density $1/s^n$). There's some intuition behind this: the exponential distribution with parameter $\lambda$ is, roughly speaking, the time taken for a continuous process to change state. This is because if $X_n \sim \text{Geo}(\lambda/n)$, the time taken for a discrete process (tossing a coin with success probability $\lambda/n$) to change state (i.e., for the first success to occur), then $X_n/n$ converges in distribution to $X \sim \text{Exp}(\lambda)$ (so think of tossing a coin very fast with a very small success probability). Given this interpretation, if $X_1, \cdots, X_{n+1}$ are iid $\text{Exp}(\lambda)$, knowing that $S_{n+1} = s$ (it took $s$ amount of time for the process to change state $n+1$ times) doesn't specify anything about the times taken for the process to change state $k$ times for $1 \leq k \leq n$, except that they are ordered.
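That geometric-to-exponential limit is also easy to see numerically (a minimal sketch, assuming NumPy/SciPy; $\lambda$, $m$, and the sample size are arbitrary): draw $\text{Geo}(\lambda/m)$ variables for a large $m$, rescale by $1/m$, and compare against $\text{Exp}(\lambda)$:

```python
import numpy as np
from scipy import stats

# Sketch of Geo(lam/m) / m -> Exp(lam): a coin with small success
# probability lam/m, tossed m times per unit of "continuous" time.
rng = np.random.default_rng(1)
lam, m, reps = 2.0, 10_000, 100_000

geo = rng.geometric(p=lam / m, size=reps)             # trials until first success
print(stats.kstest(geo / m, stats.expon(scale=1 / lam).cdf))
```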

$N$ as above is a sort of stopping time of the continuous process, taking values in $\Bbb N \cup \{0\}$: after $N+1$ changes in state, the total time taken first exceeds $t$. $\Bbb P(N = 0)$ can be computed by hand to be $\Bbb P(S_1 > t) = e^{-\lambda t}$. In general, by the law of total probability,

$$\begin{align} \Bbb P(N = n) = \Bbb P(S_n \leq t, S_{n+1} > t) &= \int_t^\infty \Bbb P(S_n \leq t|S_{n+1} = u)f_{S_{n+1}}(u) du \\ &= \int_t^\infty \Bbb P(U_{(n)} \leq t) f_{S_{n+1}}(u) du\end{align}$$

where $U_{(n)}$ is the maximum (i.e., the $n$-th order statistic) of a sample $(U_1, \cdots, U_n)$ of size $n$ from $\text{Unif}(0, u)$. For $0 < t < u$, its distribution function is $\Bbb P(U_{(n)} \leq t) = (t/u)^n$, and as $S_{n+1}$ is distributed as $\text{Gamma}(n+1, \lambda)$, we have $f_{S_{n+1}}(u) = \lambda^{n+1} u^n e^{-\lambda u}/n!$. Plugging all of this in, we get, for all $n \geq 1$,

$$\begin{align}\Bbb P(N = n) = \int_t^\infty \left ( \frac{t}{u}\right )^n \frac{\lambda^{n+1}}{n!} u^n e^{-\lambda u} du &= \frac{\lambda^{n+1}t^n}{n!}\int_t^\infty e^{-\lambda u} du \\ &= \frac{\lambda^{n+1} t^n}{n!} \frac{e^{-\lambda t}}{\lambda} = \frac{(\lambda t)^n}{n!}e^{-\lambda t}\end{align}$$
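If you want to sanity-check this pmf by simulation, here is a minimal Monte Carlo sketch (assuming NumPy/SciPy; $\lambda = 1.5$, $t = 3$, and the cap of 50 arrivals are arbitrary choices with $50 \gg \lambda t$):

```python
import numpy as np
from scipy import stats

# Monte Carlo check that N = max{n : S_n <= t} (with N = 0 when S_1 > t)
# has the Poisson(lam * t) pmf derived above.
rng = np.random.default_rng(2)
lam, t, reps = 1.5, 3.0, 200_000

x = rng.exponential(scale=1 / lam, size=(reps, 50))  # 50 inter-arrival times, 50 >> lam * t
s = np.cumsum(x, axis=1)                             # partial sums S_1, ..., S_50
n_count = (s <= t).sum(axis=1)                       # N = number of partial sums <= t

pois = stats.poisson(lam * t)
for n in range(8):
    print(f"n={n}: empirical {np.mean(n_count == n):.4f} vs Poisson pmf {pois.pmf(n):.4f}")
```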

Therefore $N \sim \text{Poi}(\lambda t)$, as the pmfs match. For what it's worth, a more conventional way to do this is to show by hand that if $X \sim \text{Gamma}(n+1, \lambda)$ and $Y \sim \text{Poi}(\lambda t)$, then $\Bbb P(X > t) = \Bbb P(Y \leq n)$, i.e., the survival function of $X$ at $t$ is the cdf of $Y$ at $n$. This is just an exercise in integration by parts. Then

$$\begin{align}\Bbb P(N = n) = \Bbb P(S_n \leq t, S_{n+1} > t) &= \Bbb P(S_{n+1} > t) - \Bbb P(S_n > t) \\ &= \Bbb P(Y \leq n) - \Bbb P(Y \leq {n-1}) \\ &= \Bbb P(Y = n)\end{align}$$

which shows $N \stackrel{d}{=} Y$, i.e., $N$ has a Poisson distribution.
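The survival-function identity itself is also easy to verify numerically (a sketch assuming SciPy; the $\lambda$ and $t$ values are arbitrary):

```python
from scipy import stats

# Check P(Gamma(n+1, lam) > t) == P(Poi(lam * t) <= n) for a few n.
lam, t = 1.5, 3.0
for n in range(6):
    gamma_sf = stats.gamma(n + 1, scale=1 / lam).sf(t)  # survival function of Gamma(n+1, lam)
    pois_cdf = stats.poisson(lam * t).cdf(n)            # cdf of Poisson(lam * t) at n
    print(f"n={n}: Gamma sf {gamma_sf:.6f}  Poisson cdf {pois_cdf:.6f}")
```

The two columns should agree to within floating-point error, since the identity is exact.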
