Variance – Understanding the Variance of Compound Distributions

binomial-distribution, distributions, poisson-distribution, variance

The binomial distribution describes the probability of $k$ 'success' events given $N$ independent trials, each with a probability $p$ of being a success. The distribution is described by the formula $$B(k;N,p) = {N \choose k} p^k(1-p)^{N-k}.$$
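A quick numerical sanity check of the PMF above (my own sketch, not part of the original question, and assuming SciPy is available): the explicit formula should agree with the library implementation for arbitrary parameter values.

```python
# Compare the explicit binomial PMF formula with scipy's implementation.
from math import comb
from scipy.stats import binom

N, p = 20, 0.3  # example values chosen for illustration
for k in range(5):
    formula = comb(N, k) * p**k * (1 - p)**(N - k)
    library = binom.pmf(k, N, p)
    print(k, formula, library)  # the two columns should agree
```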
The process I am particularly interested in is described by a binomial distribution, but where the number of trials is itself given by another distribution, say $X$, instead of a single fixed number. An example could be that $X$ is the Poisson distribution with pmf $P(k;\lambda) = \frac{e^{-\lambda}\lambda^k}{k!}$, and the resulting compound distribution $D$ (the distribution of my process) is given by (see here for details): $$D(k) = \sum_{\theta} B(k;\theta,p)\,P(\theta;\lambda).$$
Now for this particular distribution $D$, I can work out its closed form (it happens to be Poissonian with mean $\lambda p$, for those interested) and calculate its variance, $Var[D]=\lambda p$.
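A minimal Monte-Carlo sketch of that claim (my own addition, with made-up parameter values): drawing $N \sim \mathrm{Poisson}(\lambda)$ and then $k \sim \mathrm{Binomial}(N, p)$ should give samples whose mean and variance are both close to $\lambda p$.

```python
# Simulate the Poisson-Binomial compound and check mean == variance == lambda*p.
import numpy as np

rng = np.random.default_rng(0)
lam, p, n_samples = 10.0, 0.3, 200_000

N = rng.poisson(lam, size=n_samples)  # number of trials from Poisson(lambda)
k = rng.binomial(N, p)                # 'successes' given that many trials

print("lambda*p    =", lam * p)
print("sample mean =", k.mean())
print("sample var  =", k.var())       # for a Poisson, mean == variance
```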

Now I have a number of questions that I would like to answer regarding this kind of process (which I hope you can help me with!):

  1. For the case where $X$ can be any arbitrary distribution, is it possible to calculate which distribution $X$ would give a minimum for $Var[D]$? (and if so – how?)
  2. Can I find two distributions $X_1$ and $X_2$ such that $Var[X_1]>Var[X_2]$ but $Var[D_1]<Var[D_2]$ (i.e. compounding with the binomial reverses the ordering of the variances)?

I'm relatively new to this kind of statistical work, so as much detail as possible in the answers would be great (even if it seems obvious), and apologies if any of my questions are common knowledge, but I struggled to find a solution online. Thanks in advance.

Best Answer

A colleague of mine managed to answer the question. It is actually reasonably simple and is quite a nice example of a variance calculation.

Let me re-define the compound distribution above to be of the form $$\sum_{N=0}^{\infty}X(N)\,B(n;N,p) =: XB(n;p).$$

For the simple binomial distribution $B(n;N,p) = {N \choose n} p^n (1-p)^{N-n}$: $$E_B[n] := \sum_{n=0}^{\infty} n\,B(n;N,p) = Np,$$ $$E_B[n^2] := \sum_{n=0}^{\infty} n^2\,B(n;N,p) = Np + N^2p^2 - Np^2,$$ $$\Delta_B^2 = Np - Np^2 = Np(1-p),$$ where $E$ denotes the expectation value.

For the compound distribution $XB(n;p)$: $$E_{XB}[n] := \sum_{n=0}^{\infty} n \sum_{N=0}^{\infty} X(N)\,B(n;N,p) = \sum_{N=0}^{\infty} X(N)\,E_B[n] = \sum_{N=0}^{\infty} X(N)\,Np = p\,E_X[N],$$ $$E_{XB}[n^2] := \sum_{n=0}^{\infty} n^2 \sum_{N=0}^{\infty} X(N)\,B(n;N,p) = \sum_{N=0}^{\infty} X(N)\left(N^2p^2 + N(p - p^2)\right) = p^2 E_X[N^2] + p(1-p)E_X[N],$$ which results in a variance for the compound distribution of $$\Delta_{XB}^2 = p^2 E_X[N^2] + p(1-p)E_X[N] - p^2 E_X[N]^2 = p^2 \Delta_X^2 + p(1-p)E_X[N].$$

The particular parameter I was interested in reducing the variance of has itself a variance of $\Delta_{P}^2 = \Delta_{XB}^2/E_X[N] = p^2 \Delta_X^2/E_X[N] + p(1-p)$, so in order to minimise the variance of this parameter I need to minimise the variance of the distribution $X$.
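A hedged numerical sketch of the final formula, $\Delta_{XB}^2 = p^2\Delta_X^2 + p(1-p)E_X[N]$, which also illustrates question 2 above; the two trial-number distributions and the value of $p$ below are my own choices, not from the original post.

```python
# Check Var[D] = p^2*Var[X] + p*(1-p)*E[N] for two different trial-number
# distributions X, and show that the ordering of the variances can reverse.
import numpy as np

rng = np.random.default_rng(1)
p, n_samples = 0.1, 500_000

def compound_variance(draw_N):
    """Sample N from the given distribution, then k ~ Binomial(N, p)."""
    N = draw_N(n_samples)
    k = rng.binomial(N, p)
    return N.mean(), N.var(), k.var()

# X1: 0 or 20 with equal probability -> E[N] = 10,  Var[N] = 100
# X2: Binomial(200, 0.5)             -> E[N] = 100, Var[N] = 50
for name, draw in [("X1", lambda n: 20 * rng.integers(0, 2, size=n)),
                   ("X2", lambda n: rng.binomial(200, 0.5, size=n))]:
    mean_N, var_N, var_D = compound_variance(draw)
    predicted = p**2 * var_N + p * (1 - p) * mean_N
    print(f"{name}: Var[X] ~ {var_N:.1f}, Var[D] ~ {var_D:.2f}, "
          f"formula predicts {predicted:.2f}")

# X1 has the larger Var[X] but the smaller Var[D], because its mean E[N] is
# much smaller -- at small p the p*(1-p)*E_X[N] term dominates.
```

With these particular choices the formula predicts $Var[D_1] \approx 1.9$ and $Var[D_2] \approx 9.5$ even though $Var[X_1] = 100 > Var[X_2] = 50$, which answers question 2 in the affirmative.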
