Solved – the distribution of the sample variance for a Poisson random variable


The mean and variance of a Poisson random variable $X$ are both $\lambda$, but what is the distribution of the sample variance of $X$ when it is recalculated across a series of experiments? I would like to compute an envelope for a mean-variance plot of a number of experiments and wondered if there was an analytical formula as an alternative to sampling.

Formally, suppose I have $K$ experiments, each with $n$ observations, and let $X_{kj} \sim P(\lambda)$ for each experiment $k = 1, \ldots, K$ and each $j=1, \ldots, n$. For each experiment $k$, I can then calculate the sample variance

$$
s^2_k = \operatorname{var} \{ X_{kj} : j = 1, \ldots, n \}.
$$

My question is: what is the distribution of the statistics $\{ s^2_1, \ldots, s^2_K \}$?
For normally distributed data, this would be a scaled $\chi^2$ distribution. Is there an analogue for the Poisson?
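For concreteness, the sampling approach that the question wants to avoid can be sketched as follows. This is only an illustration of the setup: the values of `lam`, `n` and `K`, and the choice of a 95% envelope, are assumptions and do not come from the question.

```python
import numpy as np

rng = np.random.default_rng(0)

lam = 1.0   # Poisson mean (illustrative value)
n = 10      # observations per experiment
K = 1000    # number of experiments

# K x n matrix: row k holds the n observations of experiment k
x = rng.poisson(lam, size=(K, n))

# Sample mean and sample variance (n - 1 denominator) for each experiment
means = x.mean(axis=1)
s2 = x.var(axis=1, ddof=1)

# Empirical 2.5% and 97.5% quantiles of s^2 as a crude envelope
lo, hi = np.quantile(s2, [0.025, 0.975])
print(f"mean of s^2: {s2.mean():.3f}, envelope: [{lo:.3f}, {hi:.3f}]")
```

Plotting `means` against `s2` gives the mean-variance plot described above, with `lo` and `hi` as a simulated envelope for the variance axis.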

Best Answer

The distribution of the sample variance is slightly tricky, particularly because of the way the sample mean comes into it.

Note that

  1. it has a discrete distribution,

  2. by taking deviations from the sample mean, the sizes of the positive and negative deviations will vary from sample to sample and will generally not be of the same sizes (e.g. imagine with $n=10$ that the mean is $1.9$; then the deviations ($x_i-\bar{x}$) for values above the mean will be $0.1$ or $1.1$ or $2.1$, while those below will be $-0.9$ or $-1.9$; but in the next sample the mean might be $1.7$, so the deviations will be values like $0.3$ or $-1.7$)

  3. the squaring makes the deviations above and below the mean contribute terms of different sizes even within one sample (consider deviations about the mean of $-1.9$, $-0.9$, $0.1$, $1.1$, $2.1$; their squares are $3.61$, $0.81$, $0.01$, $1.21$ and $4.41$, so the gaps between adjacent squared values are uneven ... and then these are "averaged", but with an $n-1$ denominator, to produce a sample variance)
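The three points above can be checked by brute force for a small case. The sketch below relies on the identity $(n-1)s^2 = \sum x_i^2 - (\sum x_i)^2/n$, so $n(n-1)s^2$ is always an integer, and uses exact fractions to list every attainable value of $s^2$. The helper `variance_support` and the cap `max_count` on the observed counts are illustrative choices (the true support is unbounded).

```python
from fractions import Fraction
from itertools import combinations_with_replacement

def variance_support(n, max_count):
    """Distinct sample variances over all samples of size n with values in 0..max_count."""
    values = set()
    for sample in combinations_with_replacement(range(max_count + 1), n):
        total = sum(sample)
        total_sq = sum(v * v for v in sample)
        # (n - 1) * s^2 = sum(x^2) - (sum x)^2 / n, kept exact with Fraction
        values.add(Fraction(n * total_sq - total * total, n * (n - 1)))
    return sorted(values)

support = variance_support(n=3, max_count=3)
gaps = [b - a for a, b in zip(support, support[1:])]
print("attainable s^2:", [str(v) for v in support])
print("gaps between them:", [str(g) for g in gaps])
```

Calling `variance_support` with a different `n` returns a different set of values, and the gaps between neighbouring values are not constant.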

As a result you have a discrete distribution over a pretty complicated set of values (this set is countably infinite in size). The set of values taken also varies with sample size (n=3 yields a different set of possible values than n=10). Here's an example via simulation (though the simulation is so large that the distribution displayed is essentially the population cdf -- it's accurate to within about a pixel):

[Figure: ECDF of sample variance from a Poisson(1), n=10]

[Figure: sample pmf of the same distribution]

We can clearly see "clumpiness" in the distribution - uneven spacing, as well as an intermingled jumble of large and small probabilities.
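A simulation along these lines can be sketched as follows. This is not the code used for the figures; the seed and the number of replications `reps` are arbitrary assumptions. Tabulating the integer $n\sum x_i^2 - (\sum x_i)^2 = n(n-1)s^2$ rather than $s^2$ itself avoids floating-point near-duplicates among the attainable values.

```python
import numpy as np

rng = np.random.default_rng(1)
lam, n, reps = 1.0, 10, 200_000  # reps is an arbitrary simulation size

x = rng.poisson(lam, size=(reps, n))

# Integer-valued statistic d = n(n-1) s^2 for each simulated sample
d = n * (x ** 2).sum(axis=1) - x.sum(axis=1) ** 2

# Empirical pmf and cdf over the distinct attainable values
values, counts = np.unique(d, return_counts=True)
s2_values = values / (n * (n - 1))
pmf = counts / reps
ecdf = np.cumsum(pmf)

for v, p, c in list(zip(s2_values, pmf, ecdf))[:15]:
    print(f"s^2 = {v:.4f}  pmf = {p:.4f}  cdf = {c:.4f}")
```

Printing the first few rows is enough to see both the irregular spacing of the `s2_values` and the alternation of large and small probabilities in `pmf`.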

The distributions are of course different at different values of $n$ and the Poisson mean, but the general impression (a clumpy discrete distribution, with uneven spacing and irregular progression of probabilities) is - unsurprisingly - similar across a range of values.

The issue is even trickier if you want to back out some form of confidence interval for the population variance, since I am pretty sure you won't have a pivotal quantity to work with.

However, you might be able to get somewhere via approximation. The upper tail in particular is a fair bit smoother than the lower tail and might be amenable to a continuous approximation.

It looks like either large $\lambda$ or large samples will give smoother distributions that may be adequately approximated by a scaled chi-square.
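One possible way to construct such an approximation is to match the first two moments. This is an assumption made here for illustration, not a method given in the answer. For any i.i.d. sample, $E[s^2] = \sigma^2$ and $\operatorname{Var}(s^2) = \mu_4/n - \sigma^4 (n-3)/(n(n-1))$; for the Poisson, $\sigma^2 = \lambda$ and $\mu_4 = \lambda + 3\lambda^2$, which gives $\operatorname{Var}(s^2) = \lambda/n + 2\lambda^2/(n-1)$. The hypothetical helper `chi2_envelope` below picks a scale $a$ and degrees of freedom $\nu$ so that $a\chi^2_\nu$ has this mean and variance, then checks the resulting interval against simulation.

```python
import numpy as np
from scipy import stats

def chi2_envelope(lam, n, level=0.95):
    """Moment-matched scaled chi-square interval for the Poisson sample variance.

    Assumption: s^2 is approximated by a * chi2(nu), with a and nu chosen so that
    the mean is lam and the variance is lam/n + 2*lam^2/(n-1).
    """
    var_s2 = lam / n + 2 * lam**2 / (n - 1)
    a = var_s2 / (2 * lam)       # scale factor
    nu = 2 * lam**2 / var_s2     # effective degrees of freedom
    alpha = 1 - level
    lo, hi = stats.chi2.ppf([alpha / 2, 1 - alpha / 2], df=nu, scale=a)
    return lo, hi

lam, n = 20.0, 50
lo, hi = chi2_envelope(lam, n)

# Compare with simulated sample variances
rng = np.random.default_rng(2)
s2 = rng.poisson(lam, size=(50_000, n)).var(axis=1, ddof=1)
coverage = np.mean((s2 >= lo) & (s2 <= hi))
print(f"approximate 95% envelope: [{lo:.2f}, {hi:.2f}], simulated coverage: {coverage:.3f}")
```

For small $\lambda$ and small $n$ the distribution is still lumpy, particularly in the lower tail, so the lower limit from this kind of approximation should be treated with more caution than the upper one.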
