[Math] Approximating Poisson binomial distribution with normal distribution

Tags: binomial distribution, normal distribution, poisson distribution, statistics

Question

I am interested in any information about approximating the Poisson binomial distribution with the normal distribution. Specifically, I am interested in either analytic (à la Le Cam's theorem) or heuristic (e.g. “works well if the mean is greater than 10”) bounds on what happens if we approximate a Poisson binomial distribution with mean $\mu=\sum_{i=1}^n p_i$ and variance $\sigma^2=\sum_{i=1}^n p_i(1-p_i)$ by a normal distribution with mean $\mu$ and variance $\sigma^2$.

Background Research

This is the most relevant SE post I could find on the topic, but the subsequent discussion and answers don't seem to address my question. I have been able to find ample discussion (e.g. here) of approximating the binomial (not the Poisson binomial) distribution with the normal distribution, and of approximating the Poisson distribution with the normal distribution, but neither of these has been helpful.

I also found the paper A Refinement of Normal Approximation to Poisson Binomial by Neammanee (2005), where the author states in the first paragraph:

[I]t is well known that the distribution of a Poisson binomial random variable can be approximated by the standard normal distribution.

The author elaborates no further, and I can't find any other information about this.

Disclaimer

I am very new to statistics but I'm reasonably well versed in analysis (though not measure). Forgive me if this is a trivial question (as it seems to me it must be).

Best Answer

Okay, after some investigating, I have learned some things about statistics. I will post this answer here in the hope that someone finds it helpful in the future. Thanks to helpful comments by @spaceisdarkgreen.

Essentially, what this boils down to is the Central Limit Theorem (Wikipedia). The "usual" CLT applies to sums of identically distributed random variables, that is, variables drawn from the same distribution. That does not apply in the case of the Poisson-Binomial distribution, since each variable in the sum is drawn from a Bernoulli distribution with a different mean. Thus, we need a generalization of the CLT for non-identically distributed random variables. Of course, this will require some additional assumptions on the variables, but fortunately they are easily satisfied by Bernoulli random variables.

The necessary modification is provided by the Lyapunov Central Limit Theorem (Wikipedia), (MathWorld) (note that Wikipedia uses $\mathbf{E}[\cdot]$ to denote expectation, while MathWorld uses $\langle\cdot\rangle$). Also, see this answer for a related discussion.
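For reference, the Lyapunov CLT states: if $X_1, X_2, \dots$ are independent random variables with means $\mu_i$ and variances $\sigma_i^2$, and $s_n^2 = \sum_{i=1}^n \sigma_i^2$, then provided that for some $\delta > 0$

$$\lim_{n\to\infty} \frac{1}{s_n^{2+\delta}} \sum_{i=1}^n \mathbf{E}\!\left[|X_i - \mu_i|^{2+\delta}\right] = 0,$$

the normalized sum $\frac{1}{s_n}\sum_{i=1}^n (X_i - \mu_i)$ converges in distribution to a standard normal random variable.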

Anyway, the Poisson-Binomial distribution satisfies the Lyapunov condition, and hence, loosely speaking, the Poisson-Binomial distribution will converge to the normal distribution with mean and variance

$$\mu = \sum_{i=1}^n p_i, \quad \sigma^2 = \sum_{i=1}^n p_i(1-p_i)$$

respectively. To confirm this, I tested with means $p_i$ spaced uniformly between $p_0 = 0.35$ and $p_{n-1}=0.65$ for multiple values of $n$. Two plots are shown below (note that the probability mass function of the Poisson-Binomial distribution was computed via Monte Carlo sampling with $N=50\,000$ points, since computing the pmf explicitly can become a little tricky).

[Plots comparing the Poisson-Binomial pmf to the normal approximation for $n=6$ and $n=300$.]

These results suggest that, in practice, the convergence may be relatively quick, with reasonable agreement after only $n=6$. (Note that if you wish to sum an infinite sequence of Bernoulli random variables, you would need the $p_i$ to be bounded away from $0$ and $1$! This didn't concern me, since I am interested in a finite sequence.)
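In fact, the boundedness caveat falls out of checking the Lyapunov condition, which I sketch here with $\delta = 1$ for completeness. For a Bernoulli variable $X_i$ with parameter $p_i$,

$$\mathbf{E}\!\left[|X_i - p_i|^3\right] = p_i(1-p_i)\left[p_i^2 + (1-p_i)^2\right] \le p_i(1-p_i),$$

so with $s_n^2 = \sum_{i=1}^n p_i(1-p_i)$ the Lyapunov ratio is bounded by

$$\frac{1}{s_n^3}\sum_{i=1}^n \mathbf{E}\!\left[|X_i - p_i|^3\right] \le \frac{s_n^2}{s_n^3} = \frac{1}{s_n},$$

which tends to $0$ precisely when $s_n \to \infty$, i.e. when $\sum p_i(1-p_i)$ diverges. Keeping the $p_i$ bounded away from $0$ and $1$ guarantees this.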