Normal approximation to Poisson: With Continuity Correction the Approximation Seems Worse

Tags: approximation, central limit theorem, normal distribution, poisson distribution, probability

This is Exercise 3 in Section 6.3 of Probability and Statistics, 4th edition, by DeGroot and Schervish:

Suppose that the distribution of the number of defects on any given bolt of cloth is the Poisson distribution with mean 5, and the number of defects on each bolt is counted for a random sample of 125 bolts. Determine the probability that the average number of defects per bolt in the sample will be less than 5.5.

Let $X$ be the total number of defects; we want $P(X / 125 < 5.5) = P(X < 687.5) = P(X \le 687)$. By the Central Limit Theorem, $X$ is approximately normally distributed with mean $125 \cdot 5 = 625$ and standard deviation $\sqrt{125 \cdot 5} = 25$. Using the continuity correction, we should estimate $P(X \le 687)$ as $\Phi\left(\frac{687.5 - 625}{25}\right)$, where $\Phi$ is the CDF of the standard normal distribution.
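As a quick check of the arithmetic above, the standardized cutoffs can be computed directly. This is only a sketch; the variable names (`lambda`, `n`, `mu`, `sigma`, `z_cc`, `z_plain`) are my own and do not come from the textbook.

```r
# Parameters from the exercise
lambda <- 5      # mean number of defects per bolt
n      <- 125    # number of bolts in the sample

mu    <- n * lambda         # mean of the total X: 625
sigma <- sqrt(n * lambda)   # standard deviation of the total X: 25

# Standardized cutoffs for P(X <= 687)
z_cc    <- (687.5 - mu) / sigma   # with continuity correction: 2.5
z_plain <- (687   - mu) / sigma   # without continuity correction: 2.48

pnorm(z_cc)      # same quantity as pnorm(687.5, mean = 625, sd = 25)
pnorm(z_plain)   # same quantity as pnorm(687,   mean = 625, sd = 25)
```

So the two normal estimates differ only in whether $\Phi$ is evaluated at 2.5 or at 2.48.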

Below, the first probability is the true probability; the second is the estimate computed with the continuity correction; the third is the estimate computed without it:

ppois(q = 687, lambda = 625) = 0.9931787
pnorm(q = 687.5, mean = 625, sd = 25) = 0.9937903
pnorm(q = 687, mean = 625, sd = 25) = 0.9934309

The estimate computed with the continuity correction is worse than the estimate computed without it. Did I make a mistake? If I didn't, why does using the continuity correction produce a less accurate estimate?

Best Answer

Your computations are correct. The fundamental difficulty is that one cannot generally expect more than a couple of places of accuracy from a normal approximation to a Poisson distribution.

For your problem, it may be best to look at the complementary probabilities in the right tail.

> 1-ppois(687, 625)
[1] 0.006821267
> 1-pnorm(687.5, 625, 25)
[1] 0.006209665
> 1-pnorm(687, 625, 25)
[1] 0.006569119

From close inspection of the plot below, one can see that the normal approximation already slightly underestimates the right-tail probability. The continuity correction takes away a little probability from that tail, which in this case happens to make the approximation even worse.
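The same point can be checked numerically rather than by eye. The sketch below recomputes the three right-tail probabilities and their signed differences from the exact Poisson value; the variable names are my own, and `lower.tail = FALSE` is used only to avoid subtracting from 1 by hand.

```r
lambda_total <- 625                 # mean of the total count X
sigma        <- sqrt(lambda_total)  # standard deviation of X: 25
k            <- 687                 # cutoff: we want P(X > 687)

# Upper-tail probabilities P(X > k)
tail_exact <- ppois(k,       lambda_total,        lower.tail = FALSE)
tail_cc    <- pnorm(k + 0.5, lambda_total, sigma, lower.tail = FALSE)
tail_plain <- pnorm(k,       lambda_total, sigma, lower.tail = FALSE)

# Signed errors: a negative value means the normal estimate is too small
c(with_cc = tail_cc - tail_exact, without_cc = tail_plain - tail_exact)

# Errors relative to the exact tail probability
c(with_cc = tail_cc / tail_exact - 1, without_cc = tail_plain / tail_exact - 1)
```

Both signed errors come out negative, and the one with the continuity correction is the larger of the two in magnitude, which matches the ordering of the three tail values printed above.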

[Figure: Poisson distribution and its normal approximation (image not reproduced)]

The continuity correction usually improves the approximation, but that may be true only when the approximation is already very good. In your problem the approximation is not good enough for a discussion of the third and fourth decimal places to be productive.
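To see how the correction behaves at other cutoffs for this same Poisson(625) total, one could tabulate the error of each estimate over a range of values. This is a rough sketch, not part of the exercise: the particular cutoffs in `k` are an arbitrary choice of mine, running from the center of the distribution out into the right tail.

```r
lambda_total <- 625
sigma        <- sqrt(lambda_total)

# Arbitrary illustrative cutoffs (an assumption, not from the exercise)
k <- c(625, 640, 650, 660, 675, 687, 700)

exact     <- ppois(k, lambda_total)                      # exact P(X <= k)
err_cc    <- pnorm(k + 0.5, lambda_total, sigma) - exact # error with correction
err_plain <- pnorm(k,       lambda_total, sigma) - exact # error without correction

# One row per cutoff; the last column flags where the correction helps
data.frame(k, exact, err_cc, err_plain,
           cc_is_better = abs(err_cc) < abs(err_plain))
```

At the center (`k = 625`) the corrected estimate is the closer one, while at `k = 687` it is not, so which version wins depends on where in the distribution the cutoff falls.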