Normal Distribution – How to Correct Continuity Errors in Estimating Poisson Distribution

Tags: distributions, normal distribution, poisson distribution, self-study

I am doing a self-study exercise which attempts to exemplify a case where the Normal Distribution is used to approximate the Poisson Distribution, since the population mean is more than 10.

I understand that when using the Normal Distribution to approximate Poisson / Binomial Distributions, a continuity correction needs to be applied. From my understanding, this means subtracting 0.5 from the lower bound and adding 0.5 to the upper bound.

enter image description here

However, I was faced with the following example, where 0.5 was subtracted from the equation.

I would like to clarify:

if this is because it is lower bound, as the value is less than W?

and if the following concept is correct

From my understanding, this means subtracting 0.5 from the lower bound and adding 0.5 to the upper bound.

Best Answer

The best way to deal with continuity correction is to draw a picture.

In fact I will draw two kinds of picture - what people often draw (which people seem to find more intuitive, even though it's not quite 'correct'), and what really "should" be drawn. (You would draw whichever helps you work out a given problem.)

1) Here's the binomial (drawn as a step function with each step centered at the integer counts -- even though it really isn't a step function) and the approximating normal (green curve).

enter image description here

(Sorry, my variables don't match yours, but I am sure you can translate.)

Let's blow the interesting area up a bit, and draw in the binomial probabilities we want:

enter image description here

(the magenta dashed vertical spikes represent the required probability; the area under each "step" is the same probability as the corresponding spike -- but beware putting too much weight on the drawing of the spikes, because there's a problem putting the spike representation and the area representation on the same scale like that)

To get the probability at x=910 we can (equivalently) take the height of the spike there, or add the area under the step (i.e. the area from 909.5 to 910.5). Similarly, for the probability at x=911 we can (equivalently) take the height of the spike there, or add the area under the step (i.e. the area from 910.5 to 911.5), and so on.

So to get the probability for $X\geq 910$, we need $P(X=910)+P(X=911)+P(X=912)+\dots$, which is the same as the area of the step function from 909.5 up.

enter image description here

That (exact!) step-function area is approximated by the area under the green normal distribution from 909.5 up:

enter image description here

That's why you look at $P(W>910-0.5)$.
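To see the effect numerically, here is a minimal Python sketch that compares the exact tail with the normal area taken from 910 and from 909.5. The binomial parameters `n = 1000` and `p = 0.9` are hypothetical values chosen only for illustration; they are not the parameters behind the figures, and the helper names `normal_sf` and `binom_pmf` are my own.

```python
import math

def normal_sf(x, mean, sd):
    """P(W > x) for W ~ Normal(mean, sd**2), via the complementary error function."""
    return 0.5 * math.erfc((x - mean) / (sd * math.sqrt(2.0)))

def binom_pmf(k, n, p):
    """Exact P(X = k) for X ~ Binomial(n, p), computed in log space."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)

# Hypothetical parameters (assumptions, not taken from the original problem).
n, p, cutoff = 1000, 0.9, 910
mean = n * p
sd = math.sqrt(n * p * (1 - p))

# Exact P(X >= cutoff): add up the spikes at cutoff, cutoff + 1, ..., n.
exact = sum(binom_pmf(k, n, p) for k in range(cutoff, n + 1))

# Normal area from the cutoff itself (no correction) and from cutoff - 0.5.
uncorrected = normal_sf(cutoff, mean, sd)
corrected = normal_sf(cutoff - 0.5, mean, sd)

print(f"exact       P(X >= {cutoff})   = {exact:.4f}")
print(f"uncorrected P(W > {cutoff})    = {uncorrected:.4f}")
print(f"corrected   P(W > {cutoff}-0.5) = {corrected:.4f}")
```

With these particular values the corrected area should land noticeably closer to the exact sum than the uncorrected one. How much the correction helps depends on the distribution and on how far into the tail the cutoff lies.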


2) What we're actually doing with the normal approximation is approximating the distribution function:

enter image description here

Let's look at a blown up section of that kind of diagram:

enter image description here

The total probability we're after is the height above $F(x)$, which is depicted by the blue line. The green curve passes (roughly) somewhere near the middle of the vertical step, so if we take $P(W>x)$, the answer will tend to be too small (pink arrow).

However, if the normal curve passes close to the middle of the vertical bars it should also pass somewhere close to the middle of the horizontal bars, so $P(W>x-\frac{1}{2})$ should be close to the required probability (purple arrow length is closer to the blue length).


In the case of our actual problem (as is often the case in the tail with asymmetric binomials), it's not actually very close (so the "-0.5" part doesn't really help much at all):

enter image description here


if this is because it is lower bound, as the value is less than W?

Yes. Because the required probability was for the given value of $x$ (in my notation) and above, you need to include the rectangle of area there, so you go down by $\frac{1}{2}$. It's rather easy to work out which way to go with the diagram; I find the wordy explanations sometimes get confusing.
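For an integer-valued count $X$ and its approximating normal variable $W$, reading the direction off the diagram gives the following cases. This is only a summary sketch; the integers $k$, $a$ and $b$ are symbols introduced here for illustration.

$$
\begin{aligned}
P(X \geq k) &\approx P\left(W > k - \tfrac{1}{2}\right) \\
P(X > k) &\approx P\left(W > k + \tfrac{1}{2}\right) \\
P(X \leq k) &\approx P\left(W < k + \tfrac{1}{2}\right) \\
P(X < k) &\approx P\left(W < k - \tfrac{1}{2}\right) \\
P(a \leq X \leq b) &\approx P\left(a - \tfrac{1}{2} < W < b + \tfrac{1}{2}\right)
\end{aligned}
$$

In each case the boundary moves so that the whole rectangle for every included count is covered and no rectangle for an excluded count is.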

From my understanding, this means subtracting 0.5 from the lower bound and adding 0.5 to the upper bound.

Yes, that's correct. In complicated situations (especially when dealing with discrete distributions that aren't on the integers, such as scaled versions of counts) - even after having done this kind of thing many times - I still tend to draw the diagram.
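For a scaled count the steps are no longer one unit wide, so the shift is half the step width rather than 0.5. The sketch below illustrates this with a hypothetical example: $Y = 0.25\,N$ with $N$ Poisson with mean 20. The values `lam`, `step` and `k`, and the helper names, are all assumptions made for illustration.

```python
import math

def normal_sf(x, mean, sd):
    """P(W > x) for W ~ Normal(mean, sd**2)."""
    return 0.5 * math.erfc((x - mean) / (sd * math.sqrt(2.0)))

def poisson_sf_inclusive(k, lam):
    """Exact P(N >= k) for N ~ Poisson(lam), as 1 minus the sum of P(N = j) for j < k."""
    below = sum(math.exp(-lam + j * math.log(lam) - math.lgamma(j + 1))
                for j in range(k))
    return 1.0 - below

# Hypothetical setup: Y = step * N takes values 0, 0.25, 0.5, ...
lam = 20.0    # Poisson mean (above 10, so a normal approximation is commonly used)
step = 0.25   # spacing between neighbouring values of Y
k = 25        # we want P(N >= 25), i.e. P(Y >= 6.25)
y = step * k

mean_y = step * lam
sd_y = step * math.sqrt(lam)

exact = poisson_sf_inclusive(k, lam)
no_shift = normal_sf(y, mean_y, sd_y)               # no correction
half_step = normal_sf(y - step / 2, mean_y, sd_y)   # shift by half the step width
half_unit = normal_sf(y - 0.5, mean_y, sd_y)        # shift by 0.5 (too far on this scale)

print(f"exact               = {exact:.4f}")
print(f"no shift            = {no_shift:.4f}")
print(f"shift by step / 2   = {half_step:.4f}")
print(f"shift by 0.5        = {half_unit:.4f}")
```

Shifting by `step / 2` is the same calculation as shifting the underlying count by 0.5, just expressed on the scale of $Y$; drawing the steps at their true width makes that visible.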
