[Math] When to use the continuity correction for normal approximations of binomial probabilities.

central limit theorem, probability, statistics

I'm confused about when you actually use a continuity correction. If a problem deals with a binomial distribution and we are asked to find probabilities using the normal approximation (provided np > 5 and n(1-p) > 5), I know we use a continuity correction. But do we also use it for the sampling distribution of sample means from that binomial distribution? In the problem below, do we use a continuity correction in part (b)? In general, do we use a continuity correction whenever we know the population distribution is binomial? And if we didn't know that the problem below involved a binomial distribution, how would we know that a continuity correction is needed?

[Image: first problem statement, not reproduced]

In the next image, do we use a continuity correction for part (b)? Based on the central limit theorem, I know that if n > 30 the sampling distribution of sample means is approximately normal, but until now I hadn't worried about continuity correction.

[Image: second problem statement, not reproduced]

Best Answer

I will give you an outline of how to work a question similar to the two numerically very different part (b)s. I will use 2500 raisins for 100 loaves, and answer for the average of four loaves. I hope you can use this outline to see how to work whichever problem is of interest.

By the CLT the number of raisins in one loaf is approximately $$X_i \sim \mathsf{Norm}(\mu = 25, \sigma = 4.975).$$ [You should explain how to get the values $\mu$ and $\sigma$ from binomial $n$ and $p.$]
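One way to obtain these values, as a minimal sketch: assume each of the 2500 raisins independently ends up in a given loaf with probability 1/100, so that the count in one loaf is binomial with $n = 2500$ and $p = 0.01.$ That modelling assumption and the variable names below are illustrative, not part of the original problem statement.

```r
n <- 2500      # total number of raisins (from the outline above)
p <- 1 / 100   # ASSUMED chance that a given raisin lands in one particular loaf

mu    <- n * p                  # binomial mean, n * p
sigma <- sqrt(n * p * (1 - p))  # binomial standard deviation, sqrt(n * p * (1 - p))

round(c(mu = mu, sigma = sigma), 3)
```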

Then the mean of four loaves would be approximately

$$\bar X \sim \mathsf{Norm}(\mu_{\bar X} = 25,\, \sigma_{\bar X} = 4.975/\sqrt{4} = 2.4875).$$ [You should explain how the values $\mu_{\bar X}$ and $\sigma_{\bar X}$ are obtained from the values $\mu$ and $\sigma$ above.]

You seek $P(\bar X > 32) = P(\bar X > 32.5) = P(\bar X \ge 33).$ The continuity correction uses the second of these. [Why?] Then $$P(\bar X > 32.5) = P\left(\frac{\bar X - \mu_{\bar X}}{\sigma_{\bar X}} > \frac{32.5 - 25}{2.4875} \right) = P(Z > 3.015),$$ which you can evaluate using a printed normal table or software. But you should recognize immediately that this probability is rather small. [Why?]
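As a rough check of the standardization above, the tail probability can be evaluated in R. This is only a sketch that plugs in the values from the outline; the variable names are illustrative.

```r
mu_bar    <- 25
sigma_bar <- 4.975 / sqrt(4)   # standard deviation of the mean of four loaves

# Standardized cutoff, using the continuity-corrected value 32.5
z <- (32.5 - mu_bar) / sigma_bar

# Upper-tail normal probability P(Xbar > 32.5)
p_tail <- pnorm(32.5, mean = mu_bar, sd = sigma_bar, lower.tail = FALSE)

z        # about 3.015
p_tail   # roughly 0.0013, i.e. a small probability
```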

Addendum: OK, it seems you are mainly interested in when to use continuity corrections to improve normal approximations to binomial probabilities. Here are three relevant examples showing the effect of continuity corrections. [I will use R for quick computation, but you could get about the same normal approximations by standardizing and using printed tables.]

Continuity correction crucial for useful answer: $X \sim \mathsf{Binomial}(n = 16, p = 1/2)$ approximated by $\mathsf{Normal}(\mu = 8, \sigma = 2).$ Find $$P(6 \le X \le 9) = P(5.5 < X < 9.5) = P(5 < X < 10).$$ For the discrete binomial random variable $X,$ all three statements give identical results, but the middle form is the one to use for continuity correction. The exact binomial probability is $$P(5.5 < X < 9.5) = P(X=6) + P(X=7) + P(X=8) + P(X = 9)\\ = P(X \le 9) - P(X \le 5) = 0.6677$$ to four places. The normal approximation with continuity correction also gives $0.6677$ to four places. (Agreement this close is not typical; you can usually expect normal approximations to be accurate to about two places.) The normal approximations without continuity correction (0.5328 and 0.7745) are quite far from the mark.

sum(dbinom(6:9, 16, .5))
## 0.6676941                   # exact: P(X=6) + P(X=7) + P(X=8) + P(X=9)
diff(pbinom(c(5,9), 16, .5))
## 0.6676941                   # exact: P(X <= 9) - P(X <= 5)
diff(pnorm(c(6, 9), 8, 2))
## 0.5328072                   # botched norm aprx: too small
diff(pnorm(c(5.5, 9.5), 8, 2))
## 0.6677229                   # norm aprx w/ cont corr: closest
diff(pnorm(c(5, 10), 8, 2))
## 0.7745375                   # botched norm aprx: too big

In the figure below, we want the total height of the four binomial bars between the vertical broken lines. The normal approximation with continuity correction includes the area under the normal curve between the two broken lines.

[Figure: binomial probability bars with the approximating normal curve and two vertical broken lines; image not reproduced]
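A figure along the lines described above could be redrawn with base R graphics. This is a sketch of one possible rendering, not the original plot; the colours, line widths, and labels are my own choices.

```r
n <- 16; p <- 0.5
mu <- n * p
sigma <- sqrt(n * p * (1 - p))
x <- 0:n

# Vertical bars for the binomial probabilities P(X = x)
plot(x, dbinom(x, n, p), type = "h", lwd = 3,
     xlab = "x", ylab = "Probability",
     main = "Binomial(16, 1/2) with normal approximation")

# Density curve of the approximating Normal(8, 2)
curve(dnorm(x, mu, sigma), from = -0.5, to = n + 0.5, add = TRUE, col = "blue")

# Broken lines at the continuity-corrected endpoints 5.5 and 9.5
abline(v = c(5.5, 9.5), lty = "dashed", col = "red")
```

The four bars at 6, 7, 8, and 9 lie between the broken lines, and the continuity-corrected approximation is the area under the curve between them.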

Continuity correction important: $Y \sim \mathsf{Binomial}(n = 100, p = 1/2)$ approximated by $\mathsf{Normal}(\mu = 50, \sigma = 5).$ Find $$P(40 \le Y \le 52) = P(39.5 < Y < 52.5) = P(39 < Y < 53).$$ The exact binomial probability is $0.6738;$ the normal approximation with continuity correction is $0.6736.$ The approximation with continuity correction is clearly better than the other two (0.6327 and 0.7118).

sum(dbinom(40:52, 100, .5))
## 0.6737502                    # exact binomial probability as sum of PMF values
diff(pbinom(c(39,52), 100, .5))
## 0.6737502                    # exact binom. probability as diff. of two CDF values
diff(pnorm(c(40, 52), 50, 5))
## 0.6326716                    # normal aprx, too small
diff(pnorm(c(39.5, 52.5), 50, 5))
## 0.673598                     # normal aprx with continuity correction. Best.
diff(pnorm(c(39, 53), 50, 5))
## 0.7118434                    # normal aprx, too big


Continuity correction less important: $Y \sim \mathsf{Binomial}(n = 100, p = 1/2)$ approximated by $\mathsf{Normal}(\mu = 50, \sigma = 5).$ Find $$P(40 \le Y \le 60) = P(39.5 < Y < 60.5) = P(39 < Y < 61).$$ The exact binomial probability is $0.9648;$ the normal approximation with continuity correction is $0.9643.$ The normal approximations without the continuity correction (0.9545 and 0.9722) are not as good, but they are not disastrously misleading.

sum(dbinom(40:60, 100, .5))
## 0.9647998                       # exact binomial
diff(pnorm(c(40, 60), 50, 5)) 
## 0.9544997
diff(pnorm(c(39.5, 60.5), 50, 5))  # normal aprx with continuity correction. Best
## 0.9642712
diff(pnorm(c(39, 61), 50, 5)) 
## 0.9721931
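The three examples follow the same pattern, so they can be wrapped in a small helper. The function name `compare_approx` and its argument names are hypothetical, introduced here only for illustration.

```r
# Compare the exact binomial probability P(a <= X <= b) with three normal
# approximations: endpoints as given, continuity-corrected, and widened by 1.
compare_approx <- function(a, b, n, p) {
  mu    <- n * p
  sigma <- sqrt(n * p * (1 - p))
  c(exact       = sum(dbinom(a:b, n, p)),
    no_cc_inner = diff(pnorm(c(a, b), mu, sigma)),
    with_cc     = diff(pnorm(c(a - 0.5, b + 0.5), mu, sigma)),
    no_cc_outer = diff(pnorm(c(a - 1, b + 1), mu, sigma)))
}

round(compare_approx(6, 9, 16, 0.5), 4)
round(compare_approx(40, 52, 100, 0.5), 4)
round(compare_approx(40, 60, 100, 0.5), 4)
```

Each call returns the exact value first, so the size of each approximation error can be read off directly.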

Notes: (a) It is difficult to give rules of thumb to predict when the continuity correction will be really important, so good practice is always to use it. [Or better yet in applied situations, to use software to get the exact binomial result.]

(b) My examples all use $p = 1/2.$ Usually, the normal approximation to the binomial works best when $p = 1/2,$ because then the binomial distribution is symmetrical, which makes it easier for the symmetrical normal distribution to give a good approximation. When $p$ is far from $1/2,$ normal approximations may be problematic, and the continuity correction may be even more important.

(c) For full disclosure, I have to admit there are quirky cases (especially with $p$ far from $1/2$) in which continuity correction may decrease accuracy, but my opinion is that those are cases in which the normal approximation really shouldn't be used at all.

(d) In some applications when the distribution is taken to be approximately normal and rounding is customary, one does not usually use a continuity correction. For example, if you are taking men's heights, measured in inches, to be $\mathsf{Norm}(\mu = 68, \sigma = 3.5)$ and the problem is to find the probability a randomly chosen man is over 6 feet tall, then most texts wouldn't expect you to compute $P(X > 71.5).$ [Or to worry about $P(X < 0),$ which is technically positive.]
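To explore note (b) numerically, one possible sketch is to measure the largest gap between the binomial CDF and its continuity-corrected normal approximation over all cutoffs. The function name `max_cdf_error` and the choice of $p = 0.05$ as a value "far from $1/2$" are my own assumptions for illustration.

```r
# Largest absolute difference between P(X <= k) and the continuity-corrected
# normal approximation pnorm(k + 0.5, mu, sigma), taken over k = 0, ..., n.
max_cdf_error <- function(n, p) {
  k     <- 0:n
  mu    <- n * p
  sigma <- sqrt(n * p * (1 - p))
  max(abs(pbinom(k, n, p) - pnorm(k + 0.5, mu, sigma)))
}

max_cdf_error(100, 0.5)    # symmetric case
max_cdf_error(100, 0.05)   # skewed case, p far from 1/2
```

One would expect the skewed case to show the larger worst-case error, which is consistent with the caution in note (b).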