[Math] Normal approximation of binomial distribution with finite n corrections

analysis, probability, statistics

I know I can approximate the binomial distribution $B(n,p)$ with the normal distribution
$N(np,np(1-p))$.

For finite $n$, I assume there are correction terms for mean and variance of the normal distribution, i.e. when $n$ is finite, a more accurate normal
approximation to $B(n,p)$ should be $N(np+A, np(1-p)+B)$, where $A$ and $B$ depend on $n$ and go to $0$ when $n\to \infty$.

Are there known formulas for $A$ and $B$?

Best Answer

No, that isn't the way you correct the normal approximation to the binomial for small samples. The correction addresses the fact that a sum of discrete probability masses doesn't behave the same as an integral of a continuous probability density.

For a simple example, suppose $X$ is discrete and $Y$ is a continuous approximation of $X$. The statement $\Pr[X = 2]$, assuming that $X$ can actually take on the value $2$ with some nonzero probability, is meaningful, but $\Pr[Y = 2] = 0$ because $Y$ is continuous. So, a naive approximation would be to say something like $$\Pr[X = 2] \approx \Pr[1.5 < Y \le 2.5].$$ We don't have to use $\pm 0.5$, of course, but it is one way to do such a continuity correction. It is worth noting that such a correction is completely independent of any parameters of the distributions themselves.

So, for the normal approximation to the binomial, let's try an example. Suppose $X \sim \operatorname{Binomial}(n = 53, p = 0.61)$. We wish to approximate this using a suitable normal distribution so that we may calculate $$\Pr[11 \le X < 35].$$ To this end, let $$Y \sim \operatorname{Normal}(\mu = np = 32.33, \sigma^2 = np(1-p) = 12.6087).$$ Then we have $$\begin{align*} \Pr[11 \le X < 35] &\approx \Pr[10.5 \le Y \le 34.5] \\ &= \Pr\left[\frac{10.5-32.33}{\sqrt{12.6087}} \le \frac{Y - \mu}{\sigma} \le \frac{34.5-32.33}{\sqrt{12.6087}} \right] \\ &= \Pr[-6.14778 \le Z \le 0.611117] \\ &= 0.729439 - 3.92865 \times 10^{-10} \approx 0.729439. \end{align*}$$ Take note of the direction of the correction: if $X$ includes the probability mass at one endpoint (here, $X = 11$ is included), the bound is shifted by $0.5$ to include the half-integer interval beyond it; if $X$ does not include the endpoint ($X = 35$ is not included), the bound is shifted by $0.5$ to exclude the half-integer interval up to that value.

Without correction, that is, using $\Pr[11 \le Y \le 35]$, the probability we would have obtained is $0.773953$. The exact probability is computed by summing the binomial masses: $$\Pr[11 \le X < 35] = \sum_{x=11}^{34} \binom{53}{x} (0.61)^x (0.39)^{53-x} = 0.727048.$$ So as you can see, the continuity correction gives a far better result: an error of about $0.002$ instead of about $0.047$.
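For anyone who wants to reproduce these three numbers, here is a minimal Python sketch using only the standard library. The helper names `normal_cdf` and `binom_interval` are made up for this illustration, and the normal CDF is written in terms of `math.erf` rather than taken from a statistics package.

```python
from math import comb, erf, sqrt


def normal_cdf(x, mu, sigma):
    """CDF of Normal(mu, sigma^2) at x, via the error function."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))


def binom_interval(n, p, lo, hi):
    """Exact Pr[lo <= X <= hi] for X ~ Binomial(n, p), by direct summation."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(lo, hi + 1))


n, p = 53, 0.61
mu = n * p                     # 32.33
sigma = sqrt(n * p * (1 - p))  # sqrt(12.6087)

# Pr[11 <= X < 35] is the same event as Pr[11 <= X <= 34].
exact = binom_interval(n, p, 11, 34)

# Continuity-corrected normal approximation: Pr[10.5 <= Y <= 34.5].
corrected = normal_cdf(34.5, mu, sigma) - normal_cdf(10.5, mu, sigma)

# Uncorrected normal approximation: Pr[11 <= Y <= 35].
uncorrected = normal_cdf(35, mu, sigma) - normal_cdf(11, mu, sigma)

print(f"exact       = {exact:.6f}")
print(f"corrected   = {corrected:.6f}")
print(f"uncorrected = {uncorrected:.6f}")
```

The printed values should agree with the figures quoted above (0.727048, 0.729439 and 0.773953) to the digits shown.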

Could you conceivably derive an adjustment to the normal mean and variance that performs as well as this method of correction? I strongly doubt it, for it would need to take into account whether you mean $$\Pr[\ell < X < u], \quad \Pr[\ell \le X < u], \quad \Pr[\ell < X \le u], \quad \Pr[\ell \le X \le u],$$ not to mention the values of $\ell$ and $u$ themselves in relation to the parameters $n$ and $p$. And even if it could be done, it is likely to be a more complicated algorithm to apply than what we have shown here.
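By contrast, the endpoint rule handles all four interval types mechanically. The following is a rough sketch, assuming integer endpoints $\ell$ and $u$ and the usual $\pm 0.5$ shift; the function names `corrected_bounds` and `approx_binom_interval` are hypothetical, chosen only for this example.

```python
from math import erf, sqrt


def normal_cdf(x, mu, sigma):
    """CDF of Normal(mu, sigma^2) at x, via the error function."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))


def corrected_bounds(lo, hi, lo_inclusive, hi_inclusive):
    """Continuity-corrected bounds for integer endpoints lo and hi.

    An included endpoint is pushed outward by 0.5 (its mass is kept);
    an excluded endpoint is pulled inward by 0.5 (its mass is dropped).
    """
    a = lo - 0.5 if lo_inclusive else lo + 0.5
    b = hi + 0.5 if hi_inclusive else hi - 0.5
    return a, b


def approx_binom_interval(n, p, lo, hi, lo_inclusive=True, hi_inclusive=True):
    """Normal approximation, with continuity correction, to a binomial interval probability."""
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    a, b = corrected_bounds(lo, hi, lo_inclusive, hi_inclusive)
    return normal_cdf(b, mu, sigma) - normal_cdf(a, mu, sigma)


# The four interval types, for l = 11 and u = 35 with n = 53, p = 0.61.
for lo_inc, hi_inc in [(False, False), (True, False), (False, True), (True, True)]:
    left = "<=" if lo_inc else "<"
    right = "<=" if hi_inc else "<"
    prob = approx_binom_interval(53, 0.61, 11, 35, lo_inc, hi_inc)
    print(f"Pr[11 {left} X {right} 35] ~ {prob:.6f}")
```

The case `lo_inclusive=True, hi_inclusive=False` reproduces the worked example, since it yields the bounds $10.5$ and $34.5$.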