Why is the normal probability curve used to approximate the binomial probability distribution?

binomial distribution, normal distribution

Background: I'm a psychology/behavioural science student. I'm trying to teach myself some stats stuff which goes beyond the scope of my current syllabus.


Question: Quoting from Chapter 6: The Normal Probability Distribution, Introduction to Probability and Statistics by Mendenhall, Beaver and Beaver (14th Ed.),

Since the normal distribution is continuous, the area under the curve at any single point is equal to $0$. Keep in mind that this result applies only to continuous random variables. Because the binomial random variable $x$ is a discrete random variable, the probability that $x$ takes some specific value—say, $x =11$ —will not necessarily equal $0$.

As far as my understanding goes, the normal probability distribution is used for continuous random variables (as also stated above), so why is it being used for approximating binomial probability distributions, which are discrete random variables? How is this approximation justified when a discrete random variable is capable of taking a certain value with a specific probability, but for a continuous random variable, the probability of it taking a specific value is $0$?


Extra: Kindly suggest corrections for the above question in case of erroneous statements.

Best Answer

This is interesting. We have many questions here asking about details of the steps of a method to use the normal distribution to approximate the binomial distribution. Few if any, however, ask what makes this method a valid method in the first place.

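It is true that when you look closely, the probability mass function of a binomial distribution and the probability density of a normal distribution are quite different. In the binomial distribution, all the probability is concentrated at a finite number of points, whereas the normal distribution spreads its probability continuously over the whole real line.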

But let's try a different representation of the binomial variable $B$. Instead of actually plotting the function, take each possible outcome $k$ of the distribution and construct a rectangle of width $1$ and height equal to the probability of that outcome, $p_B(k).$ Then put that rectangle upright on the $x$ axis of a graph, so that the centerline of the rectangle lies on the line $x = k.$ When you do this, you get something like the colored rectangles in the figure below:

[Figure: the probabilities of a binomial distribution with $6$ trials drawn as colored rectangles, with a normal density curve superimposed.]

Looking at a graph like this, you might notice that the rectangles derived from the binomial distribution look a lot like a Riemann sum of a normal distribution. In the figure above you can see that they come close to being a "midpoint" Riemann sum of the superimposed normal density. The middle bar is just a little too short, and if you look closely the other bars are not quite the right height either. But this is just a simple example for illustration. If the binomial distribution represented a much larger number of trials, for example $100$ trials instead of just $6,$ the rectangles would be a much better approximation of a "midpoint" Riemann sum, as long as you don't look too far into the "tails" of the normal distribution (where the binomial probability will be zero although the normal density remains positive).
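As a rough numerical check of this claim, the following Python sketch compares each rectangle's height $p_B(k)$ with the normal density at the rectangle's center. The choice $p = \frac12$, the restriction to outcomes within two standard deviations of the mean, and the helper names are assumptions made for illustration only.

```python
import math

def binom_pmf(k, n, p):
    """Probability that a binomial(n, p) variable equals k."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std. dev. sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def max_relative_gap(n, p, width=2.0):
    """Largest relative gap between rectangle height and normal density,
    looking only at outcomes within `width` standard deviations of the mean
    (an assumed cutoff that keeps us out of the tails)."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    gaps = []
    for k in range(n + 1):
        if abs(k - mu) <= width * sigma:
            height = binom_pmf(k, n, p)       # rectangle of width 1
            density = normal_pdf(k, mu, sigma)  # curve at the rectangle's center
            gaps.append(abs(height - density) / density)
    return max(gaps)

gap_6 = max_relative_gap(6, 0.5)
gap_100 = max_relative_gap(100, 0.5)
print(f"n = 6:   largest relative gap = {gap_6:.4f}")
print(f"n = 100: largest relative gap = {gap_100:.4f}")
```

Under these assumptions the gap for $100$ trials should come out much smaller than the gap for $6$ trials, which is the sense in which the rectangles become a better "midpoint" Riemann sum.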

The observation that makes the normal approximation work is that if you take some sequence of adjacent rectangles, they are in fact a kind of Riemann sum, not exactly the "midpoint" sum but still a relatively accurate one, approximating the area under the normal distribution between the leftmost edge of the leftmost rectangle and the rightmost edge of the rightmost rectangle. And an approximation that works in one direction works just as well in the other: the area under the normal distribution is a good approximation of the sum of the areas of the rectangles, which is the sum of probabilities of a range of outcomes of the binomial.

For example, consider a binomial variable $X$ with probability $p = \frac12$ for each trial and with $n = 30$ trials, and suppose we want the probability that $7\leq X \leq 9.$ We construct rectangles for $P(X=7),$ $P(X=8),$ and $P(X=9).$ Those rectangles lie between the lines $x = 6.5$ and $x = 9.5.$ If $f_N$ is the density of a normal distribution with the same mean and variance as the binomial, the rectangles provide an approximation of the area under the normal distribution between those lines:

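\begin{multline} \int_{6.5}^{9.5} f_N(t)\,dt \approx (7.5 - 6.5) f_N(7) + (8.5 - 7.5) f_N(8) + (9.5 - 8.5) f_N(9) \\ = f_N(7) + f_N(8) + f_N(9). \end{multline}

Since the height $P(X=k)$ of each rectangle is close to $f_N(k),$ the right-hand side is approximately $P(X=7) + P(X=8) + P(X=9) = P(7 \leq X \leq 9),$ so the integral approximates that binomial probability.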
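The example with $n = 30$ and $p = \frac12$ can be checked numerically. This is a minimal sketch using only the Python standard library; the function and variable names are illustrative assumptions, and the normal area is computed from the error function rather than by numerical integration.

```python
import math

def binom_pmf(k, n, p):
    """Probability that a binomial(n, p) variable equals k."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std. dev. sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of the same normal distribution."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

n, p = 30, 0.5
mu = n * p                          # same mean as the binomial
sigma = math.sqrt(n * p * (1 - p))  # same variance as the binomial

# Exact binomial probability P(7 <= X <= 9): total area of the three rectangles.
exact = sum(binom_pmf(k, n, p) for k in (7, 8, 9))

# Area under the normal density between x = 6.5 and x = 9.5.
area = normal_cdf(9.5, mu, sigma) - normal_cdf(6.5, mu, sigma)

# Midpoint Riemann sum f_N(7) + f_N(8) + f_N(9) of that same area.
midpoint_sum = sum(normal_pdf(k, mu, sigma) for k in (7, 8, 9))

print(f"exact binomial sum : {exact:.5f}")
print(f"normal area        : {area:.5f}")
print(f"midpoint Riemann   : {midpoint_sum:.5f}")
```

The three printed numbers should agree to within about a thousandth, even though this range lies fairly far from the mean of $15.$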

Note: Actually proving that this approximation is a good one, rather than appealing to graphical intuition, is the content of the de Moivre-Laplace theorem, a special case of the central limit theorem, which is one of the most important theorems of mathematical probability.

As already noted, the approximation is far from perfect. It is not good for a small number of trials, and it is not good in the "tails" of the normal distribution. It also tends not to be as good when the single-trial probability $p$ of the binomial is very close to $0$ or $1$ as it is when $p \approx \frac12.$ These issues are discussed in the questions "Normal approximation to the binomial distribution" and "Normal approximation of binomial distribution - limits", among other places.
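To see how the quality of the approximation depends on $p$ and on the number of trials, one can measure the worst-case error of the approximation $P(X \leq k) \approx \int_{-\infty}^{k + 0.5} f_N(t)\,dt$ over all $k.$ The sketch below does this; the particular values of $n$ and $p$ and the helper names are assumptions chosen for illustration, not recommendations.

```python
import math

def binom_pmf(k, n, p):
    """Probability that a binomial(n, p) variable equals k."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a normal(mu, sigma) variable."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def max_cdf_error(n, p):
    """Worst absolute error, over all k, of approximating P(X <= k)
    by the normal area to the left of x = k + 0.5."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    cumulative = 0.0
    worst = 0.0
    for k in range(n + 1):
        cumulative += binom_pmf(k, n, p)           # exact P(X <= k)
        approx = normal_cdf(k + 0.5, mu, sigma)    # right edge of rectangle k
        worst = max(worst, abs(cumulative - approx))
    return worst

# Effect of p with the number of trials held at 30.
errors_by_p = {p: max_cdf_error(30, p) for p in (0.5, 0.1, 0.02)}
for p, err in errors_by_p.items():
    print(f"n = 30,  p = {p:<4}: worst error = {err:.4f}")

# Effect of the number of trials with p held at 0.1.
errors_by_n = {n: max_cdf_error(n, 0.1) for n in (30, 300)}
for n, err in errors_by_n.items():
    print(f"n = {n:<3}, p = 0.1 : worst error = {err:.4f}")
```

With these values the worst error should grow as $p$ moves from $\frac12$ toward $0,$ and shrink as the number of trials increases.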

Because of these issues, the normal approximation is not recommended if you want to know the probability that the binomial variable will take its smallest value, or even one of its three smallest values. We might use it for estimating the probability of a range of several outcomes nearer the middle of the binomial distribution, or the probability that the outcome is no greater than $k$ (provided $k$ is not too close to the minimum or maximum value).
