[Math] Why does the normal approximation to the binomial distribution use np > 5 as a condition?

I was reading about the normal approximation to the binomial distribution, and I don't understand how it works in cases where, for example, p = 0.3, with p the probability of success.

Most websites say that the normal approximation to the binomial distribution works well if the mean is greater than 5, i.e. $np > 5$.
But I cannot find where this empirical rule came from.

If $n$ is large and the probability of success is 0.5, I agree that the normal approximation to the binomial distribution will be quite accurate. But what about other cases? How can one say that $np > 5$ is the condition for using the normal approximation?

Best Answer

The mean of a binomial distribution is $\mu = np$, and its standard deviation is $\sqrt{np(1-p)}$.
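As a quick sanity check (a minimal Python sketch, not part of the original answer), the two formulas can be compared against the moments computed directly from the binomial pmf $\binom{n}{k}p^k(1-p)^{n-k}$. The helper name `binomial_moments_from_pmf` and the values $n = 20$, $p = 0.3$ are illustrative choices.

```python
from math import comb, sqrt

def binomial_moments_from_pmf(n: int, p: float) -> tuple[float, float]:
    """Mean and standard deviation of Binomial(n, p), summed directly from the pmf."""
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    mean = sum(k * q for k, q in enumerate(pmf))
    var = sum((k - mean) ** 2 * q for k, q in enumerate(pmf))
    return mean, sqrt(var)

n, p = 20, 0.3
mean, sd = binomial_moments_from_pmf(n, p)
print(mean, n * p)                 # both approximately 6.0
print(sd, sqrt(n * p * (1 - p)))   # both approximately 2.049
```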

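For the normal approximation to be reasonable, $\mu$ should be at least 3 standard deviations away from both 0 and $n$, the smallest and largest possible values of a binomial. A normal distribution puts about 99.7% of its probability within 3 standard deviations of its mean, so this keeps almost all of the approximating normal's probability inside $[0, n]$.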

Therefore:

$\mu - 3\sqrt{np(1-p)} > 0$ and $\mu + 3\sqrt{np(1-p)} < n$
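A minimal Python sketch of this check, applied literally to $\mu \pm 3\sqrt{np(1-p)}$; the function name `mean_three_sd_inside` is hypothetical. With $p = 0.3$, as in the question, it fails for $n = 10$ but holds for $n = 100$.

```python
from math import sqrt

def mean_three_sd_inside(n: int, p: float) -> bool:
    """Return True if mu - 3*sd > 0 and mu + 3*sd < n for Binomial(n, p)."""
    mu = n * p                    # binomial mean
    sd = sqrt(n * p * (1 - p))    # binomial standard deviation
    return mu - 3 * sd > 0 and mu + 3 * sd < n

print(mean_three_sd_inside(10, 0.5))   # True
print(mean_three_sd_inside(10, 0.3))   # False
print(mean_three_sd_inside(100, 0.3))  # True
```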

Substituting $\mu = np$, squaring both sides, and simplifying gives the equivalent inequalities:

$np > 9(1-p)$ and $n(1-p) > 9p$
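Spelled out for $0 < p < 1$ (so that $np > 0$ and $n(1-p) > 0$): substitute $\mu = np$, square both sides (allowed because both sides are positive), divide by $np$ or $n(1-p)$ respectively, and finally solve each inequality for $p$:

$$
\begin{aligned}
\mu - 3\sqrt{np(1-p)} > 0 &\iff np > 3\sqrt{np(1-p)} \iff n^2p^2 > 9np(1-p) \iff np > 9(1-p) \iff p > \frac{9}{n+9},\\
\mu + 3\sqrt{np(1-p)} < n &\iff n(1-p) > 3\sqrt{np(1-p)} \iff n^2(1-p)^2 > 9np(1-p) \iff n(1-p) > 9p \iff p < \frac{n}{n+9}.
\end{aligned}
$$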

As $n$ gets larger, a wider range of $p$ satisfies these inequalities; equivalently, the closer $p$ is to 0.5, the smaller the $n$ you can use.

For example, with $n = 10$:

$0.474<p<0.526$

As $n$ grows, $p$ no longer has to be so close to 0.5. For $n = 100$:

$0.0826<p<0.9174$
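Solving the two inequalities for $p$ gives $9/(n+9) < p < n/(n+9)$, so the interval can be tabulated for any $n$. A minimal Python sketch (the helper `p_interval` is a hypothetical name); for $n = 10$ and $n = 100$ it reproduces the ranges quoted above, and $n = 1000$ shows the interval continuing to widen.

```python
def p_interval(n: int) -> tuple[float, float]:
    """Open interval of p satisfying np > 9(1-p) and n(1-p) > 9p.

    Rearranging: np > 9 - 9p  <=>  p > 9/(n + 9),
                 n - np > 9p  <=>  p < n/(n + 9).
    """
    return 9 / (n + 9), n / (n + 9)

for n in (10, 100, 1000):
    lo, hi = p_interval(n)
    print(f"n={n}: {lo:.4f} < p < {hi:.4f}")
```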

Even with $p = 0.9$, any $n > 81$ (and so certainly any $n > 100$) keeps the mean at least 3 standard deviations away from both 0 and $n$.
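To find the exact threshold for a given $p$, a small search can look for the smallest $n$ satisfying both inequalities. This is a minimal sketch with a hypothetical helper `smallest_n`; it uses Python's `fractions.Fraction` so that boundary cases such as $n = 81$, $p = 0.9$ (where one side equals the other exactly) are not decided by floating-point rounding.

```python
from fractions import Fraction

def smallest_n(p: Fraction) -> int:
    """Smallest n with n*p > 9*(1-p) and n*(1-p) > 9*p, using exact arithmetic."""
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    n = 1
    while not (n * p > 9 * (1 - p) and n * (1 - p) > 9 * p):
        n += 1
    return n

print(smallest_n(Fraction(9, 10)))  # 82
print(smallest_n(Fraction(3, 10)))  # 22
print(smallest_n(Fraction(1, 2)))   # 10
```

For $p = 0.9$ the binding constraint is $n(1-p) > 9p$, i.e. $0.1n > 8.1$, which is why the search stops at 82.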

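This is where the rule of thumb about $np$ and $n(1-p)$ comes from: if both are greater than 5, these inequalities are usually satisfied. However, a case such as $n = 15$, $p = 0.65$ fails: $np = 9.75$ and $n(1-p) = 5.25$ both exceed 5, yet $n(1-p) = 5.25 < 9p = 5.85$. So some textbooks use 9 instead, requiring $np > 9$ and $n(1-p) > 9$. Because $1 - p \le 1$ and $p \le 1$, that stricter version always implies both inequalities.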
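The comparison can be checked mechanically. In this minimal Python sketch the function names `rule_of_five`, `three_sd_condition`, and `rule_of_nine` are illustrative, not from any library. It confirms that $n = 15$, $p = 0.65$ passes the "greater than 5" rule but fails the 3-standard-deviation inequalities, and checks on a grid of $(n, p)$ values that the threshold of 9 on both $np$ and $n(1-p)$ always implies them (a spot check, not a proof).

```python
def rule_of_five(n: int, p: float) -> bool:
    """Common rule of thumb: np > 5 and n(1-p) > 5."""
    return n * p > 5 and n * (1 - p) > 5

def three_sd_condition(n: int, p: float) -> bool:
    """Mean at least 3 SDs from 0 and n: np > 9(1-p) and n(1-p) > 9p."""
    return n * p > 9 * (1 - p) and n * (1 - p) > 9 * p

def rule_of_nine(n: int, p: float) -> bool:
    """Stricter rule: np > 9 and n(1-p) > 9."""
    return n * p > 9 and n * (1 - p) > 9

n, p = 15, 0.65
print(rule_of_five(n, p))        # True  (np = 9.75, n(1-p) = 5.25)
print(three_sd_condition(n, p))  # False (n(1-p) = 5.25 < 9p = 5.85)

# Grid check: whenever the "9" version holds, the 3-SD inequalities hold too,
# because 9(1 - p) <= 9 and 9p <= 9.
grid = [(m, k / 100) for m in range(1, 201) for k in range(1, 100)]
assert all(three_sd_condition(m, q) for m, q in grid if rule_of_nine(m, q))
```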

This condition does not guarantee that the binomial is well approximated by a normal distribution; it only ensures that the mean is not pushed too far toward 0 or $n$, so the distribution is not too skewed.
