Solved – Can a Bernoulli distribution be approximated by a Normal distribution

Tags: approximation, bernoulli-distribution, normal-distribution

$$\sum_{i=1}^n \text{Bernoulli}(p) = \text{Binomial}(n,p) \approx \mathcal N(np,\ np(1-p)) = \sum_{i=1}^n \mathcal N(p,\ p(1-p))$$

Can I conclude that $\mathcal N(p, p(1-p))$ could represent an approximation of $\text{Bernoulli}(p)$?

In particular, given $n$ binary RVs $X_i$, a possible naive factorization of $P(X_1, X_2, \ldots, X_n)$ is $P(X_1) P(X_2) \cdots P(X_n)$.

Since all the RVs are binary, they can be modeled as Bernoulli RVs.

If I am not interested in the exact probability of the joint, can I use the normal distribution to approximate each Bernoulli variable?
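As a quick numerical sketch of the distinction at stake (the function names and the parameter choices $n=400$, $p=0.3$ are illustrative assumptions, not from the post): comparing CDFs shows the normal approximation working well for the *sum* of many Bernoullis, while the same comparison for a *single* Bernoulli fails badly.

```python
import math

def binom_cdf(k, n, p):
    """Exact Binomial(n, p) CDF, P(X <= k), via a pmf recurrence."""
    total, term = 0.0, (1 - p) ** n  # term starts as P(X = 0)
    for i in range(k + 1):
        total += term
        term *= (n - i) / (i + 1) * p / (1 - p)  # P(X = i+1) from P(X = i)
    return total

def normal_cdf(x, mu, sigma):
    """Normal CDF via the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

# For the SUM of many Bernoullis, the CDFs are close everywhere:
n, p = 400, 0.3
mu, sigma = n * p, math.sqrt(n * p * (1 - p))
sum_err = max(abs(binom_cdf(k, n, p) - normal_cdf(k, mu, sigma))
              for k in range(n + 1))

# For a SINGLE Bernoulli (n = 1), the gap at x = 0 is enormous:
one_err = abs(binom_cdf(0, 1, p) - normal_cdf(0, p, math.sqrt(p * (1 - p))))

print(f"max CDF error for the sum (n=400): {sum_err:.4f}")  # a couple of percent
print(f"CDF error at x=0 for one Bernoulli: {one_err:.4f}")
```

So the CLT statement above is about sums; it does not license the per-variable substitution, which is exactly what the answer below quantifies.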

Best Answer

Let's analyze the error.

The figure shows plots of the distribution functions of various Bernoulli$(p)$ variables in blue and the corresponding Normal distributions in red. The shaded regions show where the functions differ appreciably.

Figure 1

(Why plot distribution functions instead of density functions? Because a Bernoulli variable has no density function. The densities of good continuous approximations to Bernoulli distributions have huge spikes in neighborhoods of $0$ and $1.$)

No matter what $p$ may be, for some values of $x$ the difference between the two distribution functions will be large. After all, the Bernoulli distribution function has two jumps in it: it jumps by $1-p$ at $x=0$ and again by $p$ at $x=1.$ The Normal distribution function is going to split the greater of those two jumps (which is at least $1/2$) into two parts, whence the larger of the two vertical differences, the largest error, must be at least $1/4.$ In fact, it is always greater even than that.

Here is a plot of the maximum difference between the two functions, as it depends on $p:$

Figure 2

It is never smaller than $0.341345,$ attained when $p=1/2.$ Because probabilities all lie between $0$ and $1,$ this is a substantial error. It is difficult to conceive of circumstances where this approximation would be acceptable, except perhaps when $x\lt 0$ or $x\gt 1:$ but then why use a Normal distribution at all? Just approximate those values as $0$ and $1,$ respectively, without any error at all.
