Different approaches of binomial approximation using normal distribution yielding different results

binomial distribution, normal distribution, probability, probability distributions

I have the following question:

A process for manufacturing an electronic component yields items of
which $1$% are defective. A quality control plan is to select $100$
items from the process, and if none are defective, the process
continues. Use the normal approximation to the binomial to find the
probability that the process continues given the sampling plan
described.

Here we can define our random variable $X$ as either the number of successes or the number of failures. In theory these two choices should be equivalent, but in practice they yield different results.

Approach A:
Defining $X$ as the number of failures (defective items), we need $P(X=0)$.
We have:
$n=100$, $p=0.01$, $\mu=np=100\times 0.01=1$ and standard deviation $\sigma=\sqrt{npq}=\sqrt{0.99}\approx 0.995$.

We need the probability $P(X=0)$, which is $P(X<0.5)$ due to continuity correction.

Approximating this using the standard normal, this is equivalent to $P\left(Z<\frac{0.5-1}{0.995}\right)$, which is equal to $P(Z<-0.503)$, which is equal to $0.3085$.

Approach B:

Defining $X$ as the number of successes (non-defective items), we need $P(X=100)$.
We have:
$n=100$, $p=0.99$, $\mu=np=100\times 0.99=99$ and standard deviation $\sigma=\sqrt{npq}=\sqrt{0.99}\approx 0.995$.

We need the probability $P(X=100)$, which is $P(99.5<X<100.5)$ due to continuity correction.

Approximating this using the standard normal, this is equivalent to $P\left(\frac{99.5-99}{0.995}<Z<\frac{100.5-99}{0.995}\right)$, which is equal to $P(0.503<Z<1.508)$.

To get this we can do $P(Z<1.51)-P(Z<0.50)=0.93448-0.69146=0.24302$

Can someone explain why these results are different, or what I am doing wrong? Any help would be appreciated.

Best Answer

First of all, we should establish the exact probability. For a single component selected at random, the probability it is defective should be $p = 0.01$. Therefore, the probability that there are $X = 0$ defects in a batch of $n = 100$ components is simply $$\Pr[X = 0] = (1-p)^n = (0.99)^{100} \approx 0.366032.$$

Any approximation that does not yield a result reasonably close to this should be suspect.
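As a quick check of the exact value, here is a minimal Python sketch. The helper name `binom_pmf` is my own, introduced only for illustration; it evaluates the binomial probability mass function directly.

```python
from math import comb

n, p = 100, 0.01  # sample size and per-item defect probability from the question


def binom_pmf(k, n, p):
    """Exact Binomial(n, p) probability of observing exactly k defects."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


exact = binom_pmf(0, n, p)  # reduces to (1 - p)**n when k = 0
print(f"{exact:.6f}")       # 0.366032
```

For $k = 0$ the formula collapses to $(1-p)^n$, so the helper is only there to make the comparison with other values of $k$ easy.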

For your Approach A, you seem to think that if the number of failures cannot be negative, then $\Pr[-0.5 < X < 0.5]$ should become $\Pr[X < 0.5]$. But this is not consistent with your reasoning. The probability $\Pr[X < 0.5]$ includes all negative values for $X$. If you truly wanted to exclude negative values, you would compute $\Pr[0 \le X < 0.5] \approx 0.150212$.
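To see how much the choice of lower limit matters, the following sketch evaluates both intervals under the approximating normal distribution. It assumes Python 3.8+ for `statistics.NormalDist`; the variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

n, p = 100, 0.01
# Approximating normal: mean np = 1, variance np(1 - p) = 0.99
approx = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))

one_sided = approx.cdf(0.5)                    # Pr[X < 0.5], includes all negative values
truncated = approx.cdf(0.5) - approx.cdf(0.0)  # Pr[0 <= X < 0.5], negative values excluded

print(f"Pr[X < 0.5]      = {one_sided:.4f}")
print(f"Pr[0 <= X < 0.5] = {truncated:.4f}")
```

The second figure should agree with the value of about $0.150212$ quoted above, and neither figure is close to the exact probability.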

Conversely, for Approach B, you don't argue that $X$ cannot exceed $100$, when this is equally true as your reasoning that $X$ cannot be negative. This is the reason why your approaches do not match.

Another reason for the discrepancy is that you are rounding far too early in your calculations. You should never round to such low precision before computing a probability or quantile.
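The effect of early rounding can be illustrated with the Approach B interval. This is a sketch, assuming Python 3.8+ for `statistics.NormalDist`; it compares the result from z-scores cut to two decimals (the values $1.51$ and $0.50$ used in the question) with the result from unrounded z-scores.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()  # standard normal, mean 0 and standard deviation 1
sigma = sqrt(0.99)         # unrounded standard deviation

# Unrounded z-scores for Pr[99.5 < X < 100.5] with mean 99
z_lo = (99.5 - 99) / sigma
z_hi = (100.5 - 99) / sigma
full_precision = std_normal.cdf(z_hi) - std_normal.cdf(z_lo)

# z-scores as used in the question, reduced to two decimals before the lookup
rounded_early = std_normal.cdf(1.51) - std_normal.cdf(0.50)

print(f"full precision: {full_precision:.6f}")
print(f"rounded early:  {rounded_early:.6f}")
```

The two results already differ in the third decimal place, which is enough to make otherwise identical calculations appear to disagree.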

Finally, note that the argument for truncating the approximating distribution because the exact distribution has finite support is invalid. You shouldn't do it, because you discard half of the continuity-corrected interval, and the probability mass on it, at the respective endpoint. The correct calculation is as follows.

Let $X$ be the random number of defects. The exact distribution is $$X \sim \operatorname{Binomial}(n = 100, p = 0.01),$$ but $X$ is approximately normal with mean $\mu = np = 1$ and variance $\sigma^2 = np(1-p) = 0.99$, therefore $$\Pr[X = 0] = \Pr[-0.5 \le X \le 0.5] = \Pr\left[\frac{-0.5 - 1}{\sqrt{0.99}} \le \frac{X - \mu}{\sigma} \le \frac{0.5 - 1}{\sqrt{0.99}}\right].$$ When employing continuity correction, we do not truncate the interval at the boundary of the support. We must include the entire probability mass, even at the endpoints of the support. This gives us $$\Pr[X = 0] \approx 0.241817.$$
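As a numerical check, the sketch below evaluates the full continuity-corrected interval for both ways of defining the random variable (counting defects, or counting good items). It assumes Python 3.8+ for `statistics.NormalDist`; the names `defects` and `good_items` are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, p = 100, 0.01
sigma = sqrt(n * p * (1 - p))  # same standard deviation for both definitions

defects = NormalDist(mu=n * p, sigma=sigma)           # number of defective items
good_items = NormalDist(mu=n * (1 - p), sigma=sigma)  # number of non-defective items

# Full continuity-corrected intervals, not truncated at the ends of the support
prob_zero_defects = defects.cdf(0.5) - defects.cdf(-0.5)
prob_all_good = good_items.cdf(100.5) - good_items.cdf(99.5)

print(f"{prob_zero_defects:.6f}")
print(f"{prob_all_good:.6f}")
```

Because the two intervals are mirror images of each other about their respective means, both definitions give the same approximate probability once neither interval is truncated.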

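Note that this approximation is poor: it gives about $0.2418$ against the exact value of about $0.3660$, which is unsurprising when $np = 1$ is so small.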
