[Math] Normal distribution finding probability between 2 numbers

normal-distribution, probability

I recently encountered this question on a quiz.

A computer randomly dials telephone numbers. When a person answers, a note is made of whether the person is male or female. Suppose 70 people answer the call. Assume that there are as many males as females (50% male, 50% female).

What is the probability that between 33 and 36 of them are female?

This looks like a normal distribution question to me. Since there are as many males as females, the probability that a female answers the call is 0.5 (that is, 50%).

The average number of females would be 35 (70 × 0.5).

Let's say the standard deviation is 1, since the call count goes 1, 2, 3, 4, 5, 6, 7, …

Using the z-score formula:

z = (x - µ) / σ

z = (33 - 35) / 1

z = -2

P(Z ≤ -2) = 0.0228

z = (36 - 35) / 1

z = 1

P(Z ≤ 1) = 0.8413

P(-2 ≤ Z ≤ 1) = P(Z ≤ 1) - P(Z ≤ -2)

P(33 ≤ female ≤ 36) = 0.8413 - 0.0228 = 0.8185

This doesn't seem right to me.

How should I approach a question such as this?

Best Answer

Let's do it the exact way first, then use your normal distribution approach.

Assuming that each person answering the call is equally likely to be male or female, independently of the other calls, the random number $X$ of females among $n = 70$ answered calls is a binomial random variable, namely $$\begin{gathered} X \sim \operatorname{Binomial}(n = 70, p = 0.5), \\ \Pr[X = x] = \binom{70}{x} (0.5)^x (1 - 0.5)^{70 - x} = \binom{70}{x} (0.5)^{70}. \end{gathered}$$ Therefore, $$\Pr[33 \le X \le 36] = \sum_{x=33}^{36} \binom{70}{x} (0.5)^{70} = (0.5)^{70} \left(\binom{70}{33} + \binom{70}{34} + \binom{70}{35} + \binom{70}{36} \right).$$ With the aid of a computer (although it can be computed by hand, tediously), this is exactly $$\Pr[33 \le X \le 36] = \frac{26909546368186020357}{73786976294838206464} = 0.36469235791233373812\ldots.$$ This is the precise probability, with no approximations.
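For readers who want to reproduce the exact value, here is a minimal Python sketch (assuming Python 3.8+ for `math.comb`). It sums the binomial probabilities with exact rational arithmetic via `fractions.Fraction`, so no rounding enters; the variable names are my own.

```python
from fractions import Fraction
from math import comb

# X ~ Binomial(n = 70, p = 1/2): number of females among 70 answered calls.
n = 70

# Exact sum of C(70, x) / 2^70 for x = 33, 34, 35, 36 (both endpoints included).
# Fraction reduces the result to lowest terms automatically.
p_exact = Fraction(sum(comb(n, x) for x in range(33, 37)), 2**n)

print(p_exact)         # prints 26909546368186020357/73786976294838206464
print(float(p_exact))  # decimal value, about 0.364692
```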

Now let's use the normal distribution approach. The idea is to approximate the distribution of $X$ by a normal distribution whose mean and variance match the binomial mean and variance; i.e., $$X \overset{\text{approx}}{\sim} \operatorname{Normal}(\mu = np, \sigma^2 = np(1-p)),$$ which gives $\mu = 35$ and $\sigma^2 = 17.5$ (so $\sigma = \sqrt{17.5} \approx 4.1833$) for $n = 70$ and $p = 0.5$. Then we standardize $X$: $$\Pr[33 \le X \le 36] \approx \Pr\left[\frac{33 - 35}{\sqrt{17.5}} \le \frac{X - \mu}{\sigma} \le \frac{36 - 35}{\sqrt{17.5}} \right] \approx \Pr[-0.478091 \le Z \le 0.239046],$$ where $$Z = \frac{X - \mu}{\sigma} \sim \operatorname{Normal}(0, 1)$$ is a standard normal random variable whose probabilities we can look up in a table. If we do, we find $$ \Pr[-0.478091 \le Z \le 0.239046] = \Phi(0.239046) - \Phi(-0.478091) \approx 0.594465 - 0.316293 \approx 0.278172.$$
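Instead of a printed table, $\Phi$ can be evaluated from the error function, $\Phi(z) = \tfrac{1}{2}\left(1 + \operatorname{erf}(z/\sqrt{2})\right)$. The following is a small Python sketch of this uncorrected approximation, using only the standard `math` module; the helper name `std_normal_cdf` is hypothetical, chosen for illustration.

```python
from math import erf, sqrt

def std_normal_cdf(z: float) -> float:
    """Standard normal CDF Phi(z), written via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

n, p = 70, 0.5
mu = n * p                     # 35
sigma = sqrt(n * p * (1 - p))  # sqrt(17.5), about 4.1833

# Standardize the raw endpoints 33 and 36 (no continuity correction).
z_lo = (33 - mu) / sigma       # about -0.478091
z_hi = (36 - mu) / sigma       # about 0.239046
p_no_cc = std_normal_cdf(z_hi) - std_normal_cdf(z_lo)

print(round(p_no_cc, 6))
```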

Where did we go wrong? Why is this approximation so poor? The problem is that we did not apply a continuity correction. $X$ takes only integer values, and when its distribution is approximated by a continuous one, the probability $\Pr[X = k]$ corresponds to the area under the normal density over $[k - 0.5, k + 0.5]$. Integrating only from $33$ to $36$ therefore drops roughly half of the probability mass at $X = 33$ and half of the mass at $X = 36$. To compensate, we must instead write $$\Pr[33 \le X \le 36] \approx \Pr[33 - 0.5 \le X \le 36 + 0.5],$$ because both endpoints of the interval are included, so we must enlarge the interval by $0.5$ in each direction. With these corrected endpoints, the standardized probability becomes $$\Pr[-0.597614 \le Z \le 0.358569] \approx 0.364992,$$ and as you can see, this result is much closer to the exact value we showed above.
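The continuity-corrected version differs only in the endpoints. Here is a self-contained Python sketch; the helper `std_normal_cdf` is again a hypothetical name, redefined so the block runs on its own.

```python
from math import erf, sqrt

def std_normal_cdf(z: float) -> float:
    """Standard normal CDF Phi(z), written via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

n, p = 70, 0.5
mu = n * p
sigma = sqrt(n * p * (1 - p))

# Continuity correction: both endpoints are included, so widen [33, 36] to [32.5, 36.5].
z_lo = (33 - 0.5 - mu) / sigma   # about -0.597614
z_hi = (36 + 0.5 - mu) / sigma   # about 0.358569
p_cc = std_normal_cdf(z_hi) - std_normal_cdf(z_lo)

print(round(p_cc, 6))
```

Against the exact value of about $0.364692$, the corrected approximation is off by only about $0.0003$.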

Where did your computation go wrong? First, your value for the standard deviation $\sigma$ is incorrect: the spread of $X$ depends on $n$ and $p$, not on the sequence of call counts 1, 2, 3, …, and it equals $\sigma = \sqrt{np(1-p)} = \sqrt{17.5} \approx 4.18$, not $1$. Second, you didn't use a continuity correction. As we saw, this is needed when a continuous probability distribution approximates a discrete one, and the error from omitting it is large in this case because the standardized endpoints $[-0.48, 0.24]$ lie close to $0$, where the standard normal density is highest, so the half-unit slivers dropped at each endpoint carry a substantial share of the probability.
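To see how much each mistake costs, the following Python sketch puts the three normal calculations next to the exact binomial value: the question's $\sigma = 1$ (with $\Phi$ evaluated precisely rather than from a table), the correct $\sigma$ without correction, and the correct $\sigma$ with continuity correction. The helper names `std_normal_cdf` and `normal_approx` are hypothetical, chosen for this illustration.

```python
from fractions import Fraction
from math import comb, erf, sqrt

def std_normal_cdf(z: float) -> float:
    """Standard normal CDF Phi(z), written via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def normal_approx(lo: float, hi: float, mu: float, sigma: float) -> float:
    """P(lo <= Y <= hi) for Y ~ Normal(mu, sigma^2)."""
    return std_normal_cdf((hi - mu) / sigma) - std_normal_cdf((lo - mu) / sigma)

n, p = 70, 0.5
mu, sigma = n * p, sqrt(n * p * (1 - p))

# Exact binomial probability P(33 <= X <= 36), converted to float for comparison.
exact = float(Fraction(sum(comb(n, x) for x in range(33, 37)), 2**n))

results = {
    "question (sigma = 1, no correction)": normal_approx(33, 36, mu, 1.0),
    "correct sigma, no correction": normal_approx(33, 36, mu, sigma),
    "correct sigma, continuity correction": normal_approx(32.5, 36.5, mu, sigma),
}
for label, value in results.items():
    print(f"{label:40s} {value:.6f}  error {value - exact:+.6f}")
```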
