[Math] Differences between Binomial and Normal Distribution Models

probability, probability distributions, simulation

I've been having a lot of trouble with a homework question. I'm almost certain that my answer by hand is correct, but I also have to estimate it with a model in MATLAB, and I'm running into some interesting issues there.

In the situation I'm given, a machine produces boards, each with a 95% chance of being within specification. Out of a batch of 1000, 65 were faulty. I need to do a Z-test for a single proportion to check whether the machine is behaving as expected.

So $ \bar X = 0.065$ from $ n = 1000 $ tests, with an expected proportion of $ p_0 = 0.05 $.

Using the Z-test for proportions I found:

$ T = \sqrt{n}\cdot\frac{\bar X - p_0}{\sqrt{p_0\cdot(1-p_0)}} = \sqrt{1000} \cdot \frac{0.015}{\sqrt{0.05\cdot0.95}} \approx 2.1764 $

This translates to a p-value of $ 1 - \Phi(2.1764) = 1 - 0.9852 = 0.0148 $.

This is (as far as I can tell – and I've used normal distribution calculators online) right.
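For readers who want to check the hand calculation numerically, here is a minimal sketch in R (the language used in the answer further down, not the MATLAB the homework asks for). The variable names are illustrative assumptions; `pnorm` is R's standard normal CDF.

```r
# Hand calculation from the question, reproduced numerically.
n     <- 1000    # batch size
x_bar <- 0.065   # observed proportion of faulty boards (65 / 1000)
p0    <- 0.05    # proportion expected if the machine is within spec

# Z statistic for a single proportion
z <- sqrt(n) * (x_bar - p0) / sqrt(p0 * (1 - p0))

# Upper-tail (one-sided) and two-sided p-values from the standard normal
p_one_sided <- pnorm(z, lower.tail = FALSE)
p_two_sided <- 2 * p_one_sided

print(round(c(z = z, one_sided = p_one_sided, two_sided = p_two_sided), 4))
```

The one-sided value should come out at about 0.015, and the two-sided value is simply twice that.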

I then have to estimate this value using a simulation in MATLAB. For it I have to calculate the chance of being more than 0.015 away from the expected proportion of 0.05, so I'm expecting a result of about 0.0296 (since the simulation is two-sided, whereas the hand calculation was one-sided).

Essentially, I draw a sample from $ Bin(1000, 0.05) $ and divide it by 1000. (I then check whether it falls outside the expected range and repeat many times, but the draw is the meaningful bit.) This gives a result of ~0.0242.

The interesting bit is that when I replace the binomial draw with a normal approximation of the above binomial ($ Norm(50, \sqrt{47.5}) $, then divide by 1000), I get the answer I expect.

Is this just a function of how small I've gotten – maybe a continuous, normal distribution takes into account much more than a discrete, binomial distribution? Or is it that I've definitely done something wrong?
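The two simulations described in the question can be sketched side by side. This is a rough R version under stated assumptions, not the original MATLAB code: the seed, the number of repetitions `n_rep`, and all variable names are made up for illustration. The binomial check is done on integer counts (fewer than 35 or more than 65 faulty boards) so that floating-point rounding at the boundary cannot affect the result.

```r
set.seed(1)          # arbitrary seed, only so the run is repeatable
n_rep <- 1e5         # number of simulated batches (assumed)
n     <- 1000
p0    <- 0.05

# Binomial counts: "more than 0.015 away from 0.05" means
# fewer than 35 or more than 65 faulty boards out of 1000
x_bin   <- rbinom(n_rep, n, p0)
est_bin <- mean(x_bin < 35 | x_bin > 65)

# Normal draws with the same mean and standard deviation as the binomial
x_norm   <- rnorm(n_rep, mean = n * p0, sd = sqrt(n * p0 * (1 - p0)))
est_norm <- mean(abs(x_norm / n - p0) > 0.015)

print(c(binomial = est_bin, normal = est_norm))
```

With enough repetitions the binomial estimate should settle near 0.024 and the normal estimate near 0.030, which is the gap the question describes.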

Best Answer

Exact binomial probability:

If I understand correctly, you have $X \sim Binom(n=1000, p=.05),$ and you want to know $$P(|X/1000 - .05| > .015) = 1 - P(35 \le X \le 65) = 0.0242,$$ computed in R statistical software as

```r
1 - (pbinom(65, 1000, .05) - pbinom(34, 1000, .05))
## 0.02423487
```
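The bounds 35 and 65 come from unpacking the absolute-value condition on the integer scale. As a worked step, writing $F$ for the cumulative distribution function of $X$ (the quantity `pbinom` returns):

$$\left|\frac{X}{1000} - 0.05\right| > 0.015 \iff X < 35 \ \text{or}\ X > 65,$$
$$P(35 \le X \le 65) = F(65) - F(34).$$

The subtraction uses $F(34)$ rather than $F(35)$ because $X = 35$ still belongs to the inner range.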

Normal approximation:

This could be approximated to about two places using $X \sim Norm(\mu=50,\, \sigma=6.892).$ With continuity correction, this is $1 - P(34.5 < X < 65.5) = 0.0245,$ computed as

```r
1 - diff(pnorm(c(34.5, 65.5), 50, 6.892))
## 0.02451349
```
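To see how much the continuity correction matters here, the normal approximation can be evaluated with and without it. The following is a small sketch with illustrative variable names. The uncorrected version cuts the normal curve at exactly 35 and 65, which is what doubling the one-sided Z-test p-value amounts to.

```r
mu    <- 1000 * 0.05                # mean of Binom(1000, 0.05)
sigma <- sqrt(1000 * 0.05 * 0.95)   # standard deviation, about 6.892

# Without continuity correction: cut the normal curve at 35 and 65
p_plain <- 1 - diff(pnorm(c(35, 65), mu, sigma))

# With continuity correction: cut at 34.5 and 65.5 instead
p_corrected <- 1 - diff(pnorm(c(34.5, 65.5), mu, sigma))

# Exact binomial value for reference
p_exact <- 1 - (pbinom(65, 1000, 0.05) - pbinom(34, 1000, 0.05))

print(round(c(plain = p_plain, corrected = p_corrected, exact = p_exact), 4))
```

The uncorrected value comes out near 0.0295, close to the figure the question expected, while the corrected value sits next to the exact binomial result. This suggests, though it does not prove, that the discrepancy in the question comes from treating a discrete count as continuous rather than from a coding mistake.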

Simulation:

Also, as you suggest, this probability can be approximated (to two or three places) by the following simulation: One generates a million realizations of $X \sim Binom(1000, .05)$ and checks what proportion of them satisfy the condition $|X/1000 - .05| > .015.$ The answer is 0.0244. (In R, cond is a logical vector with elements TRUE and FALSE, and the mean of a logical vector is its proportion of TRUEs.)

```r
x = rbinom(10^6, 1000, .05)
cond = (abs(x/1000 - .05) > .015)
mean(cond)
## 0.024429
```
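How close a million-draw simulation can be expected to get is governed by its Monte Carlo standard error. The sketch below plugs in the exact probability and the simulated value reported above; the variable names are illustrative assumptions.

```r
# Exact probability, as computed earlier in the answer
p_exact <- 1 - (pbinom(65, 1000, 0.05) - pbinom(34, 1000, 0.05))

n_rep <- 10^6        # number of simulated values used above
p_sim <- 0.024429    # simulated proportion reported above

# Standard error of a simulated proportion based on n_rep independent draws
mc_se <- sqrt(p_exact * (1 - p_exact) / n_rep)

# Gap between simulation and exact value, in standard-error units
gap_in_se <- (p_sim - p_exact) / mc_se

print(signif(c(mc_se = mc_se, gap_in_se = gap_in_se), 3))
```

The standard error is roughly 0.00015, and the simulated value lies a little over one standard error from the exact one. That is ordinary Monte Carlo noise and fits the "two or three places" accuracy mentioned above.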

The three results are about as near to each other as can be expected. I am not quite clear how this probability relates to the hypothesis test you are trying to do. Also, I understand that R is not exactly the same as MATLAB. (R is excellent open source software available free of charge at www.r-project.org for Windows, Mac, and UNIX operating systems.)

I hope you can understand the three procedures and make any necessary adaptations for your specific homework problem.

In the plot below, the histogram shows the simulated distribution of $X,$ the curve is the approximating normal density function, the purple dots are exact binomial probabilities, and the vertical red lines mark the desired boundaries.

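[Figure: histogram of the simulated values of $X$ with the approximating normal density curve, the exact binomial probabilities as purple dots, and vertical red lines at the boundaries.]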
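A plot along the lines described could be produced with base R graphics roughly as follows. This is a sketch only, not the answer's original plotting code, which is not shown. The seed, the number of draws, the displayed range `k`, the colors, and the placement of the red lines at 34.5 and 65.5 are all assumptions.

```r
set.seed(2)                              # arbitrary seed for repeatability
x_sim <- rbinom(10^5, 1000, 0.05)        # simulated counts (number of draws assumed)
k     <- 20:80                           # range of counts to mark (assumed)

# Histogram on the density scale, one bin per integer count
hist(x_sim, breaks = seq(min(x_sim) - 0.5, max(x_sim) + 0.5, by = 1),
     freq = FALSE, col = "skyblue2", xlab = "x",
     main = "Simulated Binom(1000, 0.05)")

# Approximating normal density
curve(dnorm(x, 50, sqrt(47.5)), add = TRUE, lwd = 2)

# Exact binomial probabilities
points(k, dbinom(k, 1000, 0.05), pch = 19, col = "purple")

# Boundaries (assumed here to be the continuity-corrected 34.5 and 65.5)
abline(v = c(34.5, 65.5), col = "red")
```

Because each histogram bin has width 1, bar heights, dots, and curve are all on the same probability scale and can be compared directly.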
