I've been having a bunch of trouble with a homework question. I'm almost certain that I have the answer correct by hand, but I also have to estimate it with a model using MatLab and I'm having some interesting issues with it there.
In the situation I'm given, there's a machine that produces boards, each with a 95% chance of being within specification. Out of a batch of 1000, 65 were faulty. I need to do a Z-test for a single proportion to check whether the machine is behaving as expected.
So $ \bar X = 0.065$ from $ n = 1000 $ tests, with an expected proportion of $ p_0 = 0.05 $.
Using the Z-test for proportions I found:
$ T = \sqrt{n}\cdot\frac{\bar X - p_0}{\sqrt{p_0\cdot(1-p_0)}} = \sqrt{1000} \cdot \frac{0.015}{\sqrt{0.05\cdot0.95}} \approx 2.1764 $
This translates to a p-value $ 1 - \Phi(2.1764) = 1 - 0.9852 = 0.0148 $.
This is (as far as I can tell – and I've used normal distribution calculators online) right.
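As a sanity check on the arithmetic above, here is a minimal Python sketch using only the standard library. The variable names are my own, and it assumes the usual normal approximation behind the Z-test for a single proportion.

```python
from math import sqrt
from statistics import NormalDist

n = 1000        # boards inspected
x_bar = 0.065   # observed faulty proportion (65 / 1000)
p0 = 0.05       # faulty proportion expected if the machine behaves as specified

# Z statistic for a single proportion
z = sqrt(n) * (x_bar - p0) / sqrt(p0 * (1 - p0))

# One-sided p-value: P(Z > z) for a standard normal Z
p_one_sided = 1 - NormalDist().cdf(z)

print(f"z = {z:.4f}")
print(f"one-sided p-value = {p_one_sided:.4f}")
print(f"two-sided p-value = {2 * p_one_sided:.4f}")
```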
I then have to estimate this value using a simulation in Matlab. For it I have to calculate the chance of being more than 0.015 away from the expected 0.05 proportion in either direction, so I'm expecting a result of 0.0296 (since the simulation is two-sided, where the work by hand was one-sided).
Essentially, I'm getting a sample from a $ Bin(1000, 0.05) $ then dividing that by 1000 (I check if it's outside the expected range, and repeat a bunch of times, but that's the meaningful bit). This gives a result of ~0.0242.
The interesting bit is that when I replace the Binomial sample with a normal approximation of the above Binomial ($ Norm(50, \sqrt{47.5}) $, then divide by 1000), I get the answer I expect.
Is this just a function of how small I've gotten – maybe a continuous, normal distribution takes into account much more than a discrete, binomial distribution? Or is it that I've definitely done something wrong?
Best Answer
Exact binomial probability:
If I understand correctly, you have $X \sim Binom(n=1000, p=.05),$ and you want to know $$P(|X/1000 - .05| > .015) = 1 - P(35 \le X \le 65) = 0.0242,$$ as computed in R statistical software.
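For readers without R, the same exact probability can be reproduced with a short Python sketch. It uses only the standard library and sums the binomial probabilities directly; `binom_pmf` is a helper name introduced here for illustration.

```python
from math import comb

n, p = 1000, 0.05

def binom_pmf(k: int) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# P(35 <= X <= 65): counts whose proportion is within 0.015 of 0.05
inside = sum(binom_pmf(k) for k in range(35, 66))

# Complement: counts below 35 or above 65
exact_tail = 1 - inside
print(f"P(|X/1000 - .05| > .015) = {exact_tail:.4f}")
```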
Normal approximation:
This could be approximated to about two places using $X \sim Norm(\mu=50,\, \sigma=6.892).$ With continuity correction, this is $1 - P(34.5 < X < 65.5) = 0.0245.$
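A minimal Python sketch of the normal approximation, again using only the standard library, is given here. It evaluates the tail probability both with and without the continuity correction, so the size of the correction can be seen directly. The names `no_cc` and `with_cc` are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, p = 1000, 0.05
mu = n * p                      # 50
sigma = sqrt(n * p * (1 - p))   # sqrt(47.5), about 6.892
approx = NormalDist(mu, sigma)

# Without continuity correction: 1 - P(35 < X < 65)
no_cc = 1 - (approx.cdf(65) - approx.cdf(35))

# With continuity correction: 1 - P(34.5 < X < 65.5)
with_cc = 1 - (approx.cdf(65.5) - approx.cdf(34.5))

print(f"without continuity correction: {no_cc:.4f}")
print(f"with continuity correction:    {with_cc:.4f}")
```

The uncorrected value is the two-sided Z-test figure from the question, while the corrected value sits close to the exact binomial result.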
Simulation:
Also, as you suggest, this probability can be approximated (to two or three places) by the following simulation: One generates a million realizations of $X \sim Binom(1000, .05)$ and checks what proportion of them satisfy the condition $|X/1000 - .05| > .015.$ The answer is 0.0244. (In R, `cond` is a logical vector with elements `TRUE` and `FALSE`, and the mean of a logical vector is its proportion of `TRUE`s.)
The three results are about as near to each other as can be expected. I am not quite clear how this probability relates to the hypothesis test you are trying to do. Also, I understand that R is not exactly the same as MatLab. (R is excellent open source software available free of charge at www.r-project.org for Windows, Mac, and UNIX operating systems.)
I hope you can understand the three procedures and make any necessary adaptations for your specific homework problem.
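The simulation step described above can also be sketched in Python with the standard library alone. This is only one way to do it: the binomial draws are generated by weighted sampling from the exact probabilities, and the seed is arbitrary, so the estimate will vary slightly from run to run and from the R figure.

```python
import random
from math import comb

n, p = 1000, 0.05
reps = 1_000_000   # number of simulated batches

# Binomial(n, p) probabilities for k = 0..n, used as sampling weights
weights = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

random.seed(2024)  # arbitrary seed, only for reproducibility
draws = random.choices(range(n + 1), weights=weights, k=reps)

# |X/1000 - .05| > .015 is the same event as X < 35 or X > 65;
# comparing integer counts avoids floating-point rounding at the boundaries
outside = sum(1 for x in draws if x < 35 or x > 65)
estimate = outside / reps
print(f"simulated P(|X/1000 - .05| > .015) = {estimate:.4f}")
```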
In the plot below, the histogram shows the simulated distribution of $X,$ the curve is the approximating normal density function, the purple dots are exact binomial probabilities, and the vertical red lines mark the desired boundaries.