Is the z-test for difference of proportions valid for massive samples with tiny proportions?

hypothesis-testing; proportion; z-test

Let's say I want to run a difference of proportions test where each side has n=23,000 but their proportions are 0.21% and 0.34%.

     group1  group2
n     23000   23000
x        50      78
prop  0.21%   0.34%

In both groups, np ≥ 50 and n(1-p) ≥ 50, so the usual sample-size conditions for the normal approximation are comfortably met.

A standard z-score test will say this difference is significant.
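As a quick check of that claim, here is a minimal sketch of the pooled two-sample z-test computed by hand from the counts in the table, alongside R's built-in `prop.test()` without continuity correction. The variable names are only illustrative. With these counts the z statistic comes out at roughly 2.48, for a two-sided p-value of roughly 0.013.

```r
# Observed counts from the table above
x <- c(50, 78)          # events in group1, group2
n <- c(23000, 23000)    # sample sizes

p_hat  <- x / n                       # sample proportions
p_pool <- sum(x) / sum(n)             # pooled proportion under H0: p1 == p2
se     <- sqrt(p_pool * (1 - p_pool) * sum(1 / n))  # standard error of the difference
z      <- (p_hat[2] - p_hat[1]) / se  # z statistic
p_val  <- 2 * pnorm(-abs(z))          # two-sided p-value

# The same test via prop.test(); its X-squared statistic equals z^2
pt <- prop.test(x, n, correct = FALSE)

c(z = z, p_value = p_val, prop_test_p = pt$p.value)
```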

However, my intuition tells me the test should not work for such small proportions. If the true proportions were equal, and with such a rare event, I would actually expect to see large differences like this just from sampling variability. Am I right in thinking this? Does the difference of proportions test break down for tiny proportions?

Note: This is a purely hypothetical question. In real life, I don't care that group2 outperformed group1. The event rate is so low that there is little value in using it. In other words, it is statistically significant but not clinically significant.

Best Answer

Whenever I have doubts about the performance of a particular method, I run a simulation study to examine how well the method works under similar conditions. Below is a simple example using R for the case you are describing. Note that I set the true proportions equal for the two groups, to a value somewhere in between the two proportions you observed. The simulation therefore gives the empirical Type I error rate of the test, which should be close to .05 if the test works as intended. Setting the number of iterations large enough ensures that the simulation error is small. The code uses chisq.test() on the 2x2 table: without continuity correction, this chi-square test is equivalent to the two-sided z-test for a difference of proportions (the chi-square statistic is the square of the z statistic). I run the test once without and once with Yates' continuity correction to see whether the correction matters here.

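iters <- 100000   # number of simulated experiments

n <- 23000        # sample size per group
p <- 0.0027       # common true proportion (the null hypothesis holds)

# simulate the number of events in each group
x1i <- rbinom(iters, n, p)
x2i <- rbinom(iters, n, p)

pval1 <- rep(NA, iters)   # p-values without continuity correction
pval2 <- rep(NA, iters)   # p-values with Yates' continuity correction

for (i in 1:iters) {
   # 2x2 table: one row per group, columns = events, non-events
   tab <- matrix(c(x1i[i], n-x1i[i], x2i[i], n-x2i[i]), nrow=2, byrow=TRUE)
   pval1[i] <- chisq.test(tab, correct=FALSE)$p.value
   pval2[i] <- chisq.test(tab, correct=TRUE)$p.value
}

# empirical Type I error rates at the .05 level
round(mean(pval1 <= .05), 3)
round(mean(pval2 <= .05), 3)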

Here are the results from one run:

> round(mean(pval1 <= .05), 3)
[1] 0.05
> round(mean(pval2 <= .05), 3)
[1] 0.04

So the test holds its nominal 5% level when Yates' continuity correction is not used. With the correction, the test is slightly conservative, rejecting in about 4% of the simulated data sets.
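To judge whether the gap between 0.05 and 0.04 could be simulation noise, one can compute the Monte Carlo standard error of an estimated rejection rate. The sketch below assumes the true rate is exactly 0.05 and uses the same number of iterations as the simulation; the names `alpha`, `mc_se`, and `mc_range` are only illustrative.

```r
iters <- 100000   # number of simulated experiments, as in the simulation
alpha <- 0.05     # nominal Type I error rate

# Standard error of a proportion estimated from `iters` independent trials
mc_se <- sqrt(alpha * (1 - alpha) / iters)

# Approximate 95% range for the estimated rate if the true rate is alpha
mc_range <- alpha + c(-1, 1) * qnorm(0.975) * mc_se

round(mc_se, 5)
round(mc_range, 4)
```

The standard error is well below 0.001, so an estimate of 0.04 lies far outside what simulation noise alone would produce around 0.05. That supports reading the result with Yates' correction as genuinely conservative.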

If you want to find out about the power of the test, you can set the true proportions to two different values and then rerun the simulation.
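A minimal sketch of such a power simulation follows. The two true proportions are assumptions chosen purely for illustration (they are set to the observed values from the question), and the seed is arbitrary. Instead of looping over chisq.test(), it uses a vectorized pooled z-test, which is equivalent to the uncorrected chi-square test in the two-sided case.

```r
set.seed(1)       # arbitrary seed, only for reproducibility

iters <- 20000    # number of simulated experiments
n  <- 23000       # sample size per group
p1 <- 0.0021      # assumed true proportion in group 1 (illustrative)
p2 <- 0.0034      # assumed true proportion in group 2 (illustrative)

# simulate the number of events in each group
x1 <- rbinom(iters, n, p1)
x2 <- rbinom(iters, n, p2)

# vectorized pooled two-sample z-test (equal n in both groups)
p_pool <- (x1 + x2) / (2 * n)
se     <- sqrt(p_pool * (1 - p_pool) * 2 / n)
z      <- (x2 / n - x1 / n) / se
pval   <- 2 * pnorm(-abs(z))

# count an undefined test (no events in either group) as a non-rejection
reject <- !is.na(pval) & pval <= 0.05

# empirical power: share of simulated experiments that reject H0
power <- mean(reject)
round(power, 3)
```

Setting `p1` and `p2` to the same value turns this back into a Type I error simulation.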
