Whenever I have doubts about the performance of a particular method, I run a simulation study to examine how well the method works under similar conditions. Below is a simple example in R for the case you describe. Note that I set the true proportions of the two groups equal, to a value somewhere in between what you actually observed in the two samples, so the simulation yields the empirical Type I error rate of the test, which should be close to .05. Setting the number of iterations large enough keeps the simulation error small. Also, I run the test once without and once with Yates' continuity correction to see whether the correction matters here.
iters <- 100000   # number of simulation iterations
n <- 23000        # sample size per group
p <- 0.0027       # true proportion, equal in both groups (so the null is true)
x1i <- rbinom(iters, n, p)   # simulated counts, group 1
x2i <- rbinom(iters, n, p)   # simulated counts, group 2
pval1 <- rep(NA, iters)      # p-values without continuity correction
pval2 <- rep(NA, iters)      # p-values with Yates' continuity correction
for (i in 1:iters) {
   tab <- matrix(c(x1i[i], n - x1i[i], x2i[i], n - x2i[i]), nrow = 2, byrow = TRUE)
   pval1[i] <- chisq.test(tab, correct = FALSE)$p.value
   pval2[i] <- chisq.test(tab, correct = TRUE)$p.value
}
round(mean(pval1 <= .05), 3)  # empirical Type I error rate without correction
round(mean(pval2 <= .05), 3)  # empirical Type I error rate with correction
Here are the results from one run:
> round(mean(pval1 <= .05), 3)
[1] 0.05
> round(mean(pval2 <= .05), 3)
[1] 0.04
So, the test maintains the nominal Type I error rate when Yates' continuity correction is not used. With the correction, the test is slightly conservative.
If you want to find out about the power of the test, you can set the true proportions to two different values and then rerun the simulation.
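For instance, a power run only requires changing the two true proportions. The values of p1 and p2 below are illustrative assumptions, not taken from your data:

```r
# Empirical power: same simulation as above, but with unequal true proportions.
# p1 and p2 are illustrative values chosen for demonstration.
iters <- 10000
n <- 23000
p1 <- 0.0020
p2 <- 0.0035
x1i <- rbinom(iters, n, p1)
x2i <- rbinom(iters, n, p2)
pval <- numeric(iters)
for (i in 1:iters) {
   tab <- matrix(c(x1i[i], n - x1i[i], x2i[i], n - x2i[i]), nrow = 2, byrow = TRUE)
   pval[i] <- chisq.test(tab, correct = FALSE)$p.value
}
mean(pval <= .05)  # proportion of rejections = empirical power
```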
Best Answer
The standard formula for testing equality of 2 proportions (using the normal approximation) uses a pooled estimate of the proportion that is appropriate when the null of equal proportions is true. In your case the proportions are not equal, so the pooled proportion is not appropriate.
One option is to code the formula that does not pool the proportions yourself and compute the p-value from the normal approximation.
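A minimal sketch of that unpooled z-test for H0: p2 - p1 = C follows; the counts and the value of C are hypothetical placeholders, not values from the question:

```r
# Unpooled z-test for H0: p2 - p1 = C (illustrative numbers).
x1 <- 60; n1 <- 23000   # hypothetical count and sample size, group 1
x2 <- 90; n2 <- 23000   # hypothetical count and sample size, group 2
C  <- 0.001             # hypothesized difference under the null

p1hat <- x1 / n1
p2hat <- x2 / n2
# Unpooled standard error: each proportion estimated separately.
se <- sqrt(p1hat * (1 - p1hat) / n1 + p2hat * (1 - p2hat) / n2)
z  <- (p2hat - p1hat - C) / se
pval <- 2 * pnorm(-abs(z))  # two-sided p-value from the normal approximation
```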
Another option is to just use prop.test, but ignore the p-value and instead check whether the confidence interval includes the value C you are interested in. If C is not in the interval, that is equivalent to rejecting the null; if C is in the interval, that is equivalent to a p-value greater than alpha (not enough evidence to reject). You don't get an exact p-value, but you reach the same decision.
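The confidence-interval check might look like this; the counts and C are again hypothetical placeholders:

```r
# Decision via the confidence interval from prop.test (illustrative numbers).
x <- c(60, 90)          # hypothetical successes in the two groups
n <- c(23000, 23000)    # hypothetical sample sizes
C <- 0.001              # hypothesized difference p1 - p2 under the null

ci <- prop.test(x, n)$conf.int     # confidence interval for p1 - p2
reject <- C < ci[1] || C > ci[2]   # reject H0: p1 - p2 = C if C lies outside
```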