P-Value and Confidence Interval Disagreement in Two Sample Test of Proportions – Analysis

chi-squared-test, confidence interval, p-value, proportion, r

I'm using R to calculate the two-sample test for equality of proportions, where the two proportions are 350/400 and 25/25. So:

> prop.test(c(350,25),c(400,25))

        2-sample test for equality of proportions with continuity correction

data:  c(350, 25) out of c(400, 25) 
X-squared = 2.4399, df = 1, p-value = 0.1183
alternative hypothesis: two.sided 
95 percent confidence interval:
 -0.17865986 -0.07134014 
sample estimates:
prop 1 prop 2 
 0.875  1.000 

Warning message:
In prop.test(c(350, 25), c(400, 25), correct = FALSE) :
  Chi-squared approximation may be incorrect

What I can't reconcile on my own is that the p-value is greater than 0.05, and yet the 95% confidence interval for the difference does not include 0. I thought there was an 'if and only if' relationship between the two: the p-value is less than alpha iff the (1-alpha) confidence interval for the difference does not include 0.

What am I not seeing? My only guess is there's something fundamental that I'm misunderstanding, or that it has something to do with that warning message about chi-squared approximation.

Best Answer

I presume they result from two somewhat different approximations in this instance.

For the ordinary chi-square test of a single proportion, the interval that corresponds to the test is the Wilson score interval:

$$\frac{1}{1 + \frac{1}{n} z_{1 - \frac{1}{2}\alpha}^2} \left[ \hat p + \frac{1}{2n} z_{1 - \frac{1}{2}\alpha}^2 \pm z_{1 - \frac{1}{2}\alpha} \sqrt{ \frac{1}{n}\hat p \left(1 - \hat p\right) + \frac{1}{4n^2}z_{1 - \frac{1}{2}\alpha}^2 } \right]$$
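To make the formula concrete, here is a minimal sketch of the Wilson score interval for a single proportion, without continuity correction. The helper name `wilson_ci` is hypothetical (it is not part of base R), and the data 350 out of 400 are borrowed from the first group in the question purely for illustration. The result is compared with the one-sample `prop.test(..., correct = FALSE)` interval.

```r
# Wilson score interval for a single proportion (no continuity correction).
# `wilson_ci` is a hypothetical helper name, not a base R function.
wilson_ci <- function(x, n, conf.level = 0.95) {
  p_hat <- x / n
  z <- qnorm(1 - (1 - conf.level) / 2)          # z_{1 - alpha/2}
  centre <- p_hat + z^2 / (2 * n)               # shifted centre
  half <- z * sqrt(p_hat * (1 - p_hat) / n + z^2 / (4 * n^2))
  c(lower = centre - half, upper = centre + half) / (1 + z^2 / n)
}

ci <- wilson_ci(350, 400)
ref <- prop.test(350, 400, correct = FALSE)$conf.int  # one-sample interval from R

print(ci)
print(as.numeric(ref))
```

The two printed intervals should agree, which is one way to see that the uncorrected one-sample interval is the score interval given above.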

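Looking into the code (just type prop.test to see it), the one-sample case uses the Wilson score interval by default, with a continuity correction applied to $\hat p$. For the two-sample difference, though, the interval appears to be built from the unpooled standard error, $\sqrt{\sum_i \hat p_i (1 - \hat p_i)/n_i}$, plus a continuity correction, whereas the chi-square statistic uses the pooled proportion. Here $\hat p_2 = 1$ contributes nothing to the unpooled standard error, which makes the interval narrow enough to exclude 0.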

[Note that one of the references in the help (?prop.test) discusses eleven different confidence intervals for the difference in proportions; at most one of them can correspond exactly to any given form of the hypothesis test.]

While the Wilson score interval without continuity correction corresponds to the chi-square test without continuity correction, my guess is that the continuity-corrected versions of the two being used here no longer correspond exactly.
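A quick numerical check may help here. The sketch below recomputes the numbers in the question's output by hand, assuming the statistic uses the pooled proportion and the interval uses the unpooled standard error, each with a continuity correction of 0.5 * (1/n1 + 1/n2) on the difference scale. These assumptions come from matching the printed output, not from any documented guarantee, and all variable names are illustrative.

```r
x <- c(350, 25)
n <- c(400, 25)
p_hat <- x / n
delta <- p_hat[1] - p_hat[2]
z <- qnorm(0.975)
cc <- 0.5 * sum(1 / n)        # continuity correction on the difference scale

# Test statistic: standard error from the pooled proportion (H0: p1 = p2)
p_pool <- sum(x) / sum(n)
se_pool <- sqrt(p_pool * (1 - p_pool) * sum(1 / n))
x_sq <- ((abs(delta) - cc) / se_pool)^2
p_value <- pchisq(x_sq, df = 1, lower.tail = FALSE)

# Interval: standard error from the separate (unpooled) proportions
se_unpooled <- sqrt(sum(p_hat * (1 - p_hat) / n))
ci <- delta + c(-1, 1) * (z * se_unpooled + cc)

# For comparison only: the same construction with the pooled standard error
ci_pooled <- delta + c(-1, 1) * (z * se_pool + cc)

print(c(x_sq = x_sq, p_value = p_value))
print(ci)         # excludes 0
print(ci_pooled)  # includes 0, in line with p > 0.05
```

Under these assumptions the first two results match the reported X-squared, p-value and interval. The pooled variant is shown only to illustrate that an interval built from the same standard error as the test agrees with it about 0; it is not offered as a recommended interval.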

I guess the way to get an interval that corresponds would be to write down the interval implied by the continuity-corrected chi-squared statistic, in the same way the Wilson score interval is derived (see the Wikipedia article on the Wilson score interval), and solve for the endpoints.
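As a rough illustration of "solve for the endpoints", the sketch below inverts a continuity-corrected score test numerically. It deliberately uses the simpler one-sample setting, which is an assumption made for brevity; the two-sample difference would need more machinery. The helper `cc_score_ci` is hypothetical, and the data 350 out of 400 are again borrowed from the question.

```r
# Invert a continuity-corrected score test for one proportion by root finding.
# `cc_score_ci` is a hypothetical helper; it assumes 0 < x < n.
cc_score_ci <- function(x, n, conf.level = 0.95) {
  p_hat <- x / n
  crit <- qchisq(conf.level, df = 1)
  # Corrected chi-squared statistic at null value p0, minus the critical value
  g <- function(p0) {
    max(abs(p_hat - p0) - 0.5 / n, 0)^2 / (p0 * (1 - p0) / n) - crit
  }
  eps <- 1e-8
  # g is negative at p_hat and positive near 0 and 1, so each side has a root
  lower <- uniroot(g, c(eps, p_hat), tol = 1e-12)$root
  upper <- uniroot(g, c(p_hat, 1 - eps), tol = 1e-12)$root
  c(lower = lower, upper = upper)
}

ci <- cc_score_ci(350, 400)
print(ci)

# The corrected test at each endpoint should sit on the 5% boundary
print(prop.test(350, 400, p = ci[["lower"]])$p.value)
print(prop.test(350, 400, p = ci[["upper"]])$p.value)
```

By construction, a null value is inside this interval exactly when the corrected test fails to reject it, which is the 'if and only if' relationship the question expected.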
