Solved – Yates continuity correction for 2 x 2 contingency tables

Tags: categorical-data, chi-squared-test, yates-correction

I would like to gather input from people in the field about the Yates continuity correction for 2 x 2 contingency tables. The Wikipedia article mentions that it may overcorrect, and that its use is therefore limited. The related post here doesn't offer much further insight.

So to the people who use these tests regularly, what are your thoughts? Is it better to use the correction or not?

Here is a real-world example that yields different results at the 95% confidence level depending on whether the correction is applied. Note this was a homework problem, but our class does not deal with the Yates continuity correction at all, so sleep easy knowing you aren't doing my homework for me.

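# 2 x 2 table of counts, filled row by row
samp <- matrix(c(13, 12, 15, 3), byrow = TRUE, ncol = 2)
colnames(samp) <- c("No", "Yes")
rownames(samp) <- c("Female", "Male")

# Pearson chi-squared test with and without the Yates continuity correction
chisq.test(samp, correct = TRUE)
chisq.test(samp, correct = FALSE)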

Best Answer

Yates' correction results in tests that are more conservative, as is also the case with Fisher's "exact" test.
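To see where the extra conservatism comes from, here is a minimal sketch that computes both statistics by hand for the table in the question. The names `expected`, `x2_plain`, `x2_yates`, `p_plain` and `p_yates` are mine, and the sketch assumes every |O - E| exceeds 0.5 (true for this table), so the cap that `chisq.test` places on the correction does not come into play.

```r
# Same 2 x 2 table as in the question
samp <- matrix(c(13, 12, 15, 3), byrow = TRUE, ncol = 2,
               dimnames = list(c("Female", "Male"), c("No", "Yes")))

# Expected counts under independence: (row total * column total) / n
expected <- outer(rowSums(samp), colSums(samp)) / sum(samp)

# Pearson statistic without and with the Yates correction;
# the correction shrinks each |O - E| by 0.5 before squaring
# (assumes all |O - E| > 0.5, as is the case here)
x2_plain <- sum((samp - expected)^2 / expected)
x2_yates <- sum((abs(samp - expected) - 0.5)^2 / expected)

# Upper-tail p-values from the chi-squared distribution with 1 df
p_plain <- pchisq(x2_plain, df = 1, lower.tail = FALSE)
p_yates <- pchisq(x2_yates, df = 1, lower.tail = FALSE)

round(c(plain = p_plain, yates = p_yates), 4)
```

Because the corrected statistic is the smaller of the two, its p-value is the larger one; for this table the two fall on opposite sides of 0.05.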

Here is an online tutorial on the use of Yates's continuity correction, by Stefanescu et al., which clearly points to various flaws of systematic correction for continuity (pp. 4-6). Quoting Agresti (CDA 2002), "Yates (1934) mentioned that Fisher suggested the hypergeometric to him for an exact test", which led to the continuity-corrected version of the $\chi^2$. Agresti also indicated that Fisher's test is a good alternative now that computers can do it even for large samples (p. 103). Now, the point is that choosing a test really depends on the question that is asked and on the assumptions that each test makes (e.g., in the case of Fisher's test we assume that the margins are fixed).

In your case, Fisher's test and the corrected $\chi^2$ agree and yield a $p$-value above 5%. In the case of the ordinary $\chi^2$, if the $p$-value is computed using a Monte Carlo approach (see the simulate.p.value argument of chisq.test), then it fails to reach significance too.
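A short sketch of these two alternatives on the same table is given below. The seed and the number of replicates `B` are arbitrary choices of mine, as are the names `p_fisher` and `p_mc`. The simulated value depends on both settings and sits close to the 5% threshold for this table, so individual runs can differ slightly.

```r
# Same 2 x 2 table as in the question
samp <- matrix(c(13, 12, 15, 3), byrow = TRUE, ncol = 2,
               dimnames = list(c("Female", "Male"), c("No", "Yes")))

# Fisher's exact test (two-sided, conditional on both margins)
p_fisher <- fisher.test(samp)$p.value

# Ordinary (uncorrected) chi-squared statistic with a Monte Carlo p-value;
# the seed and B are arbitrary, chosen only for reproducibility
set.seed(42)
p_mc <- chisq.test(samp, simulate.p.value = TRUE, B = 10000)$p.value

round(c(fisher = p_fisher, monte_carlo = p_mc), 4)
```

Note that when `simulate.p.value = TRUE`, `chisq.test` does not apply the continuity correction, so this is the ordinary statistic referred to its simulated distribution.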

Other useful references dealing with small sample size issues and the overuse of Fisher's test include: