Solved – Why is R automatically using Yates’ correction

r, yates-correction

I wanted to understand why R always applies Yates' correction to a 2$\times$2 contingency table. My strategy is to generate random 2$\times$2 contingency tables (say, a thousand), run the X² test with and without Yates' correction, plus Fisher's exact test, and see which test's proportion of p-values < 0.05 comes closest to the theoretical value, which should be 5% if I'm not mistaken.

Here are the results:

  • X² without Yates' correction: 52 p-values < 0.05 out of 1000 runs
  • X² with Yates' correction: 33 p-values < 0.05
  • Fisher: 40 p-values < 0.05

Sadly, I cannot draw a conclusion: these results do not help me understand why R applies Yates' correction by default, since they suggest that the correction is overly conservative. (Incidentally, I am surprised that the uncorrected X² test seems more accurate than fisher.test.)
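To make the "conservative" part concrete, here is a minimal sketch of what the correction does to the statistic for a single table. The table `tab` is hypothetical and chosen only for illustration. The cap `min(0.5, |O - E|)` follows the behaviour described in the `chisq.test` documentation, where one half is subtracted from every |O - E| but never more than the difference itself.

```r
# Hypothetical 2x2 table, used only for illustration
tab <- matrix(c(12, 5, 7, 9), nrow = 2)

# Expected counts under independence
E <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# Uncorrected Pearson statistic
stat_plain <- sum((tab - E)^2 / E)

# Yates-corrected statistic: shrink every |O - E| by 0.5,
# but never by more than the smallest |O - E|
yates <- min(0.5, abs(tab - E))
stat_yates <- sum((abs(tab - E) - yates)^2 / E)

# p-values from the chi-squared distribution with 1 degree of freedom
p_plain <- pchisq(stat_plain, df = 1, lower.tail = FALSE)
p_yates <- pchisq(stat_yates, df = 1, lower.tail = FALSE)
```

Because the corrected statistic can only be smaller than or equal to the uncorrected one, its p-value can only be larger or equal, so the corrected test can never reject more often than the uncorrected test on the same tables.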

Being a novice in both statistics and programming, I don't know whether my strategy is sound.

Here is the script:

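# Simulation settings
nb.Runs <- 1000   # number of simulated tables
effectif <- 100   # number of subjects per table

# Counters of significant results for each test
Yateslist <- 0
Chilist <- 0
Fisherlist <- 0

for (i in 1:nb.Runs) {
  # Two independent binary variables, each TRUE with probability 0.5
  sujets <- matrix(rnorm(effectif * 2, mean = 0, sd = 1) < 0, nrow = effectif, ncol = 2)
  conting <- table(as.data.frame(sujets))

  # p-values: X² with Yates' correction (the default), X² without it, Fisher's exact test
  Yates.pval <- chisq.test(conting)$p.value
  Chi.pval <- chisq.test(conting, correct = FALSE)$p.value
  Fisher.pval <- fisher.test(conting)$p.value

  # Count the runs that are significant at the 5% level
  if (Yates.pval <= 0.05) Yateslist <- Yateslist + 1
  if (Chi.pval <= 0.05) Chilist <- Chilist + 1
  if (Fisher.pval <= 0.05) Fisherlist <- Fisherlist + 1
}

# Results
cat("Number of significant results:", "\n")
cat("Yates", Yateslist, "\n")
cat("Without correction", Chilist, "\n")
cat("Fisher", Fisherlist, "\n")
cat("out of", nb.Runs, "runs", "\n")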
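When reading counts such as 52, 33 and 40 out of 1000, it may help to know how much they would vary by chance alone. The sketch below is a rough check, not part of the original script: it assumes the runs are independent and that the true rejection rate is exactly the nominal 5%, and it uses a normal approximation to the binomial. The names `alpha`, `se` and `range_counts` are introduced here for illustration.

```r
# Monte Carlo standard error of an estimated rejection rate,
# assuming independent runs and a true rate equal to the nominal 5%
nb.Runs <- 1000
alpha <- 0.05
se <- sqrt(alpha * (1 - alpha) / nb.Runs)

# Approximate 95% range for the number of rejections out of nb.Runs
range_counts <- nb.Runs * (alpha + c(-1, 1) * qnorm(0.975) * se)
print(round(range_counts))
```

Under these assumptions the range is roughly 36 to 64 rejections, so 52 and 40 are both compatible with a 5% rate at this number of runs, while 33 falls just below it. A larger `nb.Runs` would narrow the range and make the three tests easier to tell apart.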

Best Answer

I don't think the "why" can be answered, since that would mean speculating about the R developers' intentions; for that reason, I don't think it is a useful question to ask. The source code of the chisq.test function contains no comments explaining the choice, and the function's documentation gives no indication either. The only relevant reference is to Alan Agresti's book, which may have suggested using Yates' correction rather than no continuity correction.
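What can be checked directly is what the default is and where it applies. The short sketch below inspects the `correct` argument of `chisq.test` and compares the two settings on a table larger than 2$\times$2. The table `tab23` is hypothetical and used only for illustration.

```r
# Default value of the `correct` argument of chisq.test
default_correct <- formals(chisq.test)$correct
print(default_correct)

# Hypothetical 2x3 table: the continuity correction is only applied to 2x2 tables,
# so both settings should give the same p-value here
tab23 <- matrix(c(10, 12, 15, 9, 11, 14), nrow = 2)
same_pvalue <- identical(chisq.test(tab23)$p.value,
                         chisq.test(tab23, correct = FALSE)$p.value)
print(same_pvalue)
```

This shows what the function does, not why its authors chose that default.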