Solved – G-test vs Pearson’s chi-squared test

chi-squared-distribution, chi-squared-test, contingency tables, monte carlo, p-value

I'm testing independence in an $N \times M$ contingency table. I don't know whether the G-test or Pearson's chi-squared test is better. The sample size is in the hundreds but there are some low cell counts. As stated on the Wikipedia page, the approximation to the chi-squared distribution is better for the G-test than for Pearson's chi-squared test. But I'm using Monte Carlo simulation to compute the p-value, so is there any difference between these two tests?

Best Answer

They are asymptotically equivalent: two different ways of getting at the same idea. More specifically, Pearson's chi-squared test is a score test, whereas the G-test is a likelihood ratio test. To get a better sense of those ideas, it may help you to read my answer here: "Why do my p-values differ between logistic regression output, chi-squared test, and the confidence interval for the OR?"

To answer your direct question: if you are computing the p-value by Monte Carlo simulation, it shouldn't matter which one you use. The Wikipedia claim concerns how well each statistic's null distribution is approximated by the chi-squared distribution, and simulation does not rely on that approximation. The two statistics can still give slightly different p-values on a given table, but both are valid, so you could just use whichever is more convenient for you.

Note that low observed cell counts are not a problem in themselves; only low expected cell counts (potentially) are. It is possible to have low observed counts and expected counts that are just fine. Furthermore, neither low observed counts nor low expected counts matter when the p-value is determined by simulation.
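For reference, the two statistics can be written out as follows. The notation is my own and does not appear in the question: $O_{ij}$ is the observed count in row $i$ and column $j$, $O_{i\cdot}$ and $O_{\cdot j}$ are the row and column totals, $n$ is the total sample size, and $E_{ij}$ is the expected count under independence.

$$E_{ij} = \frac{O_{i\cdot}\, O_{\cdot j}}{n}, \qquad X^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad G = 2 \sum_{i,j} O_{ij} \ln\frac{O_{ij}}{E_{ij}}$$

Cells with $O_{ij} = 0$ contribute zero to $G$. Under independence, both statistics are asymptotically chi-squared with $(N-1)(M-1)$ degrees of freedom, which is the approximation that simulation sidesteps.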

(For what it's worth, I would probably use Pearson's chi-squared, because R has a convenient function for that, `chisq.test`, which includes the option of simulating the p-value via `simulate.p.value = TRUE`.)
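To make the Monte Carlo point concrete, here is a minimal Python sketch that simulates p-values for both statistics from the same set of simulated tables. It is one possible scheme, not necessarily what any particular package does: it assumes a permutation approach that keeps both margins fixed by shuffling column labels across observations. The function names, the default of 2000 simulations, and the example table are all made up for illustration.

```python
import math
import random


def expected_counts(table):
    """Expected counts under independence: row total * column total / n.
    Assumes no row or column of the table is entirely zero."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def pearson_stat(table, expected):
    """Pearson's X^2: sum of (O - E)^2 / E over all cells."""
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))


def g_stat(table, expected):
    """G statistic: 2 * sum of O * ln(O / E); empty cells contribute 0."""
    return 2.0 * sum(o * math.log(o / e)
                     for obs_row, exp_row in zip(table, expected)
                     for o, e in zip(obs_row, exp_row) if o > 0)


def monte_carlo_p_values(table, n_sim=2000, seed=0):
    """Return (p_pearson, p_g), each estimated from n_sim shuffled tables."""
    rng = random.Random(seed)
    n_rows, n_cols = len(table), len(table[0])
    # Margins are fixed by the shuffle, so expected counts never change.
    expected = expected_counts(table)

    # Expand the table into one (row label, column label) pair per observation.
    row_labels, col_labels = [], []
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            row_labels.extend([i] * count)
            col_labels.extend([j] * count)

    obs_x2 = pearson_stat(table, expected)
    obs_g = g_stat(table, expected)
    tol = 1e-9  # guards against floating-point ties
    hits_x2 = hits_g = 0

    for _ in range(n_sim):
        rng.shuffle(col_labels)  # breaks any row/column association
        sim = [[0] * n_cols for _ in range(n_rows)]
        for i, j in zip(row_labels, col_labels):
            sim[i][j] += 1
        if pearson_stat(sim, expected) >= obs_x2 - tol:
            hits_x2 += 1
        if g_stat(sim, expected) >= obs_g - tol:
            hits_g += 1

    # Add one to numerator and denominator so the p-value is never zero.
    return (1 + hits_x2) / (n_sim + 1), (1 + hits_g) / (n_sim + 1)


if __name__ == "__main__":
    # Hypothetical 3 x 3 table with some small cells.
    example = [[12, 3, 5], [4, 9, 2], [3, 2, 10]]
    p_x2, p_g = monte_carlo_p_values(example)
    print("Pearson Monte Carlo p-value:", p_x2)
    print("G-test Monte Carlo p-value:", p_g)
```

Because each statistic is compared against its own simulated null distribution, low expected counts do not invalidate either p-value. The two need not coincide exactly, since the statistics rank tables differently.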