Solved – Why are the degrees of freedom not available when I run a chi-squared test with a Monte Carlo simulation?

chi-squared-test, degrees of freedom, monte carlo, r

I'm running a chi-squared test in R on a 2×2 contingency table. When I compute the p-value by Monte Carlo simulation, the output reports the degrees of freedom for the test as 'NA' (but not when I run the test without the simulation).

Why does that happen, and what should I report for the df in this case?

The code:

cont_table <- matrix(
  c(0, 1000, 20, 1020),
  nrow=2, ncol=2
)
print(chisq.test(cont_table, simulate.p.value = FALSE, correct = FALSE))
print(chisq.test(cont_table, simulate.p.value = TRUE, correct = FALSE))

The output:

> print(chisq.test(cont_table, simulate.p.value = FALSE, correct = FALSE))

    Pearson's Chi-squared test

data:  cont_table
X-squared = 19.421, df = 1, p-value = 1.048e-05

> print(chisq.test(cont_table, simulate.p.value = TRUE, correct = FALSE))

    Pearson's Chi-squared test with simulated p-value (based on 2000 replicates)

data:  cont_table
X-squared = 19.421, df = NA, p-value = 0.0004998

Best Answer

This is by design. See ?chisq.test, which describes the returned parameter as:

the degrees of freedom of the approximate chi-squared distribution of the test statistic, NA if the p-value is computed by Monte Carlo simulation.
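The same behaviour can be checked on the returned object rather than the printed output. This is a small sketch using the table from the question; the variable names `asymptotic` and `simulated` are my own, and `B = 2000` just restates the default number of replicates.

```r
# Contingency table from the question
cont_table <- matrix(c(0, 1000, 20, 1020), nrow = 2, ncol = 2)

# Test against the theoretical chi-squared distribution
asymptotic <- chisq.test(cont_table, simulate.p.value = FALSE, correct = FALSE)

# Test with a Monte Carlo p-value (B replicates)
simulated <- chisq.test(cont_table, simulate.p.value = TRUE, correct = FALSE, B = 2000)

# The `parameter` component holds the degrees of freedom
print(asymptotic$parameter)  # df is 1 for a 2x2 table
print(simulated$parameter)   # df is NA when the p-value is simulated
```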

If you ask why this is so, the answer is pretty simple. With the standard $\chi^2$ test, you compare your test statistic to a theoretical distribution (in this case $\chi^2$ with a given number of degrees of freedom). With the Monte Carlo simulation, on the other hand (using the method of Hope [1968], as described in the documentation), you simulate data from the null distribution and then check how often a result at least as extreme as yours appears under it. In this case, the null distribution is generated by simulating "random" contingency tables with the same marginals as your data. The degrees of freedom do not enter the Monte Carlo computation at any stage, so there is no reason to report them.
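To make this concrete, here is a rough hand-rolled version of the simulation. It is a sketch of the idea, not the code chisq.test actually runs: it uses `r2dtable()` from the stats package to draw tables with fixed marginals, and the helper `pearson_stat`, the seed, and the small tolerance on the comparison are my own choices. No degrees of freedom appear anywhere.

```r
cont_table <- matrix(c(0, 1000, 20, 1020), nrow = 2, ncol = 2)

# Expected counts under independence, computed from the observed marginals
row_sums <- rowSums(cont_table)
col_sums <- colSums(cont_table)
expected <- outer(row_sums, col_sums) / sum(cont_table)

# Pearson statistic for any table that shares these marginals
pearson_stat <- function(tab) sum((tab - expected)^2 / expected)
observed_stat <- pearson_stat(cont_table)

# Draw B random tables with the same row and column totals
set.seed(1)  # arbitrary seed, only for reproducibility
B <- 2000
sim_tables <- r2dtable(B, row_sums, col_sums)
sim_stats <- vapply(sim_tables, pearson_stat, numeric(1))

# Share of simulated statistics at least as extreme as the observed one;
# the "+ 1" terms count the observed table itself, and the tolerance
# (an assumption of this sketch) guards against floating-point ties
p_value <- (1 + sum(sim_stats >= observed_stat * (1 - 1e-7))) / (B + 1)

print(observed_stat)
print(p_value)
```

Because the observed table counts as one of the B + 1 tables, the smallest p-value this procedure can return is 1 / (B + 1), which for B = 2000 is the 0.0004998 shown in the question's output.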

To learn more, check the referenced paper

Hope, A. C. A. (1968) A simplified Monte Carlo significance test procedure. J. Roy. Statist. Soc. B 30, 582–598.

as well as the source code of the C_chisq_sim routine and the rcont2 routine that it uses.