I'm running a chi-squared test in R on a 2×2 contingency table. When I simulate the p-value using Monte Carlo simulation, the output reports the degrees of freedom for the test as 'NA' (but not when I run the test without the simulation).
Why does that happen, and what should I report for the df in this case?
The code:
cont_table <- matrix(
  c(0, 1000, 20, 1020),
  nrow = 2, ncol = 2
)
print(chisq.test(cont_table, simulate.p.value = FALSE, correct = FALSE))
print(chisq.test(cont_table, simulate.p.value = TRUE, correct = FALSE))
The output:
> print(chisq.test(cont_table, simulate.p.value = FALSE, correct = FALSE))
Pearson's Chi-squared test
data: cont_table
X-squared = 19.421, df = 1, p-value = 1.048e-05
> print(chisq.test(cont_table, simulate.p.value = TRUE, correct = FALSE))
Pearson's Chi-squared test with simulated p-value (based on 2000 replicates)
data: cont_table
X-squared = 19.421, df = NA, p-value = 0.0004998
Best Answer
This is by design; you can refer to
?chisq.test
The reason is fairly simple. With the standard $\chi^2$ test, you compare your test statistic to a theoretical reference distribution (in this case, $\chi^2$ with a given number of degrees of freedom). With the Monte Carlo simulation (using Hope's [1968] method, as described in the documentation), you instead simulate data from the null distribution and check how often a result as extreme as yours appears under it. Here, the null distribution is generated by simulating "random" contingency tables with the same marginals as your data. The degrees of freedom never enter the computation at any stage, so there is no reason to report them.
Check the referenced paper, plus the source code of the routines
C_chisq_sim
and rcont2
that it uses, to learn more.