Is it inappropriate to use Fisher's exact test when cell counts are high?

contingency-tables, fishers-exact-test

I have not been able to find an authoritative source on when it is inappropriate to use Fisher's exact test in the context of cell counts. I know that Fisher's exact test can be computationally intensive when the cell counts are large, but if I am not concerned with computation time, is it still statistically valid to use Fisher's exact test on a 2×2 contingency table with large cell counts?

Please let me know if I can clarify the question.

EDIT:
Some links that imply there is an issue with large cell counts for Fisher's Exact test:

  1. https://graphpad.com/quickcalcs/contingency1/

    • This calculator from GraphPad will not let you use Fisher's exact test when cell counts are large; it uses the chi-squared test instead.

  2. https://stackoverflow.com/questions/30472087/r-is-fishers-exact-test-with-large-numbers-still-accurate?rq=1

    • In this thread, a comment states: "Why on earth would you do a fisher's exact test for this? That test is better suited for small counts."

That's all I have for now.

Best Answer

Strictly speaking, Fisher's exact test considers every possible contingency table (every combination of cell counts) that has the same marginal totals as yours, and sums the null-hypothesis probabilities of those tables that are as extreme as, or more extreme than, the table you observed. When the counts are large, this leads to a combinatorial explosion. Computers really are pretty fast these days, but they can still be overwhelmed. Here is an example, coded in R:

fisher.test(matrix(c(19874393874932817,943850439754375437,
                     19743832745983274,981749374918734987), nrow=2))
# Error in fisher.test(matrix(c(19874393874932816, 943850439754375424, 19743832745983272,  : 
#   'x' has entries too large to be integer
chisq.test(matrix(c(19874393874932817,943850439754375437,
                    19743832745983274,981749374918734987), nrow=2))
#   Pearson's Chi-squared test with Yates' continuity correction
# 
# data:  matrix(c(19874393874932816, 943850439754375424, 
#                 19743832745983272, 981749374918734976), nrow = 2)
# X-squared = 2.0502e+13, df = 1, p-value < 2.2e-16

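So there is nothing statistically invalid about using Fisher's exact test when the cell counts are large; the only obstacle is that the computations can become intractable. (In the example above, R gives up even before computing anything, because the counts do not fit in its integer type.)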
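To make the point concrete, here is a minimal sketch for counts that are large but still fit in R's integer type. The table `tab` is made up purely for illustration, and the names `p_manual`, `p_fisher`, and `p_chisq` are mine. It recomputes the two-sided p-value by hand from the hypergeometric distribution (`dhyper`), which is my understanding of what `fisher.test` does for a 2×2 table, and then compares it with the chi-squared test:

```r
# Hypothetical 2x2 table with fairly large counts (illustration only).
tab <- matrix(c(1200, 1500,
                1350, 1400), nrow = 2)

# Marginal totals that Fisher's exact test conditions on.
m <- sum(tab[, 1])   # first column total
n <- sum(tab[, 2])   # second column total
k <- sum(tab[1, ])   # first row total

# All values the top-left cell can take given those margins,
# and the hypergeometric probability of each resulting table.
support <- max(0, k - n):min(k, m)
probs   <- dhyper(support, m, n, k)

# Two-sided p-value: total probability of the tables that are no more
# likely than the observed one (small tolerance for floating-point ties).
p_obs    <- dhyper(tab[1, 1], m, n, k)
p_manual <- sum(probs[probs <= p_obs * (1 + 1e-7)])

p_fisher <- fisher.test(tab)$p.value
p_chisq  <- chisq.test(tab)$p.value   # Yates' correction is the default for 2x2

print(c(manual = p_manual, fisher = p_fisher, chisq = p_chisq))
```

If this behaves as I expect, the hand computation should agree with `fisher.test`, and the chi-squared p-value should be close to both. That is the usual practical argument for switching to the chi-squared test at large counts: it is cheaper and gives nearly the same answer, not that Fisher's test has become wrong.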