Machine Learning – Why Exact Tests Are Preferred Over Chi-Squared for Small Sample Sizes

chi-squared-test, distributions, machine-learning, mathematical-statistics, statistical-significance

I am aware that tests such as Fisher's exact test are sometimes preferable to the chi-squared test when the expected values in a contingency table are low, for example when testing homogeneity of groups (historically a cutoff of 5 has been suggested, although some consider this conservative).

However, I can't seem to find an explanation of why chi-squared does not work well for small sample sizes. I therefore have two questions:

  1. What causes expected values in a contingency table to become small as sample size reduces? (I am assuming here that the small expected values are a result of the small sample size.)
  2. Why is it that the chi-squared test should not be used for small sample sizes? I have seen people say it does not adequately approximate the theoretical chi-squared distribution, but can someone explain why/how it doesn't?

Best Answer

In a classical hypothesis test, you have a test statistic that orders the evidence from that which is most conducive to the null hypothesis to that which is most conducive to the alternative hypothesis. (Without loss of generality, suppose that a higher value of this statistic is more conducive to the alternative hypothesis.) The p-value of the test is the probability of observing evidence at least as conducive to the alternative hypothesis as what you actually observed (a test statistic at least as large as the observed value) under the assumption that the null hypothesis is true. This is computed from the null distribution of the test statistic, which is its distribution under the assumption that the null hypothesis is true.

Now, an "exact test" is a test that computes the p-value exactly ---i.e., it computes this from the true null distribution of the test statistic. In many statistical tests, the true null distribution is complicated, but it can be approximated by another distribution, and it converges to that approximating distribution as $n \rightarrow \infty$. In particular, the so-called "chi-squared tests" are hypothesis tests where the true null distribution converges to a chi-squared distribution.

So, in a "chi-squared test" of this kind, when you compute the p-value of the test using the chi-squared distribution, this is just an approximation to the true p-value. The true p-value of the test is given by the exact test, and you are approximating this value using the approximating null distribution of the test statistic. When $n$ is large this approximation is very good, but when $n$ is small the approximation may be poor. For this reason, statisticians counsel against using the "chi-squared tests" (i.e., using the chi-squared approximation to the true null distribution) when $n$ is small.
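To make this concrete, here is a minimal sketch using SciPy (the 2x2 table itself is invented purely for illustration) comparing the exact p-value from Fisher's exact test with the chi-squared approximation on a small sample:

```python
from scipy.stats import chi2_contingency, fisher_exact

# A small 2x2 contingency table (n = 8), invented for illustration
table = [[3, 1],
         [1, 3]]

# Exact p-value: computed from the true (discrete) null distribution
_, p_exact = fisher_exact(table)

# Approximate p-value: Pearson statistic referred to the chi-squared
# distribution (no continuity correction, to show the raw approximation)
chi2_stat, p_approx, dof, expected = chi2_contingency(table, correction=False)

print(f"exact p-value  : {p_exact:.4f}")   # ~0.4857
print(f"approx p-value : {p_approx:.4f}")  # ~0.1573
```

With only eight observations the two p-values differ substantially; as $n$ grows (holding the cell proportions fixed) the approximate p-value converges to the exact one.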


Chi-squared tests for independence in contingency tables: Now I will examine your specific questions in relation to chi-squared tests for testing independence in contingency tables. In this context, if we have a contingency table with observed counts $O_1,...,O_K$ summing to $n \equiv \sum O_i$ then the test statistic is the Pearson statistic:

$$\chi^2 = \sum_{i=1}^K \frac{(O_i-E_i)^2}{E_i},$$

where $E_1,...,E_K$ are the expected cell values under the null hypothesis.$^\dagger$ The first thing to note here is that the observed counts $O_1,...,O_K$ are non-negative integers. For any $n<\infty$ this limits the possible values of the test statistic to a finite set of possible values, so its true null distribution will be a discrete distribution on this finite set of values. Note that the chi-squared distribution cannot be the true null distribution because it is a continuous distribution over all non-negative real numbers --- an uncountably infinite set of values.

As in other "chi-squared tests" the null distribution of the test statistic here is well approximated by the chi-squared distribution when $n$ is large. You are not correct to say that this is a matter of failing to "adequately approximate the theoretical chi-squared distribution" --- on the contrary, the theoretical chi-squared distribution is the approximation, not the true null distribution. The chi-squared approximation is good so long as none of the values $E_1,...,E_K$ is small. The reason these expected values are small for low values of $n$ is that the expected cell counts sum to $n$, so when the total count is low, at least some of the cells must have low expected counts.
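Concretely, under independence the expected count for cell $(i,j)$ is $E_{ij} = (\text{row}_i \times \text{col}_j)/n$, so the expected counts scale linearly with $n$ when the cell proportions are held fixed. A short sketch (the proportions are invented for illustration):

```python
def expected_counts(table):
    """Expected cell counts under independence: E_ij = row_i * col_j / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# Same cell proportions, two sample sizes (proportions invented for illustration)
large = [[60, 40], [30, 70]]   # n = 200
small = [[6, 4], [3, 7]]       # n = 20

print(expected_counts(large))  # [[45.0, 55.0], [45.0, 55.0]]
print(expected_counts(small))  # [[4.5, 5.5], [4.5, 5.5]]
```

At $n = 200$ every expected count is comfortably large, but dividing every cell by 10 drives some expected counts below the classical cutoff of 5, which is exactly when the chi-squared approximation starts to deteriorate.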


$^\dagger$ For analysis of contingency tables, these expected cell counts are obtained by conditioning on the marginal totals under the null hypothesis of independence. It is not necessary for us to go into any further detail on these values.
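For the curious: in the 2x2 case, conditioning on the marginal totals makes the exact null distribution of one cell count hypergeometric, and the two-sided Fisher p-value can be computed directly from it. A sketch using SciPy's hypergeometric distribution (the table is invented for illustration):

```python
from scipy.stats import hypergeom, fisher_exact

# Illustrative 2x2 table (invented for this sketch)
a, b, c, d = 3, 1, 1, 3
n = a + b + c + d                      # grand total
row1, col1 = a + b, a + c              # fixed marginal totals

# Conditional on the margins, the top-left cell count follows a
# hypergeometric distribution: draw col1 items from a population of n,
# of which row1 are "successes".
support = range(max(0, row1 + col1 - n), min(row1, col1) + 1)
pmf = {k: hypergeom.pmf(k, n, row1, col1) for k in support}

# Two-sided Fisher p-value: sum the probabilities of all tables at
# least as extreme (i.e., no more probable) than the observed one.
p_obs = pmf[a]
p_value = sum(p for p in pmf.values() if p <= p_obs * (1 + 1e-9))

print(f"manual exact p-value : {p_value:.6f}")
print(f"scipy fisher_exact   : {fisher_exact([[a, b], [c, d]])[1]:.6f}")
```

The manual computation agrees with `scipy.stats.fisher_exact`, which performs exactly this conditional calculation.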