Solved – Assumptions for Fisher’s exact test

Tags: assumptions, contingency tables, fishers-exact-test, inference, population

The 2×2 contingency table below shows the number of judges who have (1) or have not (0) applied a certain law in their rulings. The columns break down these numbers as a function of the judges' level of education: standard or advanced. The hypothesis (H1) was that judges with advanced education would apply said law more frequently.

2x2 contingency table

Computing the odds ratio (OR) as a measure of effect size suggests that, in this sample, the odds of applying that one law are 7.74 times higher for a judge with advanced education than for a judge with standard education. To test the reliability (statistical significance) of the OR statistic, I computed Fisher's "exact" test, whose p-value (unsurprisingly, given the high OR) is very low: p = .000003.
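For readers who want to see how these two numbers are obtained, here is a minimal sketch in plain Python. The cell counts are hypothetical: only the group sizes (19 and 738) come from the question, so the output will not reproduce the OR or p-value reported above. The function names are my own, and the p-value is the one-sided version matching the directional H1; a two-sided test would give a different value.

```python
from math import comb

def odds_ratio(a, b, c, d):
    """Sample odds ratio (a/b) / (c/d) for the cell counts
    a = advanced & applied,  b = advanced & not applied,
    c = standard & applied,  d = standard & not applied."""
    return (a * d) / (b * c)

def fisher_one_sided(a, b, c, d):
    """One-sided Fisher p-value: P(X >= a), where X is the
    'advanced & applied' count and all margins are held fixed,
    so that X follows a hypergeometric distribution."""
    n_adv = a + b          # size of the advanced group
    n_applied = a + c      # number of judges who applied the law
    n = a + b + c + d      # all judges
    largest = min(n_adv, n_applied)
    tail = sum(comb(n_adv, k) * comb(n - n_adv, n_applied - k)
               for k in range(a, largest + 1))
    return tail / comb(n, n_applied)

# HYPOTHETICAL cell counts; only the totals 19 and 738 are from the question.
a, b, c, d = 10, 9, 100, 638
print(f"OR = {odds_ratio(a, b, c, d):.2f}")
print(f"one-sided p = {fisher_one_sided(a, b, c, d):.2e}")
```

Note that nothing in the calculation requires the two groups to be of similar size; the group sizes enter only through the fixed margins.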

My question: is the inferential statistic (Fisher's test) not invalidated by the sample sizes in the two subgroups being so different across categories (total of 738 judges with standard education vs only 19 for advanced)? Obviously, the numbers would have to be different for the analysis to not be trivial, but the question is, just how different are they allowed to be? Is it not against the test's assumptions to make a population-level inference based on so few subjects in one of the two groups?

I haven't seen in the definition of Fisher's test any assumption or limitation regarding how different the categories are allowed to be in terms of sample size.
Many other statistics employed in hypothesis testing have such assumptions, related to e.g. equal variances or normal distribution, which here might translate to a certain cut-off for the sizes of the subsamples (categories).

(This question has been reposted with clearer and more concise wording.)

Best Answer

Fisher's exact test is a common enough choice for analysing tables like yours, but it is not exactly right because of its assumption that both the row and the column totals are fixed. (The lady who tasted tea knew in advance how many cups had milk first and last.) My late friend Ludbrook put it like this:

Fisher's test requires the rare condition that both row and column marginal totals are fixed in advance. The resultant 2 × 2 table is described as doubly conditioned.

I am not aware of any assumption regarding sample size for Fisher's test, but the difference between real-world marginal conditioning and the test assumption would have less importance as the numbers in the table increase. In any case you might consider alternatives mentioned by Ludbrook here and here.
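To make the point about marginal conditioning concrete, here is a rough sketch (my own illustration, not taken from Ludbrook) of what happens when only the group sizes are fixed and the outcome in each group is binomial with a common probability under H0. It enumerates every possible table and adds up the probability of those that a one-sided Fisher test would reject. The group sizes, the common probability 0.15 and the function names are assumptions chosen for illustration; the larger group is cut from 738 to 200 only to keep the enumeration quick.

```python
from math import comb

def fisher_one_sided(a, b, c, d):
    """One-sided Fisher p-value P(X >= a) with all margins fixed."""
    n1, s, n = a + b, a + c, a + b + c + d
    tail = sum(comb(n1, k) * comb(n - n1, s - k)
               for k in range(a, min(n1, s) + 1))
    return tail / comb(n, s)

def binom_pmf(k, n, p):
    """Binomial probability of k successes in n trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def actual_size(n1, n2, p, alpha=0.05):
    """Probability that Fisher's test rejects H0 when only the group
    sizes n1 and n2 are fixed and both groups share success probability p."""
    size = 0.0
    for a in range(n1 + 1):          # successes in the small group
        for c in range(n2 + 1):      # successes in the large group
            if fisher_one_sided(a, n1 - a, c, n2 - c) <= alpha:
                size += binom_pmf(a, n1, p) * binom_pmf(c, n2, p)
    return size

# HYPOTHETICAL settings: group sizes 19 and 200, common probability 0.15.
size = actual_size(19, 200, 0.15)
print(f"nominal alpha = 0.05, actual rejection rate = {size:.4f}")
```

Because the test controls the error rate conditionally on every possible total, the unconditional rejection rate cannot exceed the nominal level; with a small group it will typically fall below it, which is the sense in which the mismatch makes the test conservative rather than invalid.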