Fisher’s Exact Test – Help Understanding the P-Value in Fisher’s Exact Test

contingency-tables, fishers-exact-test, p-value, r

I am analysing some data using Fisher's exact test in R, but I am unsure how to interpret the results.

I want to compare test scores between two groups (A and B). I have two tests but wish to look at each individually, i.e., compare scores between groups A and B for Test 1, then do the same for Test 2.

As each test score is 0, 1, 2, or 3, I am using Fisher's exact test to compare groups A and B in terms of the distributions of cases among results 0, 1, 2, and 3.

Here is my example data:

exampledata1 <- read.table(text="Unique ID,Test.1,Test.2,Group
544001,3,1,A
668149,0,0,A
314205,0,1,A
646843,1,1,A
599014,0,1,A
502115,1,0,A
123725,2,2,A
170541,3,3,A
181769,1,1,A
928772,0,0,A
448116,0,1,A
287093,0,0,A
754849,1,1,A
862914,0,0,A
390557,0,0,A
749690,0,0,A
423885,1,0,A
831307,0,0,A
642633,1,,A
960592,0,0,A
162882,1,1,A
762726,0,0,A
463306,0,1,A
236706,1,1,A
979328,0,1,A
590783,3,3,A
821588,1,0,A
112601,1,1,A
921085,2,1,A
336733,0,0,A
315681,0,0,A
237933,1,1,A
698807,0,0,A
720256,1,1,A
525437,0,1,A
735806,0,0,A
260575,1,2,A
763688,1,1,A
114882,0,1,A
525912,3,1,A
972047,0,1,A
333651,2,1,A
620004,3,3,B
910745,0,0,B
837662,1,1,B
943322,1,0,B
907396,2,2,B
522464,3,3,B
637028,0,0,B
929821,2,2,B
831377,0,0,B
273030,,,B
978827,1,0,B
725910,0,0,B
519596,3,3,B
484731,1,2,B
344732,3,3,B
604679,1,2,B
213215,0,0,B
595850,0,0,B
286045,3,1,B
192411,2,2,B
747516,1,0,B
729803,1,1,B
266336,1,1,B
527596,,,B
515154,0,1,B
356337,1,1,B
245176,1,1,B
599492,1,1,B
713802,1,1,B
285520,0,0,B
254784,2,1,B
396954,1,1,B
918426,1,1,B
895730,1,1,B
436572,1,1,B
106052,1,2,B
880444,1,1,B
834328,1,1,B
180569,1,1,B
383651,2,1,B
547905,1,1,B
222952,1,1,B", header = TRUE, sep = ",")

This is my code to make a 2*4 contingency table to compare scores for Test.1:

mytable <- table(exampledata1$Group, exampledata1$Test.1)
addmargins(mytable)

       0  1  2  3 Sum
  A   21 14  3  4  42
  B    8 22  5  5  40
  Sum 29 36  8  9  82

When I run Fisher's exact test I get a significant result:

fisher.test(mytable)

    Fisher's Exact Test for Count Data

data:  mytable
p-value = 0.03869
alternative hypothesis: two.sided

However, I am struggling to understand in what way the scores are significantly different. To me it appears that any significant difference must come from the distribution of cases between results 0 and/or 1, but I am not sure where to go from here in terms of analysis. Any help with this would be greatly appreciated.

I hope what I am asking makes sense. Please let me know if I can make things clearer.

Best Answer

The table tells you a great deal.

Let's begin with the first rule of data analysis: draw a picture.

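par(mfrow=c(1,2))  # two panels side by side
# Group-by-score tables (Test.1 is reused for the residuals below)
Test.1 <- table(exampledata1$Group, exampledata1$Test.1)
Test.2 <- table(exampledata1$Group, exampledata1$Test.2)
# Formula interface: the variables are looked up in exampledata1
mosaicplot(Test.1 ~ Group, data=exampledata1, col=c("gray", "tan"))
mosaicplot(Test.2 ~ Group, data=exampledata1, col=c("gray", "tan"))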

[Figure: mosaic plots of Test 1 (left) and Test 2 (right) scores by group]

These mosaic plots of the tables are self-explanatory: areas represent counts. In the left panel for the first test scores, it is evident group A has relatively more scores of $0$ than group B.
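The visual impression can be checked numerically with row proportions. The following is a minimal sketch, assuming the Test 1 counts are typed in by hand from the table in the question (the names `tab` and `props` are only for illustration):

```r
# Test 1 counts copied from the contingency table in the question.
tab <- matrix(c(21, 14, 3, 4,
                 8, 22, 5, 5),
              nrow = 2, byrow = TRUE,
              dimnames = list(Group = c("A", "B"),
                              Score = c("0", "1", "2", "3")))

# Share of each group falling in each score category (each row sums to 1).
props <- prop.table(tab, margin = 1)
round(props, 2)
```

Half of group A (21 of 42) scored $0$, against one fifth of group B (8 of 40).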

The Fisher test is often used where a $\chi^2$ test would be considered but is suspect because several expected cell values are small (the threshold for "small" is often taken as $5$ or less). The intuition behind $\chi^2$ is that each count is a random variable whose variance approximately equals its expectation. These expectations can be read directly off the table and its row and column sums; for instance, the expectation of test score $0$ for group A is

$$E_{0A} = \frac{42\cdot 29}{82} \approx 14.85$$

where $42$ is the number in group $A$, $29$ is the number scoring $0$ (among both groups), and $82$ is the total number taking the first test.
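The same calculation can be done for all eight cells at once. This is a sketch under the assumption that the Test 1 counts are entered by hand from the question's table; `tab` and `expected` are illustrative names, not objects defined elsewhere in this thread:

```r
# Test 1 counts copied from the contingency table in the question.
tab <- matrix(c(21, 14, 3, 4,
                 8, 22, 5, 5),
              nrow = 2, byrow = TRUE,
              dimnames = list(Group = c("A", "B"),
                              Score = c("0", "1", "2", "3")))

# Expected count under independence:
# (row total * column total) / grand total, for every cell.
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
round(expected, 2)
```

The entry for group A and score $0$ reproduces the $14.85$ computed above.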

As in all data analysis, we gain insight by examining the difference between the value and its expectation relative to its standard deviation. In this example, $21$ people in group A scored $0$, so the residual is $21 - 14.85 = 6.15$ and its standard deviation is approximately $\sqrt{14.85} \approx 3.85$. The standardized (Pearson) residual therefore is $6.15 / 3.85 \approx +1.60$.

All eight residuals in the table are similarly computed. (Notice how simple the calculations are: often they can be approximated accurately with mental arithmetic.) In R, it is convenient to obtain them automatically from the result of chisq.test, whether or not you want to use a chi-squared test:

x <- chisq.test(Test.1)
round(residuals(x), 3)

Here's the output:

         0      1      2      3
  A  1.595 -1.034 -0.542 -0.284
  B -1.634  1.059  0.556  0.291
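To see that nothing mysterious is happening, the residuals can be reproduced by hand as (observed - expected) / sqrt(expected), which is the Pearson residual that `chisq.test` stores in its `residuals` component. A minimal sketch, assuming the Test 1 counts are re-entered from the question's table (`tab`, `expected`, `pearson` and `fit` are illustrative names):

```r
# Test 1 counts copied from the contingency table in the question.
tab <- matrix(c(21, 14, 3, 4,
                 8, 22, 5, 5),
              nrow = 2, byrow = TRUE,
              dimnames = list(Group = c("A", "B"),
                              Score = c("0", "1", "2", "3")))

# Expected counts under independence, then Pearson residuals.
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
pearson  <- (tab - expected) / sqrt(expected)
round(pearson, 3)

# Compare with chisq.test(); its warning about small expected counts
# is suppressed because only the residuals are of interest here.
fit <- suppressWarnings(chisq.test(tab))
all.equal(unname(pearson), unname(fit$residuals))
```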

The residuals that are largest in size suggest simple explanations for a significant difference in the table (even when some other test besides $\chi^2$ was used to identify that difference). In this example, those are the residuals of $1.595$ and $-1.634$ for those scoring a zero on the test--exactly as suggested by the mosaic plot. This, together with the sign patterns of the residuals, immediately suggests a verbal description like:

Fisher's Exact Test indicates a significant difference between the groups in Test 1 (p = 0.04). Compared to group A, group B had fewer zero scores and more positive scores.

The reference for evaluating just how large in size a residual might be is the standard Normal distribution. Values larger than $2$ in size are getting extreme and larger than $3$ are definitely interesting. (Just make sure to discount any residuals associated with expected counts much smaller than $5$.) That tells us none of these residuals is extreme: the evidence for a difference therefore is weak. However, the patterns (+--- for group A, -+++ for group B) are meaningful and interpretable, which at the least is suggestive.
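These rules of thumb are easy to apply mechanically. The sketch below assumes the Test 1 counts are entered by hand from the question's table; the cutoffs of $2$ and $5$ are the informal ones mentioned above, not formal tests, and the object names are illustrative:

```r
# Test 1 counts copied from the contingency table in the question.
tab <- matrix(c(21, 14, 3, 4,
                 8, 22, 5, 5),
              nrow = 2, byrow = TRUE,
              dimnames = list(Group = c("A", "B"),
                              Score = c("0", "1", "2", "3")))

# chisq.test() warns about small expected counts for this table;
# the warning is suppressed because only the diagnostics are used.
fit <- suppressWarnings(chisq.test(tab))
res <- fit$residuals

which(abs(res) > 2, arr.ind = TRUE)  # cells with residuals larger than 2 in size
fit$expected < 5                     # cells whose residuals deserve less weight
sign(res)                            # sign pattern of the residuals by group
```

No residual exceeds $2$ in size, and the four cells for scores $2$ and $3$ have expected counts slightly below $5$.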


Much, much more can be said concerning how to explore and analyze these data, but that would take us far afield. I will end, though, by emphasizing the need to adjust for multiple comparisons when looking at both tests. Although it is comforting that the qualitative pattern discovered in the Test 1 mosaic plot appears to hold for the Test 2 mosaic plot, the latter is not significant (p = $0.25$ with Fisher's Exact Test). A more sophisticated model would be needed to account for the correlations between the two test results.
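As a rough illustration of such an adjustment, Holm's method can be applied to the two p-values quoted in this thread. This is a simple substitute, not the more sophisticated model mentioned above: it controls the family-wise error rate but does not use the correlation between the two tests. The Test 2 value is the rounded figure reported here, and the names `p_raw` and `p_holm` are illustrative:

```r
# p-values quoted in this thread: Test 1 from the fisher.test output,
# Test 2 as reported (rounded) in the answer.
p_raw <- c(Test.1 = 0.03869, Test.2 = 0.25)

# Holm's step-down adjustment for two comparisons.
p_holm <- p.adjust(p_raw, method = "holm")
p_holm
```

With these inputs the Test 1 p-value doubles to about $0.077$, so it no longer falls below $0.05$ after adjustment.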