Solved – Why is p=1 in Fisher’s exact test?

fishers-exact-test, r, sequence analysis

I'm comparing per-gene mutation rates between two datasets from DNA sequencing studies, using a two-tailed Fisher's exact test (please correct me if I'm wrong to use it in this situation!). I ran the test in R with the fisher.test function; a subset of the data and the output is below:

Dataset1: n=817

Dataset2: n=18

        MutationsDataset1   MutationsDataset2    p-value
GeneA   282                 1                    0.00975201620794552
GeneB   280                 5                    0.626542416245188
GeneC   62                  4                    0.04683126626377
GeneD   50                  3                    0.100176241063714
GeneE   47                  1                    1
GeneF   42                  1                    0.617780181704477
GeneG   41                  1                    0.608902818182774
GeneH   41                  1                    0.0384567660866955
GeneI   21                  6                    9.12505956956652e-06

My question is, why do I get p=1 for GeneE? Shouldn't a p-value never reach 1 or 0 (only converge on it)? Is this just R rounding up from 0.99999…?

This can be replicated as follows:

# 2x2 table for GeneE: rows = mutated / not mutated, columns = Dataset1 / Dataset2
df <- data.frame(x = c(47, 817 - 47), y = c(1, 18 - 1))
fisher.test(df, alternative = "two.sided")

The table for GeneE is as follows:

            Dataset1         Dataset2
Mutated     47               1
NotMutated  770              17        

Best Answer

In any randomization test, the p-value is the proportion of possible outcomes (given the data, but not given the assignment to conditions) that are as extreme as or more extreme than the observed data. If the observed outcome is the least extreme one possible, every outcome counts toward that proportion and p = 1. It is more of a proportion than a probability in the mathematical sense, so it can reach exactly 1 rather than only approach it.

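For GeneE, the ratios of Dataset1 to Dataset2 are 47:1 among mutated samples and 770:17 (about 45.3:1) among non-mutated samples. That's as close as the two ratios can be given the column totals (817 and 18) and the row totals (48 and 787), so the observed table is the least extreme one possible.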
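To make this concrete, here is a minimal sketch in base R that enumerates every 2x2 table compatible with the GeneE margins and adds up the probabilities of those that are no more likely than the observed one. The variable names are made up for illustration. The tolerance factor is an assumption meant to mimic how fisher.test handles floating-point ties, not a statement of its exact internals.

```r
# Counts for GeneE, taken from the table in the question.
mutated_d1 <- 47; n_d1 <- 817
mutated_d2 <- 1;  n_d2 <- 18
total_mutated     <- mutated_d1 + mutated_d2          # 48 (fixed row total)
total_not_mutated <- n_d1 + n_d2 - total_mutated      # 787 (fixed row total)

# With all margins fixed, a table is fully determined by the number of
# mutated samples that fall in Dataset2, which can range from 0 to 18.
x_vals <- 0:min(n_d2, total_mutated)

# Hypergeometric probability of each possible table: draw n_d2 samples
# from a pool of total_mutated mutated and total_not_mutated non-mutated.
probs <- dhyper(x_vals, m = total_mutated, n = total_not_mutated, k = n_d2)

# Probability of the table that was actually observed.
p_obs <- probs[x_vals == mutated_d2]

# Two-sided p-value: total probability of tables that are no more likely
# than the observed one (assumed small relative tolerance for ties).
p_two_sided <- sum(probs[probs <= p_obs * (1 + 1e-7)])

x_vals[which.max(probs)]  # most probable count of mutated samples in Dataset2
p_two_sided               # compare with fisher.test(df)$p.value
```

Because the observed count (1 mutated sample in Dataset2) is also the most probable count under the null, every possible table is at most as likely as the observed one, and the sum covers the whole distribution.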