Differences in Fisher exact test statistic

chi-squared-test, fishers-exact-test

I am trying to perform Fisher's exact test on the following matrix:

269 118
 55  48

Several free online calculators, as well as R's fisher.test(), return p = 0.0033, while vassarstats.net returns the different result shown in the image.

Am I misinterpreting the data? Why the difference?

[image: output from vassarstats.net]

Best Answer

It is possible to perform the Fisher exact test manually in R to see which result is correct. To do this, we need to calculate the probability of each possible contingency table with the same row and column totals as your table. (For computational reasons, namely to avoid arithmetic underflow, I will use log-probabilities instead.) The two-sided version of Fisher's exact test calculates the p-value as the sum of the probabilities of all tables whose probability is no greater than that of the observed contingency table.
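The description above can be written out as a formula. The notation is introduced here for illustration and does not appear in the original post: $a$ is the top-left cell, $r_1$ and $r_2$ are the row totals, $c_1$ is the first column total, and $N = r_1 + r_2$. With the margins fixed, $a$ determines the whole table, and under the null hypothesis its probability is hypergeometric:

```latex
P(a) = \frac{\binom{r_1}{a}\,\binom{r_2}{c_1 - a}}{\binom{N}{c_1}},
\qquad \max(0,\, c_1 - r_2) \le a \le \min(r_1,\, c_1),

p_{\text{two-sided}} = \sum_{a \,:\, P(a) \le P(a_{\text{obs}})} P(a).
```

In the code that follows, $r_1$, $r_2$ and $c_1$ correspond to m, n and k, the arguments passed to dhyper.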

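#Input the observed contingency table and set parameters
#(matrix() fills column by column, so DATA is the transpose of the table in the
#question; the Fisher exact test p-value is the same for a table and its transpose)
DATA <- matrix(c(269, 118, 55, 48), nrow = 2);
m    <- sum(DATA[1, 1:2]);   #Row 1 total
n    <- sum(DATA[2, 1:2]);   #Row 2 total
k    <- sum(DATA[1:2, 1]);   #Column 1 total
maxx <- sum(DATA[1:2, 2]);   #Column 2 total (here the largest value cell [2,2] can take)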

#Calculate log-probabilities over all possible outcomes
LOGPROBS <- rep(0, maxx+1);
for (x in 0:maxx) { 
  LOGPROBS[x+1] <- dhyper(DATA[1,1] - DATA[2,2] + x, m, n, k, log = TRUE); }

#Calculate p-value for Fisher exact test
LOWER   <- LOGPROBS[which(LOGPROBS <= LOGPROBS[DATA[2,2]+1])];
P_VALUE <- exp(matrixStats::logSumExp(LOWER));

P_VALUE;
[1] 0.003255474

This manual calculation gives the same p-value as the fisher.test function in R. I have also checked that the log-probabilities computed here give a total probability of one over all possible tables (to within a very small tolerance). Based on this investigation, the calculation in R appears to be correct, which suggests that there is some issue with the calculation on the website. (I suggest reading its documentation carefully to see whether it uses some approximation method.) Note that one possible source of calculation error is arithmetic underflow, which can occur if you compute the Fisher exact p-value without working in log-probability space.
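The two checks described above (total probability of one, and agreement with fisher.test) can be sketched in base R as follows. This is a minimal sketch rather than the code used in the answer: the variable names (support, logp, p_manual, p_builtin) and the 1e-7 tie tolerance are assumptions introduced here, and it avoids the matrixStats dependency by summing ordinary probabilities, which is safe for a table of this size.

```r
# Self-contained cross-check using base R only.
# The table is entered column by column, as in the answer's code.
DATA <- matrix(c(269, 118, 55, 48), nrow = 2)
m <- sum(DATA[1, ])   # row 1 total
n <- sum(DATA[2, ])   # row 2 total
k <- sum(DATA[, 1])   # column 1 total

# All values cell [1,1] can take when the margins are held fixed
lo <- max(0, k - n)
hi <- min(k, m)
support <- lo:hi

# Log-probability of every admissible table (hypergeometric)
logp <- dhyper(support, m, n, k, log = TRUE)

# Check 1: the probabilities should sum to one
total <- sum(exp(logp))

# Two-sided p-value: add up every probability no larger than the observed one.
# The 1e-7 slack on the log scale is an assumption added here to guard
# against floating-point ties.
obs      <- logp[DATA[1, 1] - lo + 1]
p_manual <- sum(exp(logp[logp <= obs + 1e-7]))

# Check 2: compare with the built-in test
p_builtin <- fisher.test(DATA)$p.value

print(c(total = total, manual = p_manual, builtin = p_builtin))
```

If the manual and built-in values agree and the total is one, the discrepancy most likely lies with the other calculator rather than with R.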
