Logistic Regression – Difference Between Fisher’s Exact Test vs. Logistic Regression for $2 \times 2$ Tables

contingency-tables, fishers-exact-test, inference, logistic-regression

For a $2 \times 2$ table, two common ways to do inference are Fisher's Exact Test and logistic regression.

I was told that with Fisher's Exact Test we are only interested in the presence of an association, while with logistic regression we are interested in the magnitude of the association.

However, I do not understand why. For example, Fisher's exact test in R returns the odds ratio with a confidence interval, while logistic regression returns the intercept and the slope, which correspond to the log odds and the log odds ratio.

My question is: where does logistic regression give us the magnitude of association? I assume it is in the $\beta_1$ coefficient, but that is just the log odds ratio, which Fisher's exact test yields as well. What are the differences?

Best Answer

I am not sure what the person you've talked to meant by "logistic regression gives us the magnitude of association", since, as you state, Fisher's exact test does something quite similar. Still, there are some differences I can think of.

1. The odds ratios (OR) can differ

The ORs reported do not have to be the same. At least this is true for the R functions fisher.test() and exact2x2() versus logistic regression via glm(). Here is an example:

# generating data
set.seed(1)
n <- 200
x <- rbinom(n, 1, .5)
y <- rbinom(n, 1, .4)
df <- data.frame(x, y)

# OR from logistic regression
exp(coef(glm(y ~ x, family = binomial(link = 'logit'), data = df)))[2]
1.423077

# OR from fisher's exact test
tab <- table(x, y)
fisher.test(tab)$estimate
1.420543 # the methods "minlike", "central" and "blaker" in the exact2x2 function result in the same OR

# calculating OR by hand
(tab[1,1]/ tab[2,1])/ (tab[1,2]/ tab[2,2])
1.423077

The OR from Fisher's exact test differs from the one calculated by hand or reported by logistic regression because it is the conditional maximum likelihood estimate, not the unconditional MLE (the sample OR). There may be situations where the ORs differ more than in my example. And again, the ORs differ for the functions mentioned, but there may be other variants of the test where they are the same.
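To make the conditional/unconditional distinction concrete, here is a minimal Python sketch that computes both estimates for one table. The counts are made up for illustration (not the simulated data from the R example), and the grid search stands in for the proper root-finding that fisher.test() does internally:

```python
from math import comb, log

# Hypothetical 2x2 table (made-up counts):
#        y=0  y=1
# x=0     20   30
# x=1     25   25
a, b, c, d = 20, 30, 25, 25

# Unconditional MLE = sample odds ratio, the value glm() agrees with.
sample_or = (a * d) / (b * c)

# Conditional MLE (what fisher.test() reports): maximize the noncentral
# hypergeometric likelihood of the observed cell a given all four margins.
r1, r2, c1 = a + b, c + d, a + c
lo, hi = max(0, c1 - r2), min(r1, c1)

def cond_loglik(psi):
    # log P(A = a | margins, psi), psi being the odds ratio
    terms = [comb(r1, k) * comb(r2, c1 - k) * psi**k for k in range(lo, hi + 1)]
    return log(comb(r1, a) * comb(r2, c1 - a) * psi**a / sum(terms))

# crude grid search over psi; a real implementation would use a root finder
cond_mle = max((i / 1000 for i in range(1, 5001)), key=cond_loglik)

print(sample_or, cond_mle)
```

The two estimates land close together for a balanced table like this one, but they are not the same number, which is exactly the discrepancy seen between glm() and fisher.test() above.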

2. p values differ

Of course the p values differ, since in logistic regression they are determined from the Wald statistic and a z value, while there are different types of exact Fisher tests that even differ in p values among themselves. See here for the data used before:

# p value from logistic regression
summary(glm(y ~ x, family = binomial(link = 'logit'), data = df))$coefficients["x", "Pr(>|z|)"]
0.2457947

# p value from fisher's exact test
library(exact2x2) # package covers different exact fisher's tests, see here https://cran.r-project.org/web/packages/exact2x2/index.html

exact2x2(tab,tsmethod="central")$p.value
0.3116818
exact2x2(tab,tsmethod="minlike")$p.value
0.290994 # which is same as fisher.test(tab)$p.value and exact2x2(tab,tsmethod="blaker")$p.value

Here, in all cases one would conclude that there is no significant effect. But as you can see, the differences are not trivial (.246 for logistic regression versus .291 or even .312 for the exact Fisher tests). Thus, depending on whether you use logistic regression or Fisher's exact test, you may come to a different conclusion about whether there is a significant effect.
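For completeness, the Wald p value that summary(glm(...)) prints is just $2(1 - \Phi(|\hat\beta / SE|))$, which you can compute directly from a coefficient and its standard error. A minimal Python sketch (the 1.96 below is only the familiar critical value, used as a sanity check, not a number from the example):

```python
from math import erfc, sqrt

def wald_p(beta, se):
    """Two-sided Wald p value: 2 * (1 - Phi(|beta / se|))."""
    z = beta / se
    return erfc(abs(z) / sqrt(2))  # erfc(|z|/sqrt(2)) equals 2*(1 - Phi(|z|))

# sanity check: |z| = 1.96 should give a p value of about .05
print(round(wald_p(1.96, 1.0), 3))  # ≈ 0.05
```

This is an asymptotic normal approximation, which is the root of the discrepancy with the exact tests above.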

3. Making a prediction

To make an analogy: Pearson correlation and linear regression are quite similar in the bivariate case, and the standardised regression coefficient even equals Pearson's correlation r. But you cannot make predictions with a correlation, since it is missing an intercept. Similarly, even if the odds ratios of logistic regression and Fisher's exact test were the same (which is not the case, as discussed in point 1), you could not make predictions with the results of Fisher's exact test. Logistic regression, on the other hand, provides the intercept and the coefficient(s) needed to make predictions.
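A small sketch of such a prediction, with made-up coefficients on the log-odds scale (not the fitted values from the R example):

```python
from math import exp

# made-up intercept and slope on the log-odds scale
b0, b1 = -0.5, 0.35

def predict_prob(x):
    """P(y = 1 | x) via the inverse logit of b0 + b1 * x."""
    return 1.0 / (1.0 + exp(-(b0 + b1 * x)))

p0, p1 = predict_prob(0), predict_prob(1)
print(p0, p1)

# the odds ratio between the two predictions recovers exp(b1),
# which is essentially all the information Fisher's test reports
odds = lambda p: p / (1 - p)
print(odds(p1) / odds(p0), exp(b1))
```

Note that the intercept is what turns the odds ratio into actual predicted probabilities; the odds ratio alone cannot do that.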

4. Performance

The differences mentioned before suggest that there should also be differences in the performance of both tests in terms of power and type I error. There are some sources stating that Fisher's exact test is too conservative. On the other hand, one should keep in mind that standard logistic regression inference is asymptotic, so with few observations you will probably prefer Fisher's exact test.
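The conservativeness can even be shown exactly, without simulation: because the test statistic is discrete, the actual rejection probability at a nominal $\alpha = .05$ stays below .05. Below is a self-contained Python sketch; the two-sided "minlike" rule mirrors fisher.test()'s default, while the group sizes and the common success probability are arbitrary choices for illustration:

```python
from math import comb

def fisher_p(a, b, c, d):
    """Two-sided Fisher exact p value ('minlike' rule): sum the
    hypergeometric probabilities of all tables with the observed margins
    that are no more likely than the observed table."""
    r1, r2, c1, n = a + b, c + d, a + c, a + b + c + d
    lo, hi = max(0, c1 - r2), min(r1, c1)
    denom = comb(n, c1)
    pmf = {k: comb(r1, k) * comb(r2, c1 - k) / denom for k in range(lo, hi + 1)}
    obs = pmf[a]
    return sum(q for q in pmf.values() if q <= obs * (1 + 1e-7))

# Exact type I error at nominal alpha = .05 for two independent binomial
# groups (n = 10 each, common success probability .3 -- arbitrary values).
n1 = n2 = 10
p = 0.3
binom = lambda n, k: comb(n, k) * p**k * (1 - p) ** (n - k)

alpha_actual = sum(
    binom(n1, x1) * binom(n2, x2)
    for x1 in range(n1 + 1)
    for x2 in range(n2 + 1)
    if fisher_p(x1, n1 - x1, x2, n2 - x2) <= 0.05
)
print(round(alpha_actual, 4))  # below the nominal .05
```

The enumeration sums the null probability of every dataset the test would reject; for small samples like these the result falls clearly short of the nominal level, which is what "too conservative" means here.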

To sum up, although both tests can be used on the same data, there are some differences that can lead to different results and thus to different conclusions. So it depends on the situation which of the two tests you want to use: for prediction it would be logistic regression, for small sample sizes Fisher's exact test, and so on. There are probably even more differences which I left out, but maybe someone can edit and add them.