A chi-squared test will be simplest and most appropriate. Fisher's exact test tests for differences conditional on fixed margins, which is almost certainly inappropriate here. Logistic regression would be fine, but chi-squared is simpler; also, logistic regression really assesses smoking as a function of your groups, which does not quite match your real question conceptually.
d = read.table(text="Groups Yes No All
']10,20]' 35 6 41
']20,30]' 20 13 33
']30,40]' 10 15 25
']40,50]' 15 9 24", header=T)
tab = as.table(as.matrix(d[,-c(1,4)]))  # keep only the Yes/No counts
names(dimnames(tab)) = c("Groups", "Smoker")
rownames(tab) = d[,1]
colnames(tab) = names(d)[2:3]
chisq.test(tab)
# Pearson's Chi-squared test
#
# data: tab
# X-squared = 14.697, df = 3, p-value = 0.002095
Let me add a couple of additional notes: the chi-squared test only gives you a p-value for the null hypothesis that the distributions are the same. You may also want to characterize how they differ. Two ways to do this are to make a table of column-wise proportions and to make a mosaic plot:
round(prop.table(tab, 2), 3)
# Smoker
# Groups Yes No
# ]10,20] 0.438 0.140
# ]20,30] 0.250 0.302
# ]30,40] 0.125 0.349
# ]40,50] 0.188 0.209
dev.new()  # opens a new graphics device (windows() works only on Windows)
mosaicplot(t(tab), shade=T)
From these, you can see that there are 'too few' non-smokers in the ]10,20] group, and 'too many' in the ]30,40] group.
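The shading in the mosaic plot is based on the residuals from the independence model; if you want the numbers behind that picture, you can pull the Pearson residuals directly from the chi-squared fit:
chisq.test(tab)$residuals  # Pearson residuals: which cells deviate from independence, and in which direction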
If you wanted to use logistic regression to test these data, it is very simple:
mod = glm(cbind(Yes, No) ~ Groups, data=d, family=binomial)
summary(mod)
# Call:
# glm(formula = cbind(Yes, No) ~ Groups, family = binomial, data = d)
#
# Deviance Residuals:
# [1] 0 0 0 0
#
# Coefficients:
# Estimate Std. Error z value Pr(>|z|)
# (Intercept) 1.7636 0.4419 3.991 6.57e-05 ***
# Groups]20,30] -1.3328 0.5676 -2.348 0.018866 *
# Groups]30,40] -2.1691 0.6016 -3.606 0.000311 ***
# Groups]40,50] -1.2528 0.6108 -2.051 0.040249 *
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# (Dispersion parameter for binomial family taken to be 1)
#
# Null deviance: 1.5415e+01 on 3 degrees of freedom
# Residual deviance: 2.4425e-15 on 0 degrees of freedom
# AIC: 22.657
#
# Number of Fisher Scoring iterations: 3
The test of Groups is not the same as any of the individual tests displayed in the summary output. With only one variable in the model, the test of the variable as a whole is the same as the test of the model as a whole. Unfortunately, R does not give you that by default here as it does for a linear model. You can use the null and residual deviances and degrees of freedom to get a likelihood ratio test, though:
1-pchisq(mod$null.deviance-deviance(mod), df=mod$df.null-mod$df.residual)
# [1] 0.001494036
With only one variable in the model, a convenient way to get the test of the model as a whole is to use anova.glm(). By setting test="LRT", you get the same as the manual method above, and by using test="Rao", you get the score test, which is the same as the chi-squared test at the top:
anova(mod, test="LRT")
# Analysis of Deviance Table
# Model: binomial, link: logit
# Response: cbind(Yes, No)
# Terms added sequentially (first to last)
#
# Df Deviance Resid. Df Resid. Dev Pr(>Chi)
# NULL 3 15.415
# Groups 3 15.415 0 0.000 0.001494 **
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
anova(mod, test="Rao")
# Analysis of Deviance Table
# Model: binomial, link: logit
# Response: cbind(Yes, No)
# Terms added sequentially (first to last)
#
# Df Deviance Resid. Df Resid. Dev Rao Pr(>Chi)
# NULL 3 15.415
# Groups 3 15.415 0 0.000 14.697 0.002095 **
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Lastly, the naming of your rows (i.e., the Groups) is suspicious. Is this the result of categorizing an originally continuous variable? If so, that is very much not recommended: the categorization can arbitrarily create the appearance of different distributions where they don't actually exist. To get a sense of this, it may help to read this excellent answer (although the context differs): Assessing approximate distribution of data based on a histogram. You would do much better to use the original continuous values and decide a priori what kind of difference in the distributions you might care to detect (mean shift, difference in spread, skew, heavy-tailedness, other tail behavior, multi-modality, etc.), and test for that explicitly.
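As a minimal sketch of what that could look like, with simulated values standing in for the (hypothetical) original continuous variable, here testing for a pre-specified mean shift and for any distributional difference:
set.seed(1)
# simulated stand-in for the original continuous data (hypothetical;
# the margins match the table above: 80 smokers, 43 non-smokers)
orig = data.frame(age    = rnorm(123, mean=32, sd=9),
                  smoker = rep(c("Yes", "No"), c(80, 43)))
t.test(age ~ smoker, data=orig)                                 # a priori: mean shift
with(orig, ks.test(age[smoker == "Yes"], age[smoker == "No"]))  # any difference in distribution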
Best Answer
I am not sure what the person you've talked to meant by "logistic regression gives us the magnitude of association", since, as you state, Fisher's exact test does something quite similar. But still, there are some differences I can think of.
1. The odds ratios (OR) can differ
The ORs being reported don't have to be the same. At least this is true for the R functions fisher.test() and exact2x2() versus logistic regression via the glm() function. Here is an example:
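Since the original code for this example was not preserved, here is a minimal sketch with a made-up 2x2 table (the counts, and the names d2, m, and mod2, are hypothetical):
# hypothetical 2x2 table standing in for the lost original example
d2 = data.frame(Group = factor(c("A", "B")),
                Yes = c(21, 9), No = c(14, 16))
m = as.matrix(d2[, c("Yes", "No")])
rownames(m) = as.character(d2$Group)
(m[1,1] * m[2,2]) / (m[1,2] * m[2,1])  # sample (unconditional) OR: 2.667
fisher.test(m)$estimate                # conditional MLE of the OR; close, but not equal
mod2 = glm(cbind(Yes, No) ~ Group, data=d2, family=binomial)
exp(-coef(mod2)["GroupB"])             # unconditional MLE; matches the sample OR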
The OR from Fisher's exact test differs from the one calculated by hand or reported by logistic regression because it is calculated by conditional maximum likelihood, not by the unconditional MLE (the sample OR). There may be situations where the ORs differ more than in this example. And again, the ORs differ for the functions mentioned, but there may be other variants of the tests where they are the same.
2. p values differ
Of course the p values differ, since in the case of logistic regression they are determined with the Wald statistic and a z value, while there are different types of exact Fisher's test that even differ in p values among themselves (the last link opens a PDF). See here for the data used before:
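A sketch of how to pull both p values, reusing the hypothetical table and model from point 1 (note that the specific values quoted in the next paragraph come from the answer's original, unpreserved example, not from this table):
summary(mod2)$coefficients["GroupB", "Pr(>|z|)"]  # Wald z-test p value from logistic regression
fisher.test(m)$p.value                            # Fisher's exact test p value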
In that example, one would in all cases conclude that there is no significant effect. But still, as you can see, the differences are not trivial (.246 for logistic regression versus .291 or even .312 for Fisher's exact test). Thus, depending on whether you use logistic regression or Fisher's exact test, you may come to a different conclusion about whether there is a significant effect.
3. Making a prediction
To make an analogy: Pearson correlation and linear regression are quite similar in the bivariate case, and the standardized regression coefficient is even the same as Pearson's correlation r. But you can't make predictions with a correlation, since it is missing an intercept. Similarly, even if the odds ratios of logistic regression and Fisher's exact test were the same (which is not the case, as discussed in point 1), you couldn't make predictions with the results of Fisher's exact test. Logistic regression, on the other hand, provides the intercept and the coefficient(s) needed to make predictions, as in the sketch below.
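For instance, with the hypothetical model from point 1:
predict(mod2, newdata=data.frame(Group="B"), type="response")
# predicted probability of 'Yes' for group B; here it equals the observed
# proportion 9/(9+16) because the model is saturated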
4. Performance
The differences mentioned before suggest that there should be differences in the performance of the two tests in terms of power and type I error. There are some sources stating that Fisher's exact test is too conservative. On the other hand, one should keep in mind that the standard logistic regression analysis is asymptotic, so with few observations you will probably prefer Fisher's exact test. A rough simulation follows.
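A simulation sketch (the sample sizes and probabilities are arbitrary, hypothetical settings) of how one might compare the two tests' type I error under the null with small samples:
set.seed(42)
reject = replicate(2000, {
  g = rep(0:1, each=12)  # two small groups with no true effect
  y = rbinom(24, 1, 0.3)
  p.wald = summary(glm(y ~ g, family=binomial))$coefficients["g", "Pr(>|z|)"]
  p.fish = fisher.test(table(g, factor(y, levels=0:1)))$p.value
  c(wald = p.wald < .05, fisher = p.fish < .05)
})
rowMeans(reject)  # empirical type I error rate of each test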
To sum up, although both tests can be used on the same data, there are some differences that can lead to different results and thus to different conclusions. So it depends on the situation which of the two tests you want to use: for prediction it would be logistic regression, for small sample sizes Fisher's exact test, and so on. There are probably even more differences which I left out, but maybe someone can edit and add them.