Solved – How does fisher.test calculate the confidence interval for the odds ratio in R

confidence interval, fishers-exact-test, odds-ratio, r

The fisher.test function in base R by default returns a confidence interval for the odds ratio in a 2×2 contingency table. For example:

> x <- c(100, 5, 70, 12)
> dim(x) <- c(2,2)
> fisher.test(x)

    Fisher's Exact Test for Count Data

data:  x
p-value = 0.02291
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
  1.058526 12.904604
sample estimates:
odds ratio 
  3.406113

The confidence interval of an odds ratio is an extremely useful thing to know, and I would like to refer to it in an article I am currently writing. My dataset has high enough n for a chi-square test, but the latter would only give me the test statistic and a p-value, which are harder to interpret than the confidence interval of an odds ratio. However, I cannot find any explanation of how the confidence interval is being calculated in this case, nor of what the theoretical precedent might be for calculating confidence intervals of odds ratios as part of a Fisher test (as opposed to a logistic regression).

Can anyone shed some light?

Best Answer

The R help page for fisher.test cites Fisher's 1962 letter to the Australian Journal of Statistics, "Confidence limits for a cross-product ratio".

In it he notes, by example:

If the observations in a $2 \times 2$ table are distinctly out of proportion (and indeed in other cases also) we may wish to set limits to the true product ratio, e.g. the observed table

$$ \begin{array}{cc} 10 & 3 \\ 2 & 15 \end{array}$$

gives a crude ratio of 25. How small could the true ratio be in reasonable consistency with the data? If the expectation in the four classes were

$$ \begin{array}{cc} 10-x & 3+x \\ 2+x & 15-x \end{array}$$

the true ratio would be $(10-x)(15-x)/[(3+x)(2+x)]$, and $\chi^2$ for the observations would be:

$$\chi^2 = x^2 \left( \frac{1}{10-x} + \frac{1}{3+x} + \frac{1}{2+x} + \frac{1}{15-x} \right)$$

so if $x$ were 3.0, $$\chi^2 = 3^2 (0.59286) = 5.3357$$ with one degree of freedom.

The exact probability of such a small sample of 30 giving 10 or more in the first quadrant is the partial sum of a hypergeometric series, and not easy to calculate; for if $\xi$ stand for the theoretical product ratio, the frequencies of 0 to 12 in the quadrant will be proportional to the terms:

$$ 1, \frac{13 \times 12}{1\times 6}\xi, \frac{13\times 12 \times 12 \times 11}{1 \times 2 \times 6 \times 7}\xi^2, \ldots, \frac{13!\,12!\,5!}{r!\,(13-r)!\,(12-r)!\,(5+r)!}\xi^r,\ldots$$

It would not be too difficult, as in the exact test for disproportionality, to calculate the last three terms for any chosen value of $\xi$, but for the ratio of these to the whole we would require the sum of the entire series or $$F(-13, -12, 6, \xi)$$ which would be best obtained by calculating all the terms and summing them, a process too lengthy to be recommended.

Using Yates' adjustment, however, we can at once find: $$\chi^2_c = (2.5)^2 (0.59286) = 3.7054$$

Further taking $x=3.1$ we have

$$ \chi^2_c = (2.6)^2(0.58717) = 3.9693$$

Interpolating for the tabular entry 3.841 it appears that $x=3.0501$ and the cross product ratio is 2.718.

So that it may be inferred from the data that the true cross-product ratio exceeds 2.718 unless a coincidence of one in forty has occurred. Similar limits can be set in both directions and at all limits of probability.
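Fisher's hand calculation can be retraced numerically. The R sketch below is not part of the original answer: the helper names `chisq_yates` and `product_ratio` are made up for illustration, and the shift $x$ is found with `uniroot` rather than by linear interpolation between two trial values, so the resulting limit agrees with the quoted 2.718 only approximately.

```r
# Fisher's example table: rows (10, 3) and (2, 15).
# x is the hypothetical shift between observed and expected counts.

# Yates-corrected chi-square for a shift of x (the deviation is reduced by 0.5)
chisq_yates <- function(x) {
  (x - 0.5)^2 * (1 / (10 - x) + 1 / (3 + x) + 1 / (2 + x) + 1 / (15 - x))
}

# Cross-product ratio implied by the shifted expectations
product_ratio <- function(x) (10 - x) * (15 - x) / ((3 + x) * (2 + x))

# Tabular entry 3.841 is the 95% point of chi-square with one degree of freedom
crit <- qchisq(0.95, df = 1)

# Solve chisq_yates(x) = crit for x between 1 and 5
x_lim <- uniroot(function(x) chisq_yates(x) - crit, interval = c(1, 5))$root
lower_ratio <- product_ratio(x_lim)

print(c(x = x_lim, lower_ratio = lower_ratio))
```

Because the 5% point of $\chi^2$ is used for a limit in one direction only, the tail probability is 2.5%, which is the "one in forty" in the quotation.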

Related Solutions

Confidence Interval – Calculating Odds Ratio and Confidence Interval in Meta-Analysis

I did the following in Stata; the first analysis is fixed effect and the second is random effect. I got different answers than you did.

           Study     |     ES    [95% Conf. Interval]     % Weight
---------------------+---------------------------------------------------
1                    |  2.700       1.800     4.000         63.47
2                    |  1.300       0.500     3.400         36.53
---------------------+---------------------------------------------------
I-V pooled ES        |  2.189       1.312     3.065        100.00
---------------------+---------------------------------------------------
 Heterogeneity calculated by formula
  Q = SIGMA_i{ (1/variance_i)*(effect_i - effect_pooled)^2 } 
where variance_i = ((upper limit - lower limit)/(2*z))^2 



 Heterogeneity chi-squared =   2.27 (d.f. = 1) p = 0.132
  I-squared (variation in ES attributable to heterogeneity) =  56.0%

  Test of ES=0 : z=   4.89 p = 0.000

. metan or ll ul, effect(Odds Ratio) null(1) lcols(trialname) texts(200) random

           Study     |     ES    [95% Conf. Interval]     % Weight
---------------------+---------------------------------------------------
1                    |  2.700       1.800     4.000         55.93
2                    |  1.300       0.500     3.400         44.07
---------------------+---------------------------------------------------
D+L pooled ES        |  2.083       0.721     3.445        100.00
---------------------+---------------------------------------------------
 Heterogeneity calculated by formula
  Q = SIGMA_i{ (1/variance_i)*(effect_i - effect_pooled)^2 } 
where variance_i = ((upper limit - lower limit)/(2*z))^2 

  Heterogeneity chi-squared =   2.27 (d.f. = 1) p = 0.132
  I-squared (variation in ES attributable to heterogeneity) =  56.0%
  Estimate of between-study variance Tau-squared =  0.5488

  Test of ES=0 : z=   3.00 p = 0.003
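The pooled figures above can be checked against the formulas printed in the output. The following R sketch is an illustration only, with variable names of my own choosing. It assumes, as the printed `variance_i` formula implies, that the variances are derived from the confidence limits on the raw odds ratio scale; pooling log odds ratios instead would give different numbers.

```r
# Study-level odds ratios and 95% limits, as listed in the Stata tables
es <- c(2.7, 1.3)
ll <- c(1.8, 0.5)
ul <- c(4.0, 3.4)

z <- qnorm(0.975)
v <- ((ul - ll) / (2 * z))^2   # variance_i, following the printed formula
w <- 1 / v                     # inverse-variance (I-V) weights

# Fixed effect: inverse-variance pooled estimate and its 95% interval
fixed <- sum(w * es) / sum(w)
fixed_ci <- fixed + c(-1, 1) * z / sqrt(sum(w))

# Heterogeneity statistic Q and the DerSimonian-Laird (D+L) tau-squared
Q <- sum(w * (es - fixed)^2)
tau2 <- max(0, (Q - (length(es) - 1)) / (sum(w) - sum(w^2) / sum(w)))

# Random effects: same pooling with tau2 added to every study variance
w_re <- 1 / (v + tau2)
random <- sum(w_re * es) / sum(w_re)
random_ci <- random + c(-1, 1) * z / sqrt(sum(w_re))

print(round(c(fixed = fixed, Q = Q, tau2 = tau2, random = random), 3))
```

The percentage weights in the tables correspond to `100 * w / sum(w)` and `100 * w_re / sum(w_re)`.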

Solved – How are the p-value and the odds ratio confidence interval in fisher.test related

Fisher's exact test does not compute the $p$-value from the odds ratio.
Fisher's exact test tests the independence of the rows and columns of a 2x2 table. Under independence the odds ratio is 1, so the two are related. The odds ratio is technically a parameter, and it can be tested with logistic regression or Pearson's $\chi^2$ test. Pearson's test and Fisher's exact test agree asymptotically, meaning they arrive at the same conclusion in large samples.
For odds ratio estimates with continuous support, and their Wald-test p-values and CIs, it is true that if the 95% CI contains 1, the $p$-value is > 0.05. This holds for the relation between Pearson's $\chi^2$ test and logistic regression: the log odds ratio and its standard error can be used to compute both values arithmetically. The same arithmetic does not carry over to Fisher's exact test (FET).
Fisher's exact test takes as its support all 2x2 tables that share the observed marginal frequencies, as you say. An odds ratio can be computed for each of those tables, so the support of the odds ratio is discrete rather than continuous.
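To make the discrete support concrete, the R sketch below enumerates every table that shares the margins of Fisher's example table with rows (10, 3) and (2, 15), which is borrowed from the other answer on this page. The variable names are illustrative. The two-sided p-value is computed here as the total null probability of the tables that are no more likely than the observed one, with a small relative tolerance to guard against floating-point ties; this is my understanding of what `fisher.test` does for 2x2 tables, so treat it as an assumption to check.

```r
tab <- matrix(c(10, 2, 3, 15), nrow = 2)   # rows are (10, 3) and (2, 15)

m <- sum(tab[, 1])   # first column total
n <- sum(tab[, 2])   # second column total
k <- sum(tab[1, ])   # first row total

# Possible values of the top-left cell; the margins fix the other three cells
a  <- max(0, k - n):min(k, m)
b  <- k - a
c2 <- m - a
d  <- n - k + a

support <- data.frame(
  a, b, c = c2, d,
  sample_or = (a * d) / (b * c2),    # Inf where b * c2 is zero
  prob_null = dhyper(a, m, n, k)     # probability under independence
)

# Two-sided p-value: tables at most as probable as the observed one
p_obs <- support$prob_null[support$a == tab[1, 1]]
p_two_sided <- sum(support$prob_null[support$prob_null <= p_obs * (1 + 1e-7)])

print(support)
print(p_two_sided)
```

Only 13 tables are possible here, so the odds ratio and the p-value can each take only a handful of distinct values.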
To obtain 95% CIs and p-values which agree, one must "invert the hypothesis test": the 95% CI consists of the odds ratio values that the test would not reject at the 5% level, so that for any value outside the interval the tables at least as extreme as the observed one carry no more than 5% probability in the tails. This is the de facto way of computing 95% CIs for odds ratios from Fisher's exact test. Questions about programming are technically off-topic for this site, but for reporting you should verify this yourself by computing the results in Python, R, and maybe even SAS or Stata.

So yes, the two "checks" (tests) are related: $p<0.05$ should be true only when the 95% CI does not include 1. All these results apply for other $\alpha$ levels as well.
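One way to carry out the test inversion is sketched below in R. The function `exact_or_ci` and its internals are hypothetical names. The sketch assumes the conditional (noncentral hypergeometric) distribution of the top-left cell given the margins, and it builds the interval from two one-sided tests at level $\alpha/2$ each, which to my understanding is also the approach taken by `fisher.test`. It is a minimal illustration rather than a replacement for the built-in function.

```r
# Exact conditional CI for the odds ratio of a 2x2 table by test inversion
exact_or_ci <- function(tab, conf.level = 0.95) {
  m <- sum(tab[, 1]); n <- sum(tab[, 2]); k <- sum(tab[1, ])
  a <- tab[1, 1]
  support <- max(0, k - n):min(k, m)
  logdc <- dhyper(support, m, n, k, log = TRUE)

  # Distribution of the top-left cell given the margins when the
  # true odds ratio is `or`
  dens <- function(or) {
    d <- logdc + log(or) * support
    d <- exp(d - max(d))          # rescale to avoid overflow
    d / sum(d)
  }
  upper_tail <- function(or) sum(dens(or)[support >= a])   # P(A >= a)
  lower_tail <- function(or) sum(dens(or)[support <= a])   # P(A <= a)

  alpha <- (1 - conf.level) / 2

  # Root search on the log odds ratio scale
  lower <- if (a == min(support)) 0 else
    exp(uniroot(function(l) upper_tail(exp(l)) - alpha,
                interval = c(-20, 20), tol = 1e-8)$root)
  upper <- if (a == max(support)) Inf else
    exp(uniroot(function(l) lower_tail(exp(l)) - alpha,
                interval = c(-20, 20), tol = 1e-8)$root)
  c(lower = lower, upper = upper)
}

x <- matrix(c(100, 5, 70, 12), nrow = 2)   # table from the first question
ci <- exact_or_ci(x)
print(ci)
```

The result can be compared with `fisher.test(x)$conf.int`. Since this interval uses the two tails separately while the two-sided p-value orders tables by their probability, the two can occasionally disagree for tables near the 5% boundary, so it is worth checking both when the result is borderline.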
