Exact Confidence Interval for Relative Risk – Calculation Guide

confidence-interval, epidemiology, relative-risk

I am working on some MRSA data and need to calculate the relative risk of a group of hospitals compared with the remaining hospital.

A colleague gave me an Excel file containing a formula to calculate the "exact confidence interval of relative risk". I can do the calculation without difficulty, but I have no idea how or why this formula is used for such a calculation.

I have attached the Excel file here for your reference.

Can anyone point me to a reference on the rationale of the calculation? Articles or textbooks would be fine. Thanks!

Best Answer

Check out the R Epi and epitools packages, which include many functions for computing exact and approximate CIs/p-values for various measures of association found in epidemiological studies, including relative risk (RR). I know there is also PropCIs, but I have never tried it. Bootstrapping is also an option, but exact or approximate CIs are what epidemiological papers generally report. Most explanatory studies rely on GLMs, though, and thus use the odds ratio (OR) instead of the RR (although the OR is often wrongly interpreted as an RR because the RR is easier to understand, but this is another story).

You can also check your results with an online calculator, such as the one on statpages.org, or Relative Risk and Risk Difference Confidence Intervals. The latter explains how the computations are done.

By "exact" tests, we generally mean tests/CIs that do not rely on an asymptotic distribution, such as the chi-square or standard normal. By contrast, in the case of an RR, a 95% CI may be approximated as $\left[\exp\left( \log(\text{rr}) - 1.96\sqrt{\text{Var}\big(\log(\text{rr})\big)} \right),\ \exp\left( \log(\text{rr}) + 1.96\sqrt{\text{Var}\big(\log(\text{rr})\big)} \right)\right]$, where $\text{Var}\big(\log(\text{rr})\big)=1/a - 1/(a+b) + 1/c - 1/(c+d)$ (assuming a 2-way cross-classification table, with $a$, $b$, $c$, and $d$ denoting cell frequencies, so that $\text{rr}=\frac{a/(a+b)}{c/(c+d)}$). The explanations given by @Keith are, however, very insightful.
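As a minimal sketch of the approximate (Wald) interval just described, the base R snippet below wraps the formula in a small function. The name `wald_rr_ci` and its arguments are made up for illustration and do not come from Epi, epitools, or PropCIs; the counts in the call are the ones from the Rothman example discussed in the related answer further down.

```r
# Hypothetical helper (not part of any package): Wald CI for the relative risk
# from a 2x2 table with cells a, b (exposed row) and c, d (unexposed row).
wald_rr_ci <- function(a, b, c, d, conf.level = 0.95) {
  rr         <- (a / (a + b)) / (c / (c + d))
  var_log_rr <- 1 / a - 1 / (a + b) + 1 / c - 1 / (c + d)
  z          <- qnorm(1 - (1 - conf.level) / 2)   # 1.96 for a 95% interval
  list(estimate = rr,
       lower    = exp(log(rr) - z * sqrt(var_log_rr)),
       upper    = exp(log(rr) + z * sqrt(var_log_rr)))
}

# 12 cases among 14 exposed, 7 cases among 16 unexposed
res <- wald_rr_ci(a = 12, b = 2, c = 7, d = 9)
str(res)
```

For these counts the result should be roughly 1.96 with limits of about 1.08 and 3.55. This interval is approximate, not exact, so it can behave poorly when some cell counts are small.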

For more details on the calculation of CIs in epidemiology, I suggest looking at Rothman and Greenland's textbook, Modern Epidemiology (now in its 3rd edition), Statistical Methods for Rates and Proportions, by Fleiss et al., or Statistical analyses of the relative risk, by J.J. Gart (1979).

You will generally get similar results with fisher.test(), as pointed out by @gd047, although in this case the function provides a 95% CI for the odds ratio (which, for a disease with low prevalence, will be very close to the RR).
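To illustrate the point about fisher.test() and rare outcomes, here is a small sketch in base R. The counts are invented (risks of 2% and 1%) and are not from the MRSA data in the question.

```r
# Made-up counts with a rare outcome, for illustration only
tab <- matrix(c(20, 980,
                10, 990),
              nrow = 2, byrow = TRUE,
              dimnames = list(Exposure = c("Exposed", "Unexposed"),
                              Outcome  = c("Case", "Non-case")))

# Relative risk computed directly from the row proportions
rr <- (tab[1, 1] / sum(tab[1, ])) / (tab[2, 1] / sum(tab[2, ]))

# fisher.test() returns the conditional MLE of the odds ratio and an exact CI
ft <- fisher.test(tab)

rr            # relative risk
ft$estimate   # odds ratio, close to the RR because the outcome is rare
ft$conf.int   # exact 95% CI for the odds ratio, not for the RR
```

With a common outcome the odds ratio moves away from the RR, so the interval from fisher.test() should not then be read as an interval for the RR.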

Notes:

I didn't check your Excel file, for the reason given by @csgillespie.
Michael E Dewey provides an interesting summary of confidence intervals for risk ratios, from a digest of posts on the R mailing-list.

Related Solutions

Relative Risk – How to Calculate Confidence Interval for Relative Risk

The three options proposed in riskratio() refer to an asymptotic or large-sample approach, an approximation for small samples, and a resampling approach (asymptotic bootstrap, i.e. not based on percentile or bias-corrected). The first is described in Rothman's book (as referenced in the online help), chap. 14, pp. 241-244. The last is relatively trivial, so I will skip it. The small-sample approach is just an adjustment to the calculation of the estimated relative risk.

If we consider the following table of counts for subjects cross-classified according to their exposure and disease status,

          Exposed  Non-exposed  Total
Cases          a1           a0     m1
Non-case       b1           b0     m0
Total          n1           n0      N

the MLE of the risk ratio (RR), $\text{RR}=R_1/R_0$, is $\text{RR}=\frac{a_1/n_1}{a_0/n_0}$. In the large-sample approach, a score statistic (for testing $R_1=R_0$, or equivalently, $\text{RR}=1$) is used, $\chi_S=\frac{a_1-\tilde a_1}{V^{1/2}}$, where the numerator reflects the difference between the observed and expected counts for exposed cases, with $\tilde a_1 = m_1n_1/N$, and $V=(m_1n_1m_0n_0)/(N^2(N-1))$ is the variance of $a_1$. That is all we need for computing the $p$-value, because $\chi_S$ is approximately standard normal under the null hypothesis (equivalently, $\chi_S^2$ follows a chi-square distribution with one degree of freedom). In fact, the three $p$-values (mid-$p$, Fisher exact test, and $\chi^2$-test) that are returned by riskratio() are computed in the tab2by2.test() function. For more information on mid-$p$, you can refer to

Berry and Armitage (1995). Mid-P confidence intervals: a brief review. The Statistician, 44(4), 417-423.
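The score statistic described above can be sketched in a few lines of base R. The counts are those of the Rothman example worked through later in this answer; the variable names simply mirror the table notation and are otherwise arbitrary.

```r
# Counts from Rothman's example, using the notation of the table above
a1 <- 12; n1 <- 14   # exposed: cases, total
a0 <- 7;  n0 <- 16   # non-exposed: cases, total
m1 <- a1 + a0        # all cases
N  <- n1 + n0        # grand total
m0 <- N - m1         # all non-cases

a1_exp <- m1 * n1 / N                            # expected exposed cases under RR = 1
V      <- (m1 * n1 * m0 * n0) / (N^2 * (N - 1))  # variance of a1 given the margins
chi_S  <- (a1 - a1_exp) / sqrt(V)                # score statistic

# Two-sided p-value: chi_S is referred to N(0, 1), i.e. chi_S^2 to chi-square(1)
p_value <- 2 * pnorm(-abs(chi_S))
round(c(a1_exp = a1_exp, V = V, chi_S = chi_S, p_value = p_value), 4)
```

The p-value from this sketch is close to, but not identical with, the chi.square value printed by riskratio(); my understanding is that the latter comes from Pearson's statistic, which divides by $N$ rather than $N-1$ in the variance.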

Now, for computing the $100(1-\alpha)\%$ CIs, this asymptotic approach yields an approximate SD estimate for $\ln(\text{RR})$ of $(\frac{1}{a_1}-\frac{1}{n_1}+\frac{1}{a_0}-\frac{1}{n_0})^{1/2}$, and the Wald limits are found to be $\exp\big(\ln(\text{RR})\pm Z_c\, \text{SD}(\ln(\text{RR}))\big)$, where $Z_c$ is the corresponding quantile of the standard normal distribution.

The small sample approach makes use of an adjusted RR estimator: we just replace the denominator $a_0/n_0$ by $(a_0+1)/(n_0+1)$.
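As a quick illustration of the adjustment as described in the previous paragraph, the base R lines below compare the two point estimates for the counts of the Rothman example used in this answer. This follows the description in the text only; I have not checked it line by line against the epitools source.

```r
# Counts from Rothman's example (exposed: a1 of n1, non-exposed: a0 of n0)
a1 <- 12; n1 <- 14
a0 <- 7;  n0 <- 16

rr_mle   <- (a1 / n1) / (a0 / n0)              # usual (unconditional MLE) estimate
rr_small <- (a1 / n1) / ((a0 + 1) / (n0 + 1))  # small-sample adjusted estimate

c(rr_mle = rr_mle, rr_small = rr_small)
```

Adding one to the numerator and denominator of the reference risk pulls the estimate towards 1 here and keeps it finite when $a_0 = 0$.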

As to how to decide whether we should rely on the large or small sample approach, it is mainly by checking expected cell frequencies; for the $\chi_S$ to be valid, $\tilde a_1$, $m_1-\tilde a_1$, $n_1-\tilde a_1$ and $m_0-n_1+\tilde a_1$ should be $> 5$.
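The rule of thumb above can be checked directly. The sketch below, in base R and again with the counts of the Rothman example from this answer, computes the four expected cell frequencies under $\text{RR}=1$ and tests whether all of them exceed 5.

```r
# Margins of the 2x2 table (Rothman's example)
n1 <- 14; n0 <- 16   # exposed and non-exposed totals
m1 <- 19; m0 <- 11   # case and non-case totals
N  <- n1 + n0

a1_exp <- m1 * n1 / N   # expected number of exposed cases

# The four expected cell frequencies listed in the text
expected <- c(exposed_cases       = a1_exp,
              nonexposed_cases    = m1 - a1_exp,
              exposed_noncases    = n1 - a1_exp,
              nonexposed_noncases = m0 - n1 + a1_exp)

round(expected, 2)
all(expected > 5)   # TRUE suggests the large-sample approach is reasonable
```

Here the smallest expected count is just above 5, so the example sits close to the boundary of the rule.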

Working through the example of Rothman (p. 243),

library(epitools)  # provides riskratio()
sel <- matrix(c(2,9,12,7), 2, 2)
riskratio(sel, rev="row")

which yields

$data
          Outcome
Predictor  Disease1 Disease2 Total
  Exposed2        9        7    16
  Exposed1        2       12    14
  Total          11       19    30

$measure
          risk ratio with 95% C.I.
Predictor  estimate    lower    upper
  Exposed2 1.000000       NA       NA
  Exposed1 1.959184 1.080254 3.553240

$p.value
          two-sided
Predictor  midp.exact fisher.exact chi.square
  Exposed2         NA           NA         NA
  Exposed1 0.02332167   0.02588706 0.01733469

$correction
[1] FALSE

attr(,"method")
[1] "Unconditional MLE & normal approximation (Wald) CI"

By hand, we would get $\text{RR} = (12/14)/(7/16)=1.96$, $\tilde a_1 = 19\times 14 / 30= 8.87$, $V = (8.87\times 11\times 16)/ \big(30\times (30-1)\big)= 1.79$, $\chi_S = (12-8.87)/\sqrt{1.79}= 2.34$, $\text{SD}(\ln(\text{RR})) = \left( 1/12-1/14+1/7-1/16 \right)^{1/2}=0.304$, and $90\%\ \text{CI} = \exp\big(\ln(1.96)\pm 1.645\times0.304\big)=[1.2;3.2]\quad \text{(rounded)}$. Note that 1.645 is the quantile for 90% limits; with 1.96 instead, we recover the 95% CI $[1.08;3.55]$ reported by riskratio() above.

The following papers also address the construction of the test statistic for the RR or the OR:

Miettinen and Nurminen (1985). Comparative analysis of two rates. Statistics in Medicine, 4: 213-226.
Becker (1989). A comparison of maximum likelihood and Jewell's estimators of the odds ratio and relative risk in single 2 × 2 tables. Statistics in Medicine, 8(8): 987-996.
Tian, Tang, Ng, and Chan (2008). Confidence intervals for the risk ratio under inverse sampling. Statistics in Medicine, 27(17), 3301-3324.
Walter and Cook (1991). A comparison of several point estimators of the odds ratio in a single 2 x 2 contingency table. Biometrics, 47(3): 795-811.

Notes

As far as I know, there's no reference to relative risk in Selvin's book (also referenced in the online help).
Alan Agresti also has some code for relative risk.

Relative Risk – How to Calculate Relative Risk Based on Two Independent Confidence Intervals

You can use the Delta method to obtain an approximate distribution of your relative risk, as shown by that link. Then you can define a pivot and use this to obtain a CI.

I understand that there might be some confusion regarding the use of the Delta method, so here are a few simple steps that show how to construct an approximate CI for the relative risk.

Estimate the RR from the data
Find the natural log of RR: $\log(RR)$
The confidence coefficient is from the standard normal distribution: 1.96 for a 95% confidence interval

Now you need the standard error. Using the Delta method for sample sizes $n$ and $m$ with probabilities $p$ and $q$ respectively, this is found to be

$$SE=\sqrt{\frac{1-p}{pn}+\frac{1-q}{qm}}$$

Of course you need to replace the unknown quantities with your estimates, let's denote them by $\widehat{p}$ and $\widehat{q}$. You might notice that this is the second approximation we are using.

Now that you have the formula, compute the standard error: $SE$

Calculate the lower and upper limits on the log scale: $\log(RR) \pm 1.96 \times SE$
Exponentiate!
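The steps above can be sketched in base R as follows. The proportions and sample sizes are invented for illustration, and the variable names are arbitrary.

```r
# Hypothetical inputs: estimated risks and sample sizes in the two groups
p_hat <- 0.30; n <- 200   # group in the numerator of the RR
q_hat <- 0.20; m <- 250   # group in the denominator of the RR

rr     <- p_hat / q_hat                     # estimate the RR
log_rr <- log(rr)                           # natural log of the RR
z      <- qnorm(0.975)                      # about 1.96 for a 95% interval

# Delta-method standard error of log(RR), with estimates plugged in
se <- sqrt((1 - p_hat) / (p_hat * n) + (1 - q_hat) / (q_hat * m))

ci_log <- log_rr + c(-1, 1) * z * se        # limits on the log scale
ci     <- exp(ci_log)                       # back-transform to the RR scale

c(rr = rr, se = se, lower = ci[1], upper = ci[2])
```

Since $(1-p)/(pn) = 1/(pn) - 1/n$, this standard error is the same quantity as the count-based formula $1/a - 1/(a+b) + 1/c - 1/(c+d)$ under the square root, with $a = pn$ and $c = qm$.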

You can find plenty of such information on the internet; the above steps are taken from here. We all have Fisher to thank for these approximations!
