Calculate the standard deviation from a hazard ratio’s confidence interval

Tags: confidence-interval, hazard, meta-analysis, r, standard-deviation

I often come across hazard ratios and their confidence intervals in the published literature on clinical trials. I would like to calculate the standard deviation from these confidence intervals for some analysis I'll be doing (generating random draws for this hazard ratio from a log-normal distribution).

Having read up on this over the past few days, my understanding is that to convert the confidence interval of a hazard ratio to the standard deviation of that hazard ratio, I would do the following:

  1. Take the natural log of the upper limit minus the natural log of the lower limit.
  2. Divide by 2 times the standard error.
  3. For the 95% confidence interval this would be 2 x 1.96 = 3.92, for the 90% confidence interval this would thus be 2 x 1.645 = 3.29, and for 99% confidence intervals this would be 2 x 2.575 = 5.15.
  4. If the sample size in either group studied, say a treated group and a control group, is below 100, then I should assume that the authors reporting this hazard ratio calculated this confidence interval using a t distribution, and thus I should replace the numbers 3.92, 3.29 and 5.15 above with numbers specific to the t distribution and the sample size. I do this by going to t distribution tables with degrees of freedom equal to the sample size of both groups summed, minus 2.

This is how I would calculate a standard deviation in the R programming language for an example study reporting HR, 0.69; 95% CI, 0.54 to 0.89 in mCRC for cetuximab plus FOLFOX-4 vs FOLFOX-4 alone found here: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7044820/pdf/bmjopen-2019-030738.pdf:

(log(0.89) - log(0.54)) / 3.92 = 0.1274623
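The same arithmetic can be written in R, using `qnorm(0.975)` in place of the hard-coded 1.96 (a minimal sketch using the example trial's CI):

```r
# Back-calculate the standard error of log(HR) from the reported 95% CI
# (example trial: HR 0.69, 95% CI 0.54 to 0.89)
lci <- 0.54
uci <- 0.89
se_log_hr <- (log(uci) - log(lci)) / (2 * qnorm(0.975))  # qnorm(0.975) ~ 1.96
se_log_hr  # approx 0.1275
```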

Is this the right way to calculate the standard deviation from the confidence intervals of a hazard ratio?

EDIT

To more clearly motivate this question, I am a health economist estimating transitions between health states. In my analysis, there is an initial, and well established, transition probability from the stable disease state to the progressive disease state under standard of care treatment.

The literature indicates that this transition probability is decreased by a new medical intervention. The literature describes the hazard ratio for progression with this new intervention vs standard of care based on a clinical trial of cancer patients. Thus, I would like to update the transition probabilities for transitioning from stable disease to progressive disease under standard of care using this hazard ratio, to create transition probabilities for this new intervention as part of a cost-effectiveness analysis of this new medical intervention.

Initially, this will be done just with the hazard ratio reported in the clinical trial. Following this, I would like to conduct a probabilistic sensitivity analysis which reflects the uncertainty in this hazard ratio when creating transition probabilities. To do this, I need to take random draws from the log-normal distribution for the hazard ratio, as hazard ratios are typically skewed unless put on the log scale to normalise.

The following code is used in the R programming language to make these draws:

hr_draws <- rlnorm(nsims, meanlog = log(mean), sdlog = SD)

This is why I am trying to determine how to create the standard deviation for my hazard ratio as above, in order to create a probabilistic hazard ratio.
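Putting the two pieces together, the full pipeline might look like this (a sketch; `hr`, `lci` and `uci` are taken from the example trial above, and `nsims` is an arbitrary simulation count):

```r
# Sketch: probabilistic hazard-ratio draws, with sdlog back-calculated
# from the reported 95% CI
set.seed(42)  # for reproducibility
nsims <- 10000
hr    <- 0.69
lci   <- 0.54
uci   <- 0.89
sdlog <- (log(uci) - log(lci)) / (2 * qnorm(0.975))
hr_draws <- rlnorm(nsims, meanlog = log(hr), sdlog = sdlog)
# The empirical 2.5% and 97.5% quantiles should roughly recover the CI
quantile(hr_draws, c(0.025, 0.975))
```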

My sources are here:

https://handbook-5-1.cochrane.org/chapter_7/7_7_3_2_obtaining_standard_deviations_from_standard_errors_and.htm

https://handbook-5-1.cochrane.org/chapter_7/7_7_3_3_obtaining_standard_deviations_from_standard_errors.htm

https://cran.rstudio.com/web/packages/episensr/vignettes/b_probabilistic.html

Best Answer

First, it's probably best to refrain from using the terminology "standard deviation" in the context of regression coefficients, as there's potential confusion over whether you mean the standard deviation of the sampling distribution of a statistic or the standard deviation of some value among members of the underlying population. The former depends on sample size; the latter doesn't (although estimates of it do).

The term "standard error" is better here: it specifically has the former meaning, as both @mdewey and Wikipedia note. At least in R, the terminology "standard error" is used for reports of error estimates in regression coefficients.

Second, if you are evaluating hazard ratios from survival models, those are the exponentiations of coefficients determined by maximum (partial) likelihood methods with asymptotic normality assumed for the coefficient estimates in the original scale. t-statistics aren't involved in setting their confidence intervals (CI). That's also true for most risk ratios, rate ratios, and response ratios that you would see reported from logistic or Poisson regression models. It's wise to check the reported methods for statistical details; if the "significance" is based on a z-test or a Wald test then normality was assumed.

Third, in terms of your sensitivity analysis, it probably will be easiest and safest to sample from the assumed normal distributions of the regression coefficients and only move to the hazard-ratio scale at the last stage. If you are going to be doing simulations as part of your sensitivity analysis, the software will probably assume you are providing the regression coefficients that go into the linear-predictor values, not the hazard ratios.
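A minimal sketch of that approach: draw the coefficient (the log hazard ratio) from its assumed normal distribution and exponentiate only at the last step. This is equivalent in distribution to the `rlnorm()` call in the question; the numbers come from the example trial.

```r
set.seed(1)
nsims  <- 10000
log_hr <- log(0.69)                                    # coefficient scale
se     <- (log(0.89) - log(0.54)) / (2 * qnorm(0.975)) # back-calculated SE
coef_draws <- rnorm(nsims, mean = log_hr, sd = se)     # sample coefficients
hr_draws   <- exp(coef_draws)                          # HR scale at the end
```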

Fourth, your desired formula for estimating the standard error of the coefficient estimate (and thus the standard deviation of the corresponding normal distribution you might use for sensitivity analysis) is essentially what you've written, except that your text in Step 2 doesn't agree with what you then do. For a normally distributed statistic, a symmetric 95% CI (as usually assumed on the regression-coefficient scale) has limits at the 2.5th and 97.5th percentiles of the estimated distribution. Back-calculating from the 95% upper and lower confidence limits (UCI, LCI) of a hazard ratio thus provides a standard error estimate on the regression-coefficient scale: $$\text{SE}=\frac{\ln \text{UCI} - \ln \text{LCI}}{2 \times 1.96}$$

with the $1.96$ value in the denominator changed as you note if the original CI were instead 90% CI or 99% CI.
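A small helper makes this generic over the confidence level, using `qnorm()` rather than hard-coded critical values (`se_from_ci` is a hypothetical name for illustration):

```r
# Standard error of log(HR) from a CI at any confidence level
se_from_ci <- function(lci, uci, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)  # 1.96 (95%), 1.645 (90%), 2.576 (99%)
  (log(uci) - log(lci)) / (2 * z)
}
se_from_ci(0.54, 0.89)         # 95% CI from the example
se_from_ci(0.54, 0.89, 0.90)   # if the reported interval were a 90% CI
```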
