Solved – How to calculate the HR and 95%CI using the log-rank test in R

hazardlogrank-testsurvival

The R survival package is very useful to do survival analysis. And I know the survdiff function can be used to compare the difference of survival time in two or more groups. And the p-value number can also be calculated as below. However, how can I calculate the HR and 95% CI using the log-rank test.

And I also know I can use the coxph() function to calculate the HR and 95% CI using the Cox regression. However, as the assumption of both the Cox model and log-rank test are that the hazard ratio stay constant over time, so I think I can also calculate the HR and 95% CI using the log-rank test.

According to the book Survival Analysis: A Practical Approach, I got two formulas on Page 62 and 66 to do this (as shown below). So I wrote the R code as below, is there anybody know whether I'm right?

enter image description here enter image description here

Input codes:

library(survival)
data.survdiff <- survdiff(Surv(time, status) ~ group)
p.val = 1 - pchisq(data.survdiff$chisq, length(data.survdiff$n) - 1)
HR = (data.survdiff$obs[2]/data.survdiff$exp[2])/(data.survdiff$obs[1]/data.survdiff$exp[1])
up95 = exp(log(HR) + qnorm(0.975)*sqrt(1/data.survdiff$exp[2]+1/data.survdiff$exp[1]))
low95 = exp(log(HR) - qnorm(0.975)*sqrt(1/data.survdiff$exp[2]+1/data.survdiff$exp[1]))

Output results:

> data.survdiff
Call:
survdiff(formula = Surv(data[, "os_whw"], data[, "status_whw"] == 
    1) ~ data[, "pcascore"] >= median(data[, "pcascore"]))

                                                       N Observed Expected (O-E)^2/E (O-E)^2/V
data[, "pcascore"] >= median(data[, "pcascore"])=FALSE 4        3     4.33     0.411     0.974
data[, "pcascore"] >= median(data[, "pcascore"])=TRUE  5        5     3.67     0.486     0.974

 Chisq= 1  on 1 degrees of freedom, p= 0.324 
> p.val
[1] 0.3235935
> HR
[1] 1.970484
> up95
[1] 7.917248
> low95
[1] 0.4904239

Best Answer

If your research question is whether groups differ in survival, regardless of other characteristics, you could use the ordinary Kaplan-Meier estimation and obtain p values. That p value is derived from the log rank test. It will merely tell you if groups differ in survival time. That is appropriate if (1) groups are randomized, or (2) you only want to know if groups differ in survivalor (3) you'd like to inspect survival curves.

However, if groups are not randomized and they differ in baseline charactersitics (e.g age, sex etc) then crude survival differences are of little use (e.g differences in survival might be due to age differences); thats when Cox regression comes in. You can use Cox regression to compare survival and adjust for confounders. If your Cox regression does not contain any other predictor than 'group', it will yied same p value as the log rank test from the Kaplan-Meier estimation. But the Cox regession allows you to obtain hazard ratios with confidence intervals for each predictor, along with p values.

Use the coxph function from survival package or the cph function from rms package.

In general there is no reason to tweak this in any other way. I think you'll be fine using the functions from the Rms package, which is accompanied by an excellent book with the same name as the package (Springer; FE Harrell). The package comes with functions that let you visualize and tabulate your results easily.

There might be theoretical reasons for your approach which is beyond me, but in that case one of the experts might guide us. In other scenarios, you'll be fine with this approach.

The package: http://cran.r-project.org/web/packages/rms/rms.pdf

The book: http://www.springer.com/mathematics/probability/book/978-0-387-95232-1

(Btw: you can obtain Cox adjusted survival curves; google it or check out RMS package documentation).

Related Question