Kaplan Meier – How to Use Survfit Confidence Interval of the Median in Kaplan Meier Analysis

confidence intervalkaplan-meierrsurvival

In R, I am trying to produce a survival analysis using Kaplan Meier. I have previously used SPSS. I am trying to compare my results but R is producing a 95% confidence level of the median, whereas SPSS provides a 95% confidence interval.

How can I produce a 95% of the median provided by R? Here is a copy of my code:

survival.model <- survfit( Surv(Time, Outcome) ~1,
                           type = "kaplan-meier",
                           data = my_data,
                           conf.type = "plain",
                           conf.int=0.95)
survival.model

I understand that producing a summary of the model gives 95% CI's for each time point but I would like one for the median output above.

Any advice or support would be greatly appreciated. Many thanks.

Best Answer

As Klein and Moeschberger note (page 120, 2nd edition), with $X$ the survival time and $\hat x_p$ the estimate of its $p^{th}$ quantile ($p=1/2$ for median):

In practice, the standard error of $\hat x_p$ is difficult to compute because it requires an estimate of the density function of $X$ at $\hat x_p$.

According the the "KM Analysis" section of the SPSS Statistical Algorithms document, SPSS uses what Klein and Moeschberger (Practical Note 3, page 122) call "an estimator of the large sample variance" of $\hat x_p$ via "a crude estimate" of the density function for $X$ "based on a uniform kernel density estimate." For the median CI, SPSS uses values based on the estimated $55^{th}$ and $45^{th}$ percentiles. That might lead to questions about how much confidence you should put in the SPSS confidence intervals in the first place.

That's not how estimates are done with the survival package. The standard method for median confidence upper and lower limits used with survfit() (via its associated print() or quantile() functions) is to return the times at which the pointwise confidence limits hit 50%, however those pointwise limits were specified in the original survfit object.*

As Frank Harrell notes in a comment, those confidence estimates based on pointwise survival estimates over time have a potential problem in that they don't represent simultaneous 95% coverage over the entire survival function. Several methods for Kaplan-Meier pointwise and simultaneous confidence intervals are referenced and implemented in the R km.ci package. Its km.ci() function can return a survfit object with adjusted simultaneous CI (the logep confidence band is recommended) that you could then interrogate with quantile(). For estimating CI around the median you might not need to use the full-curve 95% CI but use the function's tl and tu time-boundary limits to restrict attention to times around the median, although I haven't thought that through carefully.


*Your choice of conf.type = "plain" is generally not wise. See Therneau and Grambsch, page 16: "Several authors have examined the choices with somewhat different conclusions. The one certain and common thread in the reports is the marked inferiority of the plain scale: it results in intervals that may extend below 0 or above 1, and has poor coverage properties."

Related Question