In R, I am trying to produce a survival analysis using Kaplan Meier. I have previously used SPSS. I am trying to compare my results but R is producing a 95% confidence level of the median, whereas SPSS provides a 95% confidence interval.
How can I produce a 95% of the median provided by R? Here is a copy of my code:
survival.model <- survfit( Surv(Time, Outcome) ~1,
type = "kaplan-meier",
data = my_data,
conf.type = "plain",
conf.int=0.95)
survival.model
I understand that producing a summary of the model gives 95% CI's for each time point but I would like one for the median output above.
Any advice or support would be greatly appreciated. Many thanks.
Best Answer
As Klein and Moeschberger note (page 120, 2nd edition), with $X$ the survival time and $\hat x_p$ the estimate of its $p^{th}$ quantile ($p=1/2$ for median):
According the the "KM Analysis" section of the SPSS Statistical Algorithms document, SPSS uses what Klein and Moeschberger (Practical Note 3, page 122) call "an estimator of the large sample variance" of $\hat x_p$ via "a crude estimate" of the density function for $X$ "based on a uniform kernel density estimate." For the median CI, SPSS uses values based on the estimated $55^{th}$ and $45^{th}$ percentiles. That might lead to questions about how much confidence you should put in the SPSS confidence intervals in the first place.
That's not how estimates are done with the
survival
package. The standard method for median confidence upper and lower limits used withsurvfit()
(via its associatedprint()
orquantile()
functions) is to return the times at which the pointwise confidence limits hit 50%, however those pointwise limits were specified in the originalsurvfit
object.*As Frank Harrell notes in a comment, those confidence estimates based on pointwise survival estimates over time have a potential problem in that they don't represent simultaneous 95% coverage over the entire survival function. Several methods for Kaplan-Meier pointwise and simultaneous confidence intervals are referenced and implemented in the R
km.ci
package. Itskm.ci()
function can return asurvfit
object with adjusted simultaneous CI (thelogep
confidence band is recommended) that you could then interrogate withquantile()
. For estimating CI around the median you might not need to use the full-curve 95% CI but use the function'stl
andtu
time-boundary limits to restrict attention to times around the median, although I haven't thought that through carefully.*Your choice of
conf.type = "plain"
is generally not wise. See Therneau and Grambsch, page 16: "Several authors have examined the choices with somewhat different conclusions. The one certain and common thread in the reports is the marked inferiority of the plain scale: it results in intervals that may extend below 0 or above 1, and has poor coverage properties."