Solved – Interpreting Median Survival vs Median in survival data

kaplan-meier, median, r, survival

I'm doing some survival analysis in R using the survival package for a patient cohort. When I fit a Kaplan-Meier curve with survfit, it reports a median survival time. However, this value is very different from the plain sample median computed with median.

e.g.

survfit(Surv(OS, vital.status) ~ 1, data = df)

reports a median ~440.

Whereas if I run

median(df$OS)

I get ~180

Even with a subset of the dataframe

sub <- subset(df, vital.status == 1)
survfit(Surv(OS, vital.status) ~ 1, data = sub)

I still don't get anywhere near the Kaplan-Meier median of ~440 (I get ~180 again).

Why is the Kaplan-Meier median survival (the 50% point) different from a plain median? I understand that with the full dataset, the censored data is not included in the median survival. But when I subset to only the events, shouldn't the plain median also correspond to the halfway point? What is the difference in interpretation between the two?

Thanks!

Best Answer

So I think I figured it out.

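In the KM estimate, a censored individual is treated as known to be alive up to their recorded OS time, with the event happening at some unknown later time. So censored patients with short follow-up, who sit on the left of the raw OS distribution, really belong somewhere further to the right. The KM median is the time at which the estimated survival probability drops to 50%, and it comes out later than median(df$OS), which treats every censoring time as if it were a death time.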

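When running a KM on individuals who all had an event, there is no censoring, so the median survival time is the same as the standard median of those times. That is also why the events-only subset gives ~180 rather than ~440: it drops the censored patients, and with them the information that those patients survived at least as long as their follow-up.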
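
To make this concrete, here is a small sketch on made-up data. The toy data frame and its values are hypothetical and are not the cohort from the question; only the column names OS and vital.status are borrowed from it. It assumes the survival package is installed and reads the median from summary(fit)$table.

```r
library(survival)

# Hypothetical toy cohort: vital.status 1 = died, 0 = censored (alive at last follow-up)
toy <- data.frame(
  OS           = c(50, 100, 150, 200, 250, 300, 400, 500, 600, 700),
  vital.status = c( 1,   0,   1,   0,   0,   1,   0,   1,   0,   1)
)

# Naive median: treats every OS value, censored or not, as a death time
naive_median <- median(toy$OS)                        # 275

# Kaplan-Meier median: first time the estimated survival drops to 50% or below.
# Survival steps: 0.9 (t=50), 0.7875 (t=150), 0.63 (t=300), 0.42 (t=500)
fit_all <- survfit(Surv(OS, vital.status) ~ 1, data = toy)
km_median_all <- summary(fit_all)$table[["median"]]   # 500

# Events only: with no censoring, the KM median equals the plain median
events_only <- subset(toy, vital.status == 1)
fit_events  <- survfit(Surv(OS, vital.status) ~ 1, data = events_only)
km_median_events <- summary(fit_events)$table[["median"]]   # 300
plain_median_events <- median(events_only$OS)                # 300

print(c(naive = naive_median, km_all = km_median_all,
        km_events = km_median_events, plain_events = plain_median_events))
```

In this toy example the censored patients keep the survival curve above 50% until t = 500, while both the naive median (275) and the events-only median (300) land much earlier, which is the same pattern as the ~180 vs ~440 gap in the question.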