Difference between the Kaplan-Meier Estimator and the Empirical CDF

kaplan-meier survival

In survival analysis, one often uses the nonparametric maximum likelihood estimator (i.e. the Kaplan-Meier estimator) of the survival function $S(t)$. Since $S(t) = 1 - F(t)$, shouldn't we also be able to compute the empirical cdf and use this relationship to obtain an empirical survival function?

The results, however, differ substantially. Since academia almost exclusively uses the KM estimator (of the two), why is it superior to the other approach (or why is the other approach incorrect)?

Note: Considering the ordered outcomes/survival times $\tau_1 \leq \dots \leq \tau_n$, I define the empirical cdf $F_n(t)=\frac{1}{n}\sum_{i=1}^nI\{\tau_i<t\}$

Best Answer

As I understand from a comment, the OP didn't realize that the Kaplan-Meier estimate is nothing but the empirical estimate of the survival function when there is no censoring.

Let me say a word about that. Consider two independent random variables $X$ and $Y$ with continuous distributions, and independent replicated observations $x_i$ and $y_i$, $i=1, \ldots, n$. In the context of the Kaplan-Meier estimate, $Y$ plays the role of the censoring variable, and one observes the minima $t_i=\min(x_i,y_i)$ together with the indicators $\delta_i={\boldsymbol 1}_{x_i \leq y_i}$, which are independent replicated observations of $T=\min(X,Y)$ and $\Delta={\boldsymbol 1}_{X \leq Y}$ respectively.

Note that $\Pr(T>t)=\Pr(X>t,\,Y>t)=\Pr(X>t)\Pr(Y>t)$ by independence of $X$ and $Y$, that is to say $\boxed{S^T(t)=S^X(t)S^Y(t)}$, where $S^T$, $S^X$ and $S^Y$ denote the survival functions of $T$, $X$ and $Y$ respectively.
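
As a quick sanity check of the boxed identity, here is a minimal simulation sketch. It assumes, purely for illustration, that $X$ and $Y$ are independent exponential variables with rates 1 and 0.5; the function name `simulate_min_survival`, the rates, the time point and the sample size are all hypothetical choices, not part of the original argument.

```python
import math
import random


def simulate_min_survival(t, rate_x=1.0, rate_y=0.5, n=100_000, seed=0):
    """Monte Carlo estimate of Pr(min(X, Y) > t) for independent exponentials.

    The exponential distributions and their rates are illustrative assumptions.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        x = rng.expovariate(rate_x)  # draw of X
        y = rng.expovariate(rate_y)  # draw of Y (the censoring variable)
        if min(x, y) > t:
            hits += 1
    return hits / n


t = 0.7
estimate = simulate_min_survival(t)
# Exact product S^X(t) * S^Y(t) for the assumed exponential rates 1.0 and 0.5.
exact = math.exp(-1.0 * t) * math.exp(-0.5 * t)
print(f"simulated Pr(T > t): {estimate:.4f}, product S^X(t) S^Y(t): {exact:.4f}")
```

The two printed numbers should agree up to Monte Carlo error.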

The usual empirical survival function $\hat{S}^T$ of $T$ is available from the data. When seeking estimates $\hat{S}^X$ and $\hat{S}^Y$ of $S^X$ and $S^Y$, it is natural to require the empirical analogue of the previous boxed formula, that is to say $\boxed{\hat{S}^T(t)=\hat{S}^X(t)\hat{S}^Y(t)}$.

Then remember that:

  • The Kaplan-Meier estimates of $S^X$ and $S^Y$ satisfy this relation (at least when there are no ties; I have not checked the case with ties). The case $Y=+\infty$ corresponds to the absence of censoring: then $T=X$, $S^Y\equiv 1$, $\hat{S}^Y\equiv 1$, and one gets $\hat{S}^T(t)=\hat{S}^X(t)$, so the Kaplan-Meier estimate is nothing but the empirical estimate of the survival function.

  • In fact (at least when there are no ties), the Kaplan-Meier estimates can even be derived from the required relation $\boxed{\hat{S}^T(t)=\hat{S}^X(t)\hat{S}^Y(t)}$, after requiring in addition that $\hat{S}^X$ and $\hat{S}^Y$ be step functions that jump at the observed $x_i$ (the $t_i$ with $\delta_i=1$) and at the observed $y_i$ (the $t_i$ with $\delta_i=0$) respectively.
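
The two points above can be checked numerically on a small example. The following is only a sketch under stated assumptions: the helpers `kaplan_meier` and `empirical_survival` are hypothetical names written for this illustration (not a library API), the data are made-up toy values without ties, and exact fractions are used so that the equality is tested without rounding error.

```python
from fractions import Fraction


def kaplan_meier(times, events):
    """Kaplan-Meier survival values just after each ordered time.

    Assumes no ties. events[i] is 1 if times[i] is an observed event
    and 0 if it is censored.
    """
    order = sorted(range(len(times)), key=lambda i: times[i])
    n = len(times)
    surv = Fraction(1)
    values = []
    for rank, i in enumerate(order):
        at_risk = n - rank  # subjects still under observation at this time
        if events[i]:
            surv *= Fraction(at_risk - 1, at_risk)
        values.append(surv)
    return values


def empirical_survival(times):
    """Empirical survival function just after each ordered time (no ties)."""
    n = len(times)
    return [Fraction(n - k, n) for k in range(1, n + 1)]


# Toy data, invented for illustration: t_i = min(x_i, y_i), delta_i = 1 if x_i <= y_i.
t = [2.1, 0.5, 3.3, 1.7, 4.0, 2.9]
delta = [1, 0, 1, 1, 0, 1]

s_x = kaplan_meier(t, delta)                   # estimate of S^X
s_y = kaplan_meier(t, [1 - d for d in delta])  # estimate of S^Y (roles swapped)
s_t = empirical_survival(t)                    # empirical survival function of T

product_matches = all(a * b == c for a, b, c in zip(s_x, s_y, s_t))
no_censoring_matches = kaplan_meier(t, [1] * len(t)) == s_t
print(product_matches, no_censoring_matches)
```

The reason the product works without ties is that, at the $i$-th ordered time, exactly one of the two Kaplan-Meier estimates is multiplied by $(n-i)/(n-i+1)$ while the other is unchanged, so the product telescopes to $(n-k)/n$ after $k$ observations, which is the empirical survival function of $T$.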