Kaplan Meier Interpretation – How to Interpret Kaplan Meier with Truncated and Right Censored Data

kaplan-meier, survival, truncation

I cannot seem to understand the interpretation of the Kaplan-Meier with truncated data.

Here, we have associated, with the j:th individual, a random age $L_j$
at which he/she enters the study and a time $T_j$ at which he/she
either dies or is censored. As in the case of right-censored data,
define $t_1 \leq t_2 \dots \leq t_D$ as the distinct death times and
let $d_i$ be the number of individuals who experience the event of
interest at time $t_i$. The remaining quantity needed to compute the
statistics in the previous sections is the number of individuals who
are at risk of experiencing the event of interest at time $t_i$,
namely $Y_i$. For right-censored data, this quantity was the number of
individuals on study at time 0 with a study time of at least $t_i$.
For left-truncated data, we redefine $Y_i$ as the number of
individuals who entered the study prior to time $t_i$ and who have a
study time of at least $t_i$, that is, $Y_i$ is the number of
individuals with $L_j < t_i \leq T_j$. Using $Y_i$ as redefined for
left-truncated data, all of the estimation procedures defined in
sections 4.2–4.4 are now applicable. However, one must take care in
interpreting these statistics. For example, the Product-Limit
estimator of the survival function at a time $t$ is now an estimator of
the probability of survival beyond $t$, conditional on survival to the
smallest of the entry times $L$, $Pr[X>t|X\ge L]=S(t)/S(L)$.

(From Survival Analysis: Techniques for Censored and Truncated Data, p.123 by Klein and Moeschberger)

Assume the period over which my subjects are sampled begins at $t_0$ and ends at $T$. The truncated data consist of subjects who were alive at $t_0=0$. Naturally, these have birth dates across the whole range $[-T_{first},t_0)$, with increasing frequency towards $t_0$. Is the smallest of the entry times here the first observation smaller than $t_0$ (which is basically $t_0$)? In that case the interpretation of the estimator is basically the same as without truncation, since $L$ can practically be regarded as 0.

Edit: In accordance with the quote, I have, for the truncated data, truncation times $t_0 \approx L_1 \leq \dots \leq L_j = T_{first}$.

So the time line is as follows:
$T_{first}, \dots, L_1, t_0, \dots, T_n$.

Because of the extended data set, I have subjects born immediately prior to $t_0$ (as far as the discretization of time allows). So my first question is: in terms of the setting described by Klein and Moeschberger (quote), is my smallest entry time $T_{first}$, which is the smallest (first) entry time of all subjects (the oldest subject at $t_0$), or is it $L_1$, because it is the smallest in the sense of being closest to 0?

As I have understood it, it's the latter, since their respective conditional probabilities would be
$P(X>t|X>T_{first})$ and $P(X>t|X>L_1)$ where in this sense, $L_1$ is smaller.

Also, for all non-truncated subjects, why can't I assume an "artificial" truncation time $L_j = 0$?

Finally, if there is any logical/mathematical inconsistency in my reasoning, could you please explain what and why?

Best Answer

I'll give an explanation that is very close to that of Maarten Buis but just a little more elaborate. As always in survival analysis, different time scales can be applied. I think that age is maybe the more intuitive time scale in your setting, so that's where I'll start my answer. Afterwards, I'll try to use that intuition to answer the question.

Let $C_i$ be the time of birth of subject $i$. From your data we can easily calculate the age at entering the study,

$$ A_i = t_0 - C_i $$

and age of exiting the study,

$$ B_i = \min\{T - C_i, D_i\}, $$

where $D_i$ is the age at death. Now note that we have an age interval, $(A_i, B_i]$, during which the $i$'th subject is under observation. On this time scale, the study subjects do not enter the study at the same time. Let's denote the minimum age at entering the study by

$$ \alpha = \min_i A_i. $$
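The change of time scale above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `to_age_scale`, the toy numbers, and the convention that an unobserved death is stored as `None` are all hypothetical, and every subject is assumed to be born before $t_0$ so that $A_i = t_0 - C_i$ is positive.

```python
def to_age_scale(birth, death_age, t0, T):
    """Convert calendar-time records to the age time scale.

    birth[i]     -- calendar time of birth C_i (assumed < t0)
    death_age[i] -- age at death D_i, or None if no death is known
    Returns one (entry age A_i, exit age B_i, death observed) tuple per subject.
    """
    records = []
    for c, d in zip(birth, death_age):
        entry = t0 - c                    # A_i = t0 - C_i
        censor_age = T - c                # age reached when the study ends
        died = d is not None and d <= censor_age
        exit_age = d if died else censor_age   # B_i = min(T - C_i, D_i)
        records.append((entry, exit_age, died))
    return records


# Hypothetical toy data: study window [t0, T] = [0, 5] in calendar time.
birth = [-4.0, -2.5, -0.5]
death_age = [6.0, None, 3.0]

records = to_age_scale(birth, death_age, t0=0.0, T=5.0)
alpha = min(entry for entry, _, _ in records)   # smallest entry age

for rec in records:
    print(rec)
print("alpha =", alpha)
```

In this toy data the youngest subject enters at age 0.5, so nothing can be learned about survival over the age interval $(0, 0.5]$.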

What survival information do we have before time $\alpha$? None. This is why we can't say anything about the probability of surviving the age interval $(0, \alpha]$. Necessarily, our Kaplan-Meier estimate must be conditional on survival until age $\alpha$. To give an example: let's say that $\alpha$ is $1$ year. Would we be able to calculate the survivor function at time $5$ years, $S(5) = P(D > 5)$? Could we calculate how many children would live to see their fifth birthday? No, because we simply don't know how dangerous the first year is. We can calculate only the conditional survivor function $P(D > 5|D > 1)$. Actually, this can again be explained by a change in time scale: there is nothing special about 0. Your Kaplan-Meier estimate doesn't have to start at time zero; it can start at some other time, which corresponds to e.g. the time scale defined by age minus $\alpha$. In your data, you write that $\alpha$ is very small, as some children are included very young; thus, for $s > \alpha$

$$ P(D > s | D > \alpha) = S(s)/S(\alpha) \simeq S(s) $$

and actually there is equality in the limit $\alpha \rightarrow 0$ if we assume $S$ to be continuous.
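To make the conditional interpretation concrete, here is a minimal sketch of the product-limit estimate with delayed entry, using the risk set $L_j < t_i \leq T_j$ from the quoted passage. The function name `km_left_truncated` and the toy data are hypothetical, and the sketch assumes every subject enters strictly before exiting, so the risk set is never empty at a death time.

```python
def km_left_truncated(entry, exit_, event):
    """Product-limit estimate on the age scale with delayed entry.

    entry[j] -- age at which subject j comes under observation
    exit_[j] -- age at death or censoring (assumed > entry[j])
    event[j] -- True if exit_[j] is an observed death
    Returns a list of (death age, estimate just after that age).
    """
    death_times = sorted({x for x, e in zip(exit_, event) if e})
    surv = 1.0
    curve = []
    for t in death_times:
        # Subject j is at risk at t only if already entered and not yet exited.
        at_risk = sum(1 for l, x in zip(entry, exit_) if l < t <= x)
        deaths = sum(1 for x, e in zip(exit_, event) if e and x == t)
        surv *= 1.0 - deaths / at_risk
        curve.append((t, surv))
    return curve


# Hypothetical toy data; the smallest entry age is alpha = 1.
entry = [1, 1, 2, 3, 4]
exit_ = [3, 6, 5, 6, 7]
event = [True, False, True, True, True]

curve = km_left_truncated(entry, exit_, event)
for t, s in curve:
    print(t, round(s, 3))
# Risk sets are 3, 4, 3, 1 at ages 3, 5, 6, 7,
# giving estimates 2/3, 1/2, 1/3 and 0.
```

Because no subject is observed before age 1, the values returned here are estimates of $P(D > t \mid D > 1)$, not of $S(t)$ itself.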

Let's change back to your original time scale, plain calendar time. You have no idea how dangerous the time before $t_0$ is, therefore your estimate must be conditional on surviving until $t_0$. This stems from the fact that no children are observed before time $t_0$. On this time scale, it doesn't make much of a difference how close the times of birth are to $t_0$, as we have assumed the same hazard for all ages (instead of an age-specific hazard as above). To sum up, on this time scale (using calendar time), the interpretation of the Kaplan-Meier estimate would be that of (for $t \in (t_0, T]$),

$$ P(X > t | X > t_0). $$

This is not as intuitive as on the age time scale; however, it just means that when doing a study in calendar time, we condition on the subjects having survived the time from birth until the start of our study.

To answer the last part of the question, you condition neither on $T_{first}$ nor on $L_1$; you condition on survival until $t_0$, as this is the minimum of the entry times. I think part of the confusion is due to the fact that all the children enter your study at the same time, which is not necessarily the case in all applications, as is evident from using age as the time scale above.

Finally, you could easily say that non-truncation corresponds to the truncation time being smaller than or equal to 0 (or some other natural starting point on a time scale).
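A small check of this last point, as a sketch with hypothetical helper names and toy data: if every subject is given the artificial entry time 0, the left-truncated risk set $L_j < t \leq T_j$ contains exactly the same subjects as the ordinary right-censored risk set $T_j \geq t$, for any $t > 0$.

```python
def at_risk_left_truncated(entry, exit_, t):
    """Number of subjects with entry < t <= exit (delayed-entry risk set)."""
    return sum(1 for l, x in zip(entry, exit_) if l < t <= x)


def at_risk_right_censored(exit_, t):
    """Number of subjects with exit >= t (ordinary risk set, everyone starts at 0)."""
    return sum(1 for x in exit_ if x >= t)


# Hypothetical exit times; every subject gets the artificial entry time 0.
exit_ = [2.0, 3.5, 3.5, 6.0]
zero_entry = [0.0] * len(exit_)

for t in (1.0, 3.5, 6.0):
    truncated = at_risk_left_truncated(zero_entry, exit_, t)
    censored = at_risk_right_censored(exit_, t)
    print(t, truncated, censored)   # the two counts agree at every t > 0
```

Since the risk sets coincide, the product-limit estimate computed from them is the ordinary Kaplan-Meier estimate.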