Survival Analysis – How to Handle Delayed Entry in Survival Analysis

survival

I'm analysing a data set of loans in order to estimate the prepayment risk.

One thing that distinguishes my setup from the usual biological/medical experiments is the time frame.

Medical/biostatistics experiments usually have clearly defined start and end dates: for example, the experiment starts in 2000 and ends in 2010 for every individual, whether they survive or not.

In my case, all individuals share the same end point, but their starting dates differ.

For example, individual 1 starts in 2001 and ends in 2010, individual 2 starts in 2007 and ends in 2010, individual 3 starts in 2005 and ends in 2010, and so on, whether or not the individuals die during that time frame.

Is survival analysis, specifically Cox regression, possible with delayed entry?

Best Answer

Your scenario raises the issue of selection bias.

For an individual to be selected into your study, they must survive until their first period of measurement, and, as you point out, each individual has a different time of entry. This effectively means that individuals who start later have periods of 'immortality' during which they cannot have died (or been excluded for other reasons pertaining to your research), making them more likely to differ systematically from those who entered the study earlier. For example, individuals entering the study in 2002 or later are individuals who survived past 9/11, whereas those who entered before 9/11/2001 did not have to survive it in order to be included.

To help minimize the effects of selection bias:

  1. Be explicit about defining your target population and, consequently, the scope of your inferences.

  2. Be very careful when comparing survival between groups whose first-observation dates have different distributions (ideally the distributions should be identical). Otherwise you run into survivor effects (e.g., the healthy worker effect).

  3. If the distributions of start times vary between comparison groups, account for entry time in your model: include age at entry as a covariate, or stratify by it. Here 'age' can be either age since birth or some other elapsed time relevant to your study, such as 'years since surgery' or 'years since graduation'.
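To make the role of entry times concrete, here is a minimal sketch of how delayed entry is usually handled when forming risk sets: an individual counts as "at risk" at time t only if they entered before t and have not yet exited. The `Subject` record, the helper names, and the toy data are all hypothetical, chosen only for illustration. The sketch uses a Kaplan-Meier-style product rather than a Cox model, but the Cox partial likelihood is built from the same kind of risk sets.

```python
from collections import namedtuple

# Hypothetical record layout (an assumption for this sketch): all times are on
# the analysis time scale, e.g. years since origination. `entry` is when
# observation of the subject begins, `exit` is when it ends, and `event` says
# whether the exit was an event (True) or a censoring (False).
Subject = namedtuple("Subject", ["entry", "exit", "event"])


def risk_set_size(subjects, t):
    """Count subjects under observation at time t (entered before t, not yet gone)."""
    return sum(1 for s in subjects if s.entry < t <= s.exit)


def km_delayed_entry(subjects):
    """Kaplan-Meier-style survival estimate using delayed-entry risk sets.

    Returns a list of (event_time, survival_estimate) pairs.
    """
    event_times = sorted({s.exit for s in subjects if s.event})
    survival = 1.0
    curve = []
    for t in event_times:
        at_risk = risk_set_size(subjects, t)
        events = sum(1 for s in subjects if s.event and s.exit == t)
        survival *= 1.0 - events / at_risk
        curve.append((t, survival))
    return curve


# Toy data: two subjects observed from time 0, three entering later.
subjects = [
    Subject(entry=0.0, exit=4.0, event=True),
    Subject(entry=0.0, exit=9.0, event=False),
    Subject(entry=2.0, exit=6.0, event=True),
    Subject(entry=5.0, exit=9.0, event=False),
    Subject(entry=5.0, exit=8.0, event=True),
]

curve = km_delayed_entry(subjects)
for t, s in curve:
    print(f"t={t:.0f}  at risk={risk_set_size(subjects, t)}  S(t)={s:.3f}")
```

The late entrants contribute nothing to the risk set at t=4 because they were not yet under observation; treating them as at risk from time 0 would credit them with time during which they could not have been observed to fail.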

NB: If you artificially assign a uniform starting time to each person in your data set, you create an 'immortal person-time' bias (see Rothman and Greenland, 1998), which tends to bias rates toward 0 and, as a consequence, biases comparisons toward no difference.
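The arithmetic behind this bias can be sketched in a few lines. The records below are made-up calendar-year data in the spirit of the question (the entry years, exit years, and event flags are assumptions for illustration only), and `incidence_rate` is a hypothetical helper. Padding each person's follow-up back to a common start year adds person-time during which no event could have been recorded, so the denominator grows while the event count stays fixed.

```python
# Hypothetical records: (entry_year, exit_year, event_observed).
records = [
    (2001, 2010, False),
    (2007, 2009, True),
    (2005, 2010, False),
    (2003, 2006, True),
]


def incidence_rate(records, uniform_start=None):
    """Events per person-year.

    If `uniform_start` is given, every person's follow-up is (incorrectly)
    counted from that year instead of from their actual entry year.
    """
    events = sum(1 for _, _, event in records if event)
    person_years = sum(
        exit_year - (entry_year if uniform_start is None else uniform_start)
        for entry_year, exit_year, _ in records
    )
    return events / person_years


rate_from_entry = incidence_rate(records)                  # 2 events / 19 person-years
rate_padded = incidence_rate(records, uniform_start=2000)  # 2 events / 35 person-years

print(f"rate counted from actual entry: {rate_from_entry:.4f}")
print(f"rate with uniform 2000 start:   {rate_padded:.4f}")
```

In this toy example the padded rate is a little over half the rate computed from actual entry, which illustrates the direction of the bias (toward 0) rather than its typical size.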

Rothman, K. J. and Greenland, S. (1998). Modern Epidemiology, chapter Cohort Studies—Immortal Person-Time. Lippincott-Raven, 2nd edition.
