Solved – Using survival analysis with multiple events

python survival

Assuming that I have a data set of people with the following features:

  • age (numeric)
  • education (factor)
  • times at which an 'event' happened to them (it can happen multiple times during the person's recorded follow-up). The time is measured in months.

and I want to predict whether this event is likely to happen again within a given time window (e.g. the next 5 months).

For example:
There is a 32-year-old person with a Master's degree who had this 'event' at the 24th month and at the 33rd month. We are now at the 40th month, and I want to estimate the chance that this person has the 'event' in the next 3 months (months 40 to 43).

What I was thinking is to use survival regression (I am using Python, and I found this library for survival analysis: lifelines) with the following features:

  • age
  • education
  • mean time interval between consecutive 'events'. For the example this would be: ((24-0)+(33-24))/2 = 16.5.

But I was wondering if there is a more 'dynamic' way of implementing it. Can survival analysis take multiple events into account? Or is there another way to implement the whole model? Any idea or feedback is welcome.

Best Answer

Yes, there are more things that you can do, but you need to make some decisions about time scales. Your approach pertains more to the gap time scale (time since last event). I would emphasize that survival analysis helps mainly if you have right-censoring; otherwise you don't really need it.

Let's assume that your individual has age $x_i = 32$ and events at $t_{i1} = 24$ and $t_{i2} = 33$. Also I'll assume that the same individual is followed up until $\tau_i = 40$. This means that you have 3 gap times: $w_{i1} = 24$, $w_{i2} = 9$ and $w_{i3} = 7$, with the last one right-censored. Then you can assume that the gap times are independent, and you have the problem of estimating their distribution. This is the same as having clustered survival data. The distribution can depend on a covariate such as "age at start", which would be $x_i$. The basic idea is then to extrapolate, from the estimated distribution of $W \mid x_i$, the probability that an event will happen within a certain time after the last one.
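To make the gap-time bookkeeping concrete, here is a minimal sketch in plain Python. The helper names `gap_times` and `prob_event_within` are made up for illustration, and the exponential gap-time distribution fitted to this single person is purely an assumption to get a number out; in practice you would estimate the distribution of $W \mid x_i$ from all individuals with a survival model. The conditional probability used is $P(W \le w + h \mid W > w) = 1 - S(w + h)/S(w)$.

```python
import math
from typing import Callable, List, Sequence, Tuple


def gap_times(event_times: Sequence[float], follow_up_end: float) -> List[Tuple[float, int]]:
    """Turn one person's event times into (gap, observed) pairs.

    observed = 1 for a gap that ended in an event, 0 for the final
    right-censored gap between the last event and the end of follow-up.
    """
    gaps = []
    previous = 0
    for t in sorted(event_times):
        gaps.append((t - previous, 1))
        previous = t
    if follow_up_end > previous:
        gaps.append((follow_up_end - previous, 0))
    return gaps


def prob_event_within(survival: Callable[[float], float], elapsed: float, horizon: float) -> float:
    """P(W <= elapsed + horizon | W > elapsed) = 1 - S(elapsed + horizon) / S(elapsed)."""
    return 1.0 - survival(elapsed + horizon) / survival(elapsed)


gaps = gap_times([24, 33], follow_up_end=40)  # [(24, 1), (9, 1), (7, 0)]

# Illustrative assumption: exponential gap times. With right-censoring, the
# maximum-likelihood rate is (number of observed events) / (total time at risk).
rate = sum(observed for _, observed in gaps) / sum(w for w, _ in gaps)  # 2 / 40


def exponential_survival(w: float) -> float:
    return math.exp(-rate * w)


elapsed = gaps[-1][0]  # 7 months since the last event (the censored gap)
p_next_3_months = prob_event_within(exponential_survival, elapsed, horizon=3)

print(gaps)
print(round(p_next_3_months, 3))  # 1 - exp(-0.15), roughly 0.14
```

Because the exponential distribution is memoryless, the elapsed 7 months do not change the answer here; with a Weibull or a semi-parametric fit they would, which is why `prob_event_within` takes the survival function as an argument.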

You might want to think that a certain individual tends to have shorter gap times or longer gap times. Then you can use a random effect $z_i$ (called frailty in survival analysis), where you basically assume that the gap times of an individual are independent only conditionally on that individual's $z_i$; marginally they are correlated within an individual, while different individuals remain independent. Most software (such as the survival package in R) will give you an estimate of that random effect. Then the distribution of the gap times of individual $i$ would be $W \mid x_i, z_i$.

If the gap times cannot reasonably be assumed to be independent even within an individual, then things can get more complicated.
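A small simulation can illustrate what a shared frailty does. This is only a sketch on made-up data: the gamma frailty with mean 1, the exponential gap times and the baseline rate of 0.05 per month are all assumptions chosen for illustration, not something estimated from real observations. Two gaps from the same individual share a frailty $z_i$ and come out positively correlated, while gaps paired across different individuals do not.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation, written out to avoid extra dependencies."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(0)
base_rate = 0.05        # hypothetical baseline event rate per month
n_individuals = 5000

log_first, log_second = [], []
for _ in range(n_individuals):
    z = rng.gammavariate(2.0, 0.5)  # frailty: gamma with shape 2, scale 0.5, mean 1
    # Given z, the two gap times are independent exponentials with rate z * base_rate.
    log_first.append(math.log(rng.expovariate(z * base_rate)))
    log_second.append(math.log(rng.expovariate(z * base_rate)))

# Same individual: the shared frailty induces correlation between the gaps.
within_corr = pearson(log_first, log_second)

# Different individuals: shuffle the second gaps so the pairs no longer share a frailty.
shuffled = log_second[:]
rng.shuffle(shuffled)
between_corr = pearson(log_first, shuffled)

print(round(within_corr, 2), round(between_corr, 2))
```

The correlation is computed on log gap times only because they are less heavy-tailed than the raw gaps, which keeps the sample correlation stable.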

A second idea is to model the process in terms of time since the beginning of the study. In that case your observations must be put in the Andersen-Gill format, i.e. a (tstart, tstop, status) format. Your data would be $(0, 24, 1; x_i), (24, 33, 1; x_i), (33, 40, 0; x_i)$. This will give you an estimate of the intensity of this process (analogous to the hazard). Again, you can use a random effect here as well. However, it is difficult to predict the time of the next event like this; the intensity just gives you the probability of an event happening within a time interval.

My conclusion is that yes, there are "cool" things that you can think of, but it ultimately depends on what assumptions fit your data and on the exact questions that you are interested in. I always recommend the book by Cook & Lawless as a very clear introduction to modeling recurrent events data.
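As a sketch of how the (tstart, tstop, status) rows could be built in plain Python: the function name `counting_process_rows` and the column names `id` and `age` are made up for illustration; use whatever names your modeling software expects.

```python
from typing import Dict, List, Sequence


def counting_process_rows(
    subject_id: int,
    event_times: Sequence[float],
    follow_up_end: float,
    covariates: Dict[str, float],
) -> List[dict]:
    """Build Andersen-Gill style rows on the time-since-study-start scale.

    Each interval (tstart, tstop] ends either in an event (status = 1) or,
    for the last interval, at the end of follow-up (status = 0).
    """
    rows = []
    start = 0
    for t in sorted(event_times):
        rows.append({"id": subject_id, "tstart": start, "tstop": t, "status": 1, **covariates})
        start = t
    if follow_up_end > start:
        rows.append({"id": subject_id, "tstart": start, "tstop": follow_up_end, "status": 0, **covariates})
    return rows


rows = counting_process_rows(1, [24, 33], follow_up_end=40, covariates={"age": 32})
for row in rows:
    print(row)
```

For the example individual this yields the three intervals (0, 24], (24, 33] and (33, 40], with the covariate repeated on every row; stacking such rows for all individuals gives a table that software accepting start/stop input can work with.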
