Survival – How to Handle Left-Censoring in Survival Analyses

survival

This is clearly not a novel question; however, I can't seem to find an answer in these forums, though perhaps I haven't looked in the right places…

I am interested to know what to do practically with data where some subjects have experienced the event of interest before observation time starts. I can only think to either exclude these subjects from the analysis, or alternatively to code them as having the event (or should it be not having the event?) with a time value entered as 0. I am assuming the former option induces some form of bias?

Can someone please advise what the recommended practice is? I use R if package recommendations are helpful.

edit 07/02/2023

Well, the question that prompted my post relates to patients who suffer an autoimmune episode that results in an immediate neurological impairment. This affects most but not all patients. The ‘event’ in the survival analysis is the recovery of the neurological deficit (i.e. binarised impairment score returns to ‘normal’). The issue is that, while most patients are impaired at admission, a not insubstantial proportion don’t demonstrate any impairment at admission; in other words, they already have the ‘outcome’ at the beginning. Is it OK just to exclude these patients, or does this lead to bias? If one should account for this in the analysis, would interval-censoring techniques be the best way to go, and are there techniques that allow for both left and right censoring in the one analysis?

Best Answer

"I can only think to either exclude these subjects from the analysis, or alternatively to code them as having the event (or should it be not having the event?) with a time value entered as 0."

Klein and Moeschberger is still a very useful text for explaining different types of censoring and truncation in survival analysis.

With the date of the triggering autoimmune episode as the reference time = 0 for each individual and the "event" being a yes/no return to function thereafter, the solution is to treat the cases you describe as having left-censored event times. You know that the event occurred* but only have an upper limit to the time-to-event.

For those cases you do not record the time value as 0. You instead record the elapsed time between the triggering autoimmune episode (the time = 0 for that individual) and the study entry date. You then flag that value as left-censored, in whatever way your software represents left censoring.
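To make the left-censoring idea concrete, here is a minimal sketch of how each type of observation enters a likelihood. It assumes a simple exponential recovery-time model purely for illustration; the function name `exp_loglik`, the record labels and the times (in days) are all hypothetical, and a real analysis would use a proper survival package rather than a grid search.

```python
import math

def exp_loglik(rate, records):
    """Exponential log-likelihood; records are (time, kind) pairs."""
    total = 0.0
    for t, kind in records:
        if kind == "event":    # exact event time: log density f(t)
            total += math.log(rate) - rate * t
        elif kind == "right":  # event after t: log S(t)
            total += -rate * t
        elif kind == "left":   # event before t: log F(t) = log(1 - S(t))
            total += math.log(-math.expm1(-rate * t))
        else:
            raise ValueError(f"unknown kind: {kind}")
    return total

# Hypothetical times in days since the triggering episode.
records = [(10.0, "event"), (25.0, "right"), (4.0, "left"), (7.0, "left")]

# Crude grid search for the maximum-likelihood rate (illustration only).
grid = [k / 1000 for k in range(1, 1001)]
rate_hat = max(grid, key=lambda r: exp_loglik(r, records))
print(rate_hat)
```

Note that a left-censored record with time 0 would contribute log(0) here, which is one way to see why the elapsed time to study entry, not 0, is the value to record.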

This type of study also leads to possible left truncation. Someone who enters the study 2 weeks after the time = 0 of the triggering autoimmune episode without yet experiencing the "event" of return-to-function provides no information about members of the population who might have had return-to-function event times prior to 2 weeks after time = 0. The elapsed time between the autoimmune episode and study entry should then be noted as a left-truncated value, which can be done via the counting-process data format. See Section 3.7.3 of Therneau and Grambsch or Section 3.4 of Klein and Moeschberger.
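As a rough illustration of what the counting-process format does, the sketch below stores each subject as a (start, stop, event) row and only counts time at risk from study entry onward. It again assumes an exponential model, for which the maximum-likelihood rate is simply events divided by time at risk; the function name `exp_rate_mle` and the data are hypothetical.

```python
# Counting-process rows: (start, stop, event), in days since the triggering episode.
# start = delayed study entry; stop = recovery or last follow-up; event = 1 if recovered.
rows = [
    (0.0, 12.0, 1),
    (14.0, 30.0, 1),
    (14.0, 60.0, 0),
    (5.0, 20.0, 1),
]

def exp_rate_mle(rows, use_entry=True):
    """Exponential rate MLE: events divided by total time at risk."""
    events = sum(event for _, _, event in rows)
    at_risk = sum(stop - (start if use_entry else 0.0) for start, stop, _ in rows)
    return events / at_risk

adjusted = exp_rate_mle(rows, use_entry=True)   # time at risk starts at study entry
naive = exp_rate_mle(rows, use_entry=False)     # pretends everyone was observed from time 0
print(adjusted, naive)
```

Ignoring the entry times credits subjects with time at risk during which an early recovery could never have been observed, so the naive rate comes out lower than the adjusted one in this toy data.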

Depending on how you are evaluating the time of return-to-function, you might need to consider interval censoring. If you only evaluate individuals once a month, an evaluation visit that shows return-to-function just means that the "event" occurred within the previous month. If you want to model actual event times then you should code those as interval censored. Otherwise you are modeling the time to observation of the event. That might be good enough, depending on your understanding of the subject matter.
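A small sketch of how a visit history could be turned into interval bounds: the event is taken to lie between the last visit with impairment and the first visit showing recovery. The helper names `to_interval` and `exp_interval_loglik` are hypothetical, and the exponential model is only an assumption for illustration.

```python
import math

def to_interval(visits):
    """Map (time, recovered) visits, sorted by time, to bounds (lower, upper)."""
    lower = 0.0
    for time, recovered in visits:
        if recovered:
            return (lower, time)   # event happened in (lower, time]
        lower = time               # still impaired at this visit
    return (lower, math.inf)       # never seen recovered: right-censored

def exp_interval_loglik(rate, lower, upper):
    """log[S(lower) - S(upper)] for an exponential model."""
    return math.log(math.exp(-rate * lower) - math.exp(-rate * upper))

print(to_interval([(30.0, False), (60.0, True)]))   # (30.0, 60.0)  interval-censored
print(to_interval([(30.0, True)]))                  # (0.0, 30.0)   left-censored
print(to_interval([(30.0, False), (60.0, False)]))  # (60.0, inf)   right-censored
```

With this representation, left censoring is just an interval starting at 0 and right censoring an interval ending at infinity, so one likelihood term covers all three cases.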

In principle, all combinations of truncation and censoring can be incorporated into likelihood calculations, as outlined on this page. In practice it's not that easy. For example, the basic survival-modeling functions in R seem to require all Surv() outcome objects to be of the same type, so you can't combine interval censoring with left truncation. The usual coxph() function can't be used with interval-censored data. The icenReg package allows for right, left and interval censoring, but I don't think that it handles left truncation. For parametric modeling, the eha package (unlike the standard survreg() function in R) can handle left truncation via counting-process data, but it assumes that the covariate values in place at the study-entry time were in place since time = 0, and I don't think that it handles left-censored observations.
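For reference, one common way to write such a combined likelihood is sketched below. The notation is my own assumption rather than anything from the cited texts: $f$ and $S$ are the density and survival function with parameters $\theta$, $E$ is the set of subjects with exact event times $t_i$, $C$ is the set of censored subjects with the event known to lie in $(l_i, r_i]$, and $v_i \le l_i$ is the delayed-entry (left-truncation) time, with $v_i = 0$ when there is no delayed entry. It assumes censoring and truncation are independent of the event time.

```latex
L(\theta) \;=\;
  \prod_{i \in E} \frac{f(t_i;\theta)}{S(v_i;\theta)}
  \;\times\;
  \prod_{i \in C} \frac{S(l_i;\theta) - S(r_i;\theta)}{S(v_i;\theta)}
```

Left censoring corresponds to $l_i = 0$ (which forces $v_i = 0$) and right censoring to $r_i = \infty$, so that $S(r_i;\theta) = 0$.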

I suspect that some type of Bayesian survival model could handle this situation, but that's beyond my expertise.

Many of the above difficulties might be minimized if you modeled the actual neurological function over time instead of using a binary recovered/not-recovered split. Consider that as an alternative. As many posts on this site indicate, it's seldom a good idea to break down a continuous variable into bins.

In response to comment:

Handling someone who enters the study without a known date of the triggering autoimmune episode but with the neurological impairment is tricky. You have a lower-limit (right-censored) estimate of the time to event for that individual, but it's then not clear to me how to deal with the delayed entry bias arising from the (now unknown magnitude of) left-truncation. Getting at least a rough estimate of the date of the triggering episode would be very helpful. For example, the date of the last clinic visit before the triggering episode might be considered.

With your data you need to think carefully about all the different transitions among states: triggering episode to neurological impairment (if any), triggering episode without impairment to study entry, triggering episode with impairment to study entry, study entry to recovery from impairment. It might be best to model all of those transitions together. See this page for an outline of such multi-state modeling.

It's possible that some of those transitions can't be evaluated with the data you have. For example, if people have the triggering episode but don't develop the neurological impairment and don't come in to the clinic, data based on your enrolled subjects alone will overestimate the probability of developing the impairment. So you might need an independent estimate of that probability.
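A back-of-the-envelope calculation shows how large that overestimate can be. All three probabilities below are made-up numbers for illustration, and the function name `enrolled_impaired_fraction` is hypothetical.

```python
def enrolled_impaired_fraction(p_impaired, p_attend_if_impaired, p_attend_if_not):
    """Fraction of enrolled subjects who developed the impairment."""
    impaired = p_impaired * p_attend_if_impaired
    unimpaired = (1.0 - p_impaired) * p_attend_if_not
    return impaired / (impaired + unimpaired)

# Hypothetical: 70% truly develop impairment; 90% of those come to the clinic,
# but only 30% of those without impairment do.
observed = enrolled_impaired_fraction(0.7, 0.9, 0.3)
print(round(observed, 3))  # 0.875
```

Here the enrolled sample suggests about 88% develop the impairment when the assumed true figure is 70%, and nothing in the enrolled data alone reveals the attendance probabilities needed to correct it.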

You might need to restrict your study consciously to a defined population, for example "those who came to the clinic within N weeks of a triggering episode," and discuss the limitations of that restriction with respect to your findings. Selection bias can be a big problem with a delayed-entry study like yours. In your case, the recovery characteristics of those who show up in your clinic might be quite different from those who just stayed at home.

I would suggest working closely with an experienced local statistician who can engage in a back-and-forth discussion with you and your colleagues about just what you want to model and what you can expect to accomplish with this type of data.


*This approach assumes that all those with the triggering autoimmune episode develop the neurological impairment. If some don't, then left censoring models them as nevertheless having the "event" of return-to-function at some early time. Apply your understanding of the subject matter to decide if this makes sense. Perhaps you should use a joint model of whether the impairment develops at all and, if it does, how long it takes to recover.
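One possible shape for such a joint model is a two-part mixture, sketched here under my own assumed notation: $\pi$ is the probability that impairment develops at all, and $f$, $S$ and $F = 1 - S$ describe the recovery time among those who do develop it. Left truncation is ignored to keep the sketch short.

```latex
\begin{aligned}
L_i &= (1 - \pi) + \pi\,F(u_i) && \text{no impairment seen at entry time } u_i \\
L_i &= \pi\,f(t_i)             && \text{impaired, recovery observed at } t_i \\
L_i &= \pi\,S(c_i)             && \text{impaired, still not recovered at } c_i
\end{aligned}
```

The first line is what separates this from plain left censoring: a subject without impairment at entry either never developed it or developed it and recovered before $u_i$.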