Survival – How to Handle Left-Censoring in Survival Analyses

survival

This is clearly not a novel question. However, I can't seem to find an answer in these forums, though perhaps I haven't looked in the right places.

I am interested to know what to do practically with data where some subjects have experienced the event of interest before observation time starts. I can only think to either exclude these subjects from the analysis, or alternatively to code them as having the event (or should it be not having the event?) with a time value entered as 0. I am assuming the former option induces some form of bias?

Can someone please advise what the recommended practice is? I use R if package recommendations are helpful.

edit 07/02/2023

Well, the question that prompted my post relates to patients who suffer an autoimmune episode that results in an immediate neurological impairment. This affects most but not all patients. The ‘event’ in the survival analysis is the recovery of the neurological deficit (i.e. binarised impairment score returns to ‘normal’). The issue is that, while most patients are impaired at admission, a not insubstantial proportion don’t demonstrate any impairment at admission – in other words they already have the ‘outcome’ at the beginning. Is it OK just to exclude these patients, or does this lead to bias? If one should account for this in the analysis, would interval-censoring techniques be the best way to go, and are there techniques that allow for both left and right censoring in the one analysis?

Best Answer

I can only think to either exclude these subjects from the analysis, or alternatively to code them as having the event (or should it be not having the event?) with a time value entered as 0.

Klein and Moeschberger is still a very useful text for explaining different types of censoring and truncation in survival analysis.

With the date of the triggering autoimmune episode as the reference time = 0 for each individual and the "event" being a yes/no return to function thereafter, the solution is to treat the cases you describe as having left-censored event times. You know that the event occurred* but only have an upper limit to the time-to-event.

For those cases you do not record the time value as 0. You instead record the value of the elapsed time between the triggering autoimmune episode (the time = 0 for that individual) and the study entry date. You then indicate that as a left-censored value, however your software wants that represented.
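As a minimal sketch of that coding step (the function name and the use of days as the time unit are illustrative assumptions, not part of any package), a left-censored subject can be stored as a pair of bounds on the unknown event time: a lower bound of 0 and an upper bound equal to the gap between the triggering episode and study entry.

```python
from datetime import date


def left_censored_bounds(episode_date, entry_date):
    """Bounds (lower, upper), in days, on the recovery time of a subject
    who had already recovered when first observed at study entry.

    Time 0 is the triggering episode; the event happened somewhere in
    (0, upper], so the upper bound is the elapsed time to entry, not 0.
    """
    upper = (entry_date - episode_date).days
    if upper <= 0:
        raise ValueError("study entry must come after the triggering episode")
    return (0.0, float(upper))


# Hypothetical subject: episode on 10 Jan 2023, enrolled (already recovered) on 24 Jan 2023.
bounds = left_censored_bounds(date(2023, 1, 10), date(2023, 1, 24))
```

How these bounds are then passed on depends on your software; many packages accept some form of (lower, upper) coding for censored times.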

This type of study also leads to possible left truncation. Someone who enters the study 2 weeks after the time = 0 of the triggering autoimmune episode without yet experiencing the "event" of return-to-function provides no information about members of the population who might have had return-to-function event times prior to 2 weeks after time = 0. The elapsed time between the autoimmune episode and study entry should then be noted as a left-truncated value. See Section 3.7.3 of Therneau and Grambsch or Section 3.4 of Klein and Moeschberger. That can be done via the counting-process data format.
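To make the counting-process idea concrete, here is a small sketch with made-up data (the field names `start`, `stop` and `event` are illustrative). Each subject contributes a row whose `start` is the delayed entry time, and a subject counts as "at risk" at time t only if they had already entered and had not yet had the event or been censored.

```python
# Hypothetical rows, times in weeks since the triggering episode.
rows = [
    {"id": 1, "start": 0.0, "stop": 5.0, "event": 1},   # observed from time 0
    {"id": 2, "start": 2.0, "stop": 8.0, "event": 1},   # entered at week 2
    {"id": 3, "start": 3.0, "stop": 10.0, "event": 0},  # entered at week 3, censored
]


def risk_set(rows, t):
    """IDs of subjects under observation and still event-free just before t."""
    return [r["id"] for r in rows if r["start"] < t <= r["stop"]]


at_week_2_5 = risk_set(rows, 2.5)  # subject 3 has not entered yet
at_week_6 = risk_set(rows, 6.0)    # subject 1 has already had the event
```

The point of the delayed `start` is that subject 2 contributes nothing to risk sets before week 2, which is how left truncation is respected.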

Depending on how you are evaluating the time of return-to-function, you might need to consider interval censoring. If you only evaluate individuals once a month, an evaluation visit that shows return-to-function just means that the "event" occurred within the previous month. If you want to model actual event times then you should code those as interval censored. Otherwise you are modeling the time to observation of the event. That might be good enough, depending on your understanding of the subject matter.
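A sketch of how periodic visits translate into interval-censored bounds, assuming each visit is recorded as a hypothetical (time, recovered) pair: the event time lies between the last visit without recovery and the first visit with it.

```python
import math


def interval_from_visits(visits):
    """Turn (time, recovered) visit records into (lower, upper) bounds.

    - recovered at the first visit      -> (0, first visit time): left-censored
    - recovered between two visits      -> (last negative, first positive)
    - never seen recovered              -> (last visit time, inf): right-censored
    """
    lower = 0.0
    for time, recovered in sorted(visits):
        if recovered:
            return (lower, float(time))
        lower = float(time)
    return (lower, math.inf)


monthly = interval_from_visits([(1, False), (2, False), (3, True)])
already_recovered = interval_from_visits([(1, True)])
still_impaired = interval_from_visits([(1, False), (2, False)])
```

Note that left and right censoring fall out as special cases of the same (lower, upper) representation.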

In principle, all combinations of truncation and censoring can be incorporated into likelihood calculations, as outlined on this page. In practice it's not that easy. For example, the basic survival-modeling functions in R seem to require all Surv() outcome objects to be the same type. So you can't combine interval censoring with left truncation. The usual coxph() function can't be used with interval-censored data. The icenReg package allows for right, left and interval censoring, but I don't think that it handles left truncation. For parametric modeling, the eha package (unlike the standard survreg() function in R) can handle left truncation via counting-process data, but it assumes that the covariate values in place at the study-entry time were in place since time = 0 and I don't think that it handles left-censored observations.
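To illustrate the "in principle" part, here is a hand-written log-likelihood for a Weibull model that accepts exact, left-, right- and interval-censored times together with left truncation. It is only a sketch under stated assumptions: the Weibull form, the (entry, lower, upper) tuple layout and the function names are my choices for illustration, and it is not how any of the packages above work internally.

```python
import math


def weibull_surv(t, shape, scale):
    """S(t) = exp(-(t/scale)^shape), with S(0) = 1 and S(inf) = 0."""
    if t <= 0:
        return 1.0
    if math.isinf(t):
        return 0.0
    return math.exp(-((t / scale) ** shape))


def weibull_logpdf(t, shape, scale):
    """Log density of the Weibull distribution at an exact time t > 0."""
    z = t / scale
    return math.log(shape / scale) + (shape - 1) * math.log(z) - z ** shape


def log_likelihood(obs, shape, scale):
    """obs holds (entry, lower, upper) tuples:
    lower == upper           exact event time (must be positive)
    lower == 0, upper finite left-censored
    upper == inf             right-censored at lower
    otherwise                interval-censored in (lower, upper]
    entry is the left-truncation time (0 if observed from time 0).
    """
    total = 0.0
    for entry, lower, upper in obs:
        if not (0 <= entry <= lower <= upper):
            raise ValueError("need 0 <= entry <= lower <= upper")
        if lower == upper:
            contrib = weibull_logpdf(lower, shape, scale)
        else:
            contrib = math.log(
                weibull_surv(lower, shape, scale) - weibull_surv(upper, shape, scale)
            )
        # Left truncation: condition on being event-free at study entry.
        total += contrib - math.log(weibull_surv(entry, shape, scale))
    return total


example = [
    (0.0, 0.0, 2.0),       # left-censored: recovered some time before week 2
    (0.5, 1.5, 1.5),       # entered at 0.5, exact recovery at 1.5
    (1.0, 3.0, math.inf),  # entered at 1.0, still impaired at 3.0
]
value = log_likelihood(example, shape=1.0, scale=1.0)
```

The check `entry <= lower` reflects that conditioning on being event-free at entry only makes sense if the event is known to have happened after entry, so a left-censored subject has to be given entry = 0 here. Maximizing this function over `shape` and `scale` with a general-purpose optimizer would give parameter estimates, but covariates, standard errors and model checking are all left out.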

I suspect that some type of Bayesian survival model could handle this situation, but that's beyond my expertise.

Many of the above difficulties might be minimized if you modeled the actual neurological function over time instead of using a binary recovered/not breakdown. Consider that as an alternative. As many posts on this site indicate, it's seldom a good idea to break down a continuous variable into bins.

In response to comment:

Handling someone who enters the study without a known date of the triggering autoimmune episode but with the neurological impairment is tricky. You have a lower-limit (right-censored) estimate of the time to event for that individual, but it's then not clear to me how to deal with the delayed entry bias arising from the (now unknown magnitude of) left-truncation. Getting at least a rough estimate of the date of the triggering episode would be very helpful. For example, the date of the last clinic visit before the triggering episode might be considered.

With your data you need to think carefully about all the different transitions among states: triggering episode to neurological impairment (if any), triggering episode without impairment to study entry, triggering episode with impairment to study entry, study entry to recovery from impairment. It might be best to model all of those transitions together. See this page for an outline of such multi-state modeling.

It's possible that some of those transitions can't be evaluated with the data you have. For example, if people have the triggering episode but don't develop the neurological impairment and don't come in to the clinic, data based on your enrolled subjects alone will overestimate the probability of developing the impairment. So you might need an independent estimate of that probability.

You might need to restrict your study consciously to a defined population, for example "those who came to the clinic within N weeks of a triggering episode," and discuss the limitations of that restriction with respect to your findings. Selection bias can be a big problem with a delayed-entry study like yours. In your case, the recovery characteristics of those who show up in your clinic might be quite different from those who just stayed at home.

I would suggest working closely with an experienced local statistician who can engage in a back-and-forth discussion with you and your colleagues about just what you want to model and what you can expect to accomplish with this type of data.

*This approach assumes that all those with the triggering autoimmune episode develop the neurological impairment. If some don't, then left censoring models them as nevertheless having the "event" of return-to-function at some early time. Apply your understanding of the subject matter to decide if this makes sense. Perhaps you should use a joint model of whether the impairment develops at all and, if it does, how long it takes to recover.
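As a rough sketch of what such a joint (two-part) model implies for a subject who shows no impairment at study entry, assume a hypothetical probability `p_never` of never developing the impairment and, purely for simplicity, an exponential recovery time with rate `rate` among those who do. Both names and the exponential form are illustrative assumptions.

```python
import math


def prob_unimpaired_at_entry(p_never, rate, entry_gap):
    """Probability of showing no impairment at study entry.

    Either the impairment never developed (p_never), or it developed and
    the subject recovered within entry_gap of the triggering episode.
    """
    recovered_by_entry = 1.0 - math.exp(-rate * entry_gap)
    return p_never + (1.0 - p_never) * recovered_by_entry


# Hypothetical values: 20% never impaired, median recovery time of 2 weeks,
# subject enrolled 2 weeks after the triggering episode.
p = prob_unimpaired_at_entry(p_never=0.2, rate=math.log(2) / 2, entry_gap=2.0)
```

Plain left censoring corresponds to fixing `p_never` at 0, which is the assumption this footnote warns about.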

Minor update from the discussion

When it comes to time-dependent covariates, I have found them a little tricky to deploy. I had a CV-question a while ago on this subject that you may want to look into. As I wrote in the comments, in the end the time dependence was a little more than I could conveniently display and explain to my colleagues. Furthermore, the model was not strongly affected by this effect, so I dropped it and switched to an early and a late dataset. I recommend you consider who your audience is and whether the time-varying coefficients will add that much to the model.

You have a potentially very serious problem where some patients start their period discharged from the hospital while some are untainted. I think you need to think about possible effect modification between these two groups - do they belong to the same population or not? It is easy to make a case that medication-compliance has a much bigger admission-avoidance impact in the discharged population. I think you at least should have a variable indicating if the patient has started a period straight after hospitalization or not (I've added an example in the code).

I have recently done a medication adherence study; if you haven't read this article, I strongly recommend it. In my study I was also able to deduce from the prescription text 94% of the cases using Python's very powerful regular expressions. I'm planning on doing a post on my blog once the article gets published. The text interpretation is in Swedish, but you can very easily use the structure, as most prescriptions follow a similar pattern (let me know if this would be useful and I can write up the post a little earlier). This matters because you want to identify exactly when a patient is expected to be without medication: you will probably have a very close relationship between that and readmission.

Cox Regression – Handling a Subject’s Time Before Treatment in Cox Regression Data Setup

In a survival analysis, the time at risk is determined by the hypothesis under investigation. You can ask yourself, “When does my experiment start?” For a trial, time 0 is the start of the trial; all prior survival is ignored. For an observational study, time 0 may be clearly defined (e.g., after a surgery) or can be less clear (e.g., appearance in a clinic). Simply having lived prior to the study is not a sufficient reason to include that time as part of the study.
