Performing Cox regression on left-truncated records


Is a Cox proportional hazards regression model appropriate for left-truncated records?

I'm developing a model that predicts the risk of hospitalization during a two-year period. Records from before the study period show that some members of my study population were hospitalized before it began (left censoring). That is not a problem; there are existing techniques to adjust for left censoring.

However, there are also left-truncated members, for whom we have no information on whether they were hospitalized before the study. Some of these subjects pass through the two-year period without being hospitalized.

Is there any way to retain the left-truncated subjects in the Cox regression model? Or should I consider another methodology, such as logistic regression?

Edit:

To clarify, here are some example records. As is standard for a recurrent-event Cox regression, member 1 (CT0001) has multiple records, one for each interval ending in an inpatient (IP) admission or at the end of the time period. The Start and End columns give the beginning and end dates of each interval, or . if missing. Member 2 (CT0002) was not hospitalized during the study period, but we still want member 2 in the dataset, since we're predicting the risk of hospitalization for the entire population from demographic variables, even for members without an IP visit.

| MemberID | IP_admit | IP_disch | Start | End | Censoring/Truncation |
|---|---|---|---|---|---|
| CT0001 | 10/1/2010 | 10/20/2010 | . | . | . |
| CT0001 | 3/1/2011 | 3/15/2011 | . | 3/1/2011 | Left censored |
| CT0001 | 7/1/2011 | 7/5/2011 | 3/16/2011 | 7/1/2011 | . |
| CT0001 | 11/1/2011 | 11/15/2011 | 7/6/2011 | 11/1/2011 | . |
| CT0001 | . | . | 11/16/2011 | . | Right censored |
| CT0002 | . | . | . | . | Left truncation |
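
As a sketch of how these rows map onto the (start, stop] counting-process layout that a recurrent-event Cox model works with, the R snippet below rebuilds the interval rows of the table. It assumes a hypothetical study window of 2010-01-01 to 2011-12-31, fills unknown interval boundaries with the window edges, and leaves out CT0001's 10/1/2010 admission, which has no at-risk interval in the table. Times are expressed as numeric days since the study start, which is the form `Surv()` works with.

```r
library(survival)

# Hypothetical study window (an assumption for illustration only)
study_start <- as.Date("2010-01-01")
study_end   <- as.Date("2011-12-31")

# The at-risk intervals from the table; "." (missing) becomes NA
d <- data.frame(
  MemberID = c("CT0001", "CT0001", "CT0001", "CT0001", "CT0002"),
  IP_admit = as.Date(c("2011-03-01", "2011-07-01", "2011-11-01", NA, NA)),
  Start    = as.Date(c(NA, "2011-03-16", "2011-07-06", "2011-11-16", NA)),
  End      = as.Date(c("2011-03-01", "2011-07-01", "2011-11-01", NA, NA))
)

# An interval ends in an event if it ends with an IP admission
d$event <- as.integer(!is.na(d$IP_admit))
# Unknown boundaries default to the edges of the study window
d$Start[is.na(d$Start)] <- study_start
d$End[is.na(d$End)]     <- study_end

# Counting-process (start, stop] times in days since the study start
d$tstart <- as.numeric(d$Start - study_start)
d$tstop  <- as.numeric(d$End - study_start)

y <- with(d, Surv(tstart, tstop, event))
print(y)
```

CT0002 ends up as a single right-censored interval covering the whole window, so it still contributes to every risk set.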

Best Answer

First, a disclaimer: I've never had to use start/stop time variables in this way, and although I'm familiar with mixed-effects models, I have never really had to use them in practice. Feel free to correct me if I've made a mistake.

As I see it, the problem has two parts:

  1. One person can occur multiple times, which calls the independence of the observations into question.
  2. A person may enter and exit the cohort at risk at different times throughout the study, i.e. this is an open (dynamic) cohort.

For the first point, I think using a mixed-effects model is a must. I use R, and there is a coxme package recently developed by Prof. Therneau. Its vignette is excellent, and the package seems easy to deploy.

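For the second point, you just need to give the survival object a start and an end time for each interval. This counting-process (start, stop] form is also how a Cox model handles left truncation (delayed entry): a subject only enters the risk sets after its start time. This is fairly easy in R, although I have never had to use it myself. Below is an example that should work: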

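```r
library(survival)
library(coxme)  # mixed-effects Cox models
# df: one row per interval, with Date columns IP_admit, Start and End

# Assumed study window; replace with the real study dates
study_start <- as.Date("2010-01-01")
study_end   <- as.Date("2011-12-31")

# Event yes/no (1/0): the interval ends in an IP admission
df$event <- as.integer(!is.na(df$IP_admit))
# Intervals with a known start date begin right after a discharge
df$discharged <- !is.na(df$Start)
# Those that lack a date should have one: use the study window edges
df$Start[is.na(df$Start)] <- study_start
df$End[is.na(df$End)]     <- study_end

# Surv() needs numeric times: days since the study start
df$tstart <- as.numeric(df$Start - study_start)
df$tstop  <- as.numeric(df$End - study_start)

# A model where the medication possession ratio (MPR, compliance) interacts
# with the fact that a patient has been discharged; MPR, age and sex are
# assumed columns, and (1 | MemberID) is a random intercept per member
fit <- coxme(Surv(tstart, tstop, event) ~ discharged * MPR + age + sex +
               (1 | MemberID), data = df)
```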
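
If coxme is not installed, or to try the setup on data where the truth is known, here is a sketch that uses only the survival package on simulated data. The data-generating process, the 7-day hospital stay, and the column names `id` and `x` are all made-up assumptions. It swaps in two related techniques rather than the coxme model itself: an Andersen-Gill style marginal model with robust standard errors clustered on member via `cluster()`, and a shared gamma frailty via `frailty()`. The coxme version of the same model would be `coxme(Surv(tstart, tstop, event) ~ x + (1 | id), data = d)`.

```r
library(survival)
set.seed(1)

# Hypothetical simulated data: 200 members followed for 730 days, with
# recurrent admissions whose daily rate depends on a binary covariate x
n <- 200
x <- rbinom(n, 1, 0.5)
rows <- vector("list", n)
for (i in seq_len(n)) {
  t0 <- 0
  while (t0 < 730) {
    t1 <- t0 + rexp(1, 0.002 * exp(0.7 * x[i]))
    if (t1 >= 730) {
      # No admission before day 730: right-censored at the end of follow-up
      rows[[i]] <- rbind(rows[[i]],
        data.frame(id = i, x = x[i], tstart = t0, tstop = 730, event = 0))
    } else {
      rows[[i]] <- rbind(rows[[i]],
        data.frame(id = i, x = x[i], tstart = t0, tstop = t1, event = 1))
    }
    # Assumed 7-day stay: the member is not at risk until discharge
    t0 <- t1 + 7
  }
}
d <- do.call(rbind, rows)

# Marginal (Andersen-Gill) model; cluster(id) gives robust standard errors
fit_ag <- coxph(Surv(tstart, tstop, event) ~ x + cluster(id), data = d)
# Shared gamma frailty per member, a random-effect model within survival
fit_fr <- coxph(Surv(tstart, tstop, event) ~ x + frailty(id), data = d)
summary(fit_ag)
print(fit_fr)
```

The gap between an admission and the start of the next interval mirrors the asker's data, where members are not at risk while in hospital; the (start, stop] form allows such gaps.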

You might want to consider what you want to achieve with the Cox regression model in this case. I am not sure that hazards make sense in this setting, although this is very difficult to judge without going through the full study protocol. Make sure that others have used Cox regression in similar settings before this analysis. A good alternative, it seems to me, would be a mixed-effects logistic regression in which you simply model the odds of admission and add the number of days at risk as a predictor, preferably as a natural spline or something else that allows a non-linear relationship.
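
A minimal sketch of the logistic alternative, on simulated one-row-per-member data. The columns `days_at_risk`, `age` and `admitted`, and the coefficients used to simulate them, are hypothetical. Days at risk enter through a natural cubic spline from the base-R splines package, which lets their effect on the log-odds be non-linear. This is a plain `glm()`; the mixed-effects version would need a package such as lme4 and data with several rows per group.

```r
library(splines)
set.seed(2)

# Hypothetical one-row-per-member data: days at risk, age, admitted (1/0)
n <- 500
d <- data.frame(days_at_risk = round(runif(n, 30, 730)),
                age          = round(runif(n, 20, 90)))
# Assumed data-generating process, for illustration only
lp <- -3 + 0.002 * d$days_at_risk + 0.02 * d$age
d$admitted <- rbinom(n, 1, plogis(lp))

# Odds of admission; days at risk enter as a natural cubic spline with
# 3 degrees of freedom so their effect on the log-odds can be non-linear
fit <- glm(admitted ~ ns(days_at_risk, df = 3) + age,
           family = binomial, data = d)
summary(fit)

# A mixed-effects version needs several rows per group (e.g. one row per
# member and period); d_long is a hypothetical long-format data frame:
# lme4::glmer(admitted ~ ns(days_at_risk, df = 3) + age + (1 | MemberID),
#             family = binomial, data = d_long)
```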

Minor update from the discussion

When it comes to time-dependent covariates, I have found them a little tricky to implement. I asked a question on CV a while ago on this subject that you may want to look into. As I wrote in the comments, in the end the time dependence was a little more than I could conveniently display and explain to my colleagues. Furthermore, the model was not strongly affected by this effect, so I dropped it and split the data into an early and a late dataset. I recommend that you consider who your audience is and whether time-varying coefficients will add that much to the model.
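
One way to implement an early/late split is `survSplit()` from the survival package, which cuts each (start, stop] interval at a chosen time and records which piece is which in an episode column. The sketch below uses simulated data and an arbitrary cut at day 365 (both assumptions), gives the covariate `x` a separate coefficient in each period, and runs `cox.zph()` on the unsplit model as a check of proportional hazards.

```r
library(survival)
set.seed(4)

# Hypothetical simulated recurrent-event data in (start, stop] form:
# 300 members followed for 730 days, admission rate depending on x
n <- 300
rows <- vector("list", n)
for (i in seq_len(n)) {
  x <- rbinom(1, 1, 0.5)
  t0 <- 0
  repeat {
    t1 <- t0 + rexp(1, 0.002 * exp(0.5 * x))
    if (t1 >= 730) {
      rows[[i]] <- rbind(rows[[i]],
        data.frame(id = i, x = x, tstart = t0, tstop = 730, event = 0))
      break
    }
    rows[[i]] <- rbind(rows[[i]],
      data.frame(id = i, x = x, tstart = t0, tstop = t1, event = 1))
    t0 <- t1
  }
}
d <- do.call(rbind, rows)

# Cut every interval at day 365; the new "period" column numbers the
# early and late pieces
d2 <- survSplit(Surv(tstart, tstop, event) ~ ., data = d,
                cut = 365, episode = "period")

# Separate coefficient for x in each period (a step-function effect)
fit <- coxph(Surv(tstart, tstop, event) ~ x:factor(period) + cluster(id),
             data = d2)
print(coef(fit))

# Proportional-hazards check on the unsplit model
z <- cox.zph(coxph(Surv(tstart, tstop, event) ~ x, data = d))
print(z)
```

A period main effect is deliberately left out: at any event time all members at risk are in the same period, so it would cancel out of the partial likelihood.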

You have a potentially very serious problem: some patients start their period right after a hospital discharge, while others have no preceding admission. I think you need to consider possible effect modification between these two groups; do they belong to the same population or not? It is easy to argue that medication compliance has a much bigger impact on avoiding admission in the discharged population. At the very least, you should have a variable indicating whether the patient started a period straight after hospitalization (I've added an example in the code).
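
To check the effect modification described above, one option is to compare Cox models with and without a `discharged * MPR` interaction using a likelihood-ratio test. The sketch below uses simulated data with one follow-up interval per member; the column names follow the answer's code, but the data and effect sizes are made up. With several intervals per member, this naive test ignores within-member correlation, so a robust Wald test from a `cluster()` fit would be a safer choice there.

```r
library(survival)
set.seed(3)

# Hypothetical data: one follow-up interval per member, with a
# post-discharge flag and a medication possession ratio (MPR)
n <- 1000
d <- data.frame(discharged = rbinom(n, 1, 0.4), MPR = runif(n))
# Assumed truth: compliance protects more in the post-discharge group
rate <- 0.002 * exp(0.5 * d$discharged - 0.5 * d$MPR -
                    1.0 * d$discharged * d$MPR)
t_event <- rexp(n, rate)
d$time  <- pmin(t_event, 730)          # administrative censoring at 730 days
d$event <- as.integer(t_event <= 730)

fit_main <- coxph(Surv(time, event) ~ discharged + MPR, data = d)
fit_int  <- coxph(Surv(time, event) ~ discharged * MPR, data = d)

# Likelihood-ratio test for the discharged:MPR interaction
cmp <- anova(fit_main, fit_int)
print(cmp)
```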

I have recently done a medication adherence study; if you haven't read this article, I strongly recommend it. In my study, I was also able to interpret the prescription text in 94% of the cases using Python's very powerful regular expressions. I'm planning a post on my blog once the article is published. The text interpretation is in Swedish, but you can easily reuse the structure, as most prescriptions follow a similar pattern (let me know if this would be useful and I can write up the post a little earlier). The advantage is that you can identify exactly when a patient is expected to be without medication, which will probably be closely related to readmission.
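
A minimal sketch of the gap idea: given fill dates and days' supply (hypothetical inputs; in a study like the one described, the supply would come from interpreting the prescription text), compute the day each fill runs out and the number of uncovered days before the next fill, plus a crude MPR over a window. The helper names `medication_gaps()` and `mpr()` are illustrative and not from any package.

```r
# Hypothetical helper: for each fill, the day the supply runs out and the
# number of days without medication before the next fill (NA for the last)
medication_gaps <- function(fill_date, days_supply) {
  o <- order(fill_date)
  fill_date   <- fill_date[o]
  days_supply <- days_supply[o]
  run_out   <- fill_date + days_supply          # first day not covered
  next_fill <- c(fill_date[-1], as.Date(NA))
  data.frame(fill_date, days_supply, run_out,
             gap_days = pmax(0, as.numeric(next_fill - run_out)))
}

# Hypothetical helper: crude MPR = days supplied / days in the window
mpr <- function(days_supply, window_start, window_end) {
  sum(days_supply) / as.numeric(window_end - window_start)
}

fills <- as.Date(c("2011-01-01", "2011-02-10", "2011-03-12"))
gaps <- medication_gaps(fills, c(30, 30, 30))
print(gaps)
mpr(c(30, 30, 30), as.Date("2011-01-01"), as.Date("2011-04-11"))
```

Here the first fill runs out on 2011-01-31, leaving a 10-day gap before the refill on 2011-02-10. This crude MPR ignores fills that straddle the window edges and can exceed 1 when supplies overlap; it is often capped at 1 in practice.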
