First a disclaimer: I've never had to use the time start/end variables in this way, and although I'm familiar with mixed effects models I have never really had to use them IRL. Feel free to correct me if I've made a mistake.
The problem consists of two things as I see it:
- One person can occur multiple times. This puts the independence of the observations into question.
- A person may enter and exit the cohort at risk at different times throughout the study, i.e. this is an open (dynamic) cohort.
For the first point I think using a mixed effects model is a must. I use R and there is a coxme package recently developed by Prof. Therneau. The vignette documentation is excellent and it seems easy to deploy.
For the second point you just need to add the start and end point to the survival object. This is fairly easy in R although I have never had to use it myself. Below is an example that should work:
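library(survival)
library(coxme)
# Set the event yes/no (1/0)
df$event <- as.integer(!is.na(df$IpAdmit))
# Flag patients whose period starts at a hospital discharge (they have a start date)
df$discharged <- !is.na(df$start)
# Those that lack a date get the study boundaries
df$start[is.na(df$start)] <- as.Date("2010-01-01")
df$end[is.na(df$end)] <- as.Date("2011-12-31")
# Convert the dates to numeric days since study start (Surv() expects numeric times)
study_start <- as.Date("2010-01-01")
df$t_start <- as.numeric(df$start - study_start)
df$t_end <- as.numeric(df$end - study_start)
# A model where the medication possession ratio (compliance) interacts with
# the fact that a patient has been discharged
coxme(Surv(t_start, t_end, event) ~ discharged*MPR + age + sex + (1|MemberID), data=df)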
You might want to consider what you want to achieve with the Cox regression model in this case. I am not sure that hazards make sense in this setting, although this is very difficult to know without going through the full study protocol. Make sure that others have used Cox regression in similar settings prior to this analysis. It seems to me that a good alternative would be a mixed effects logistic regression where you simply model the odds of admission and add the number of days at risk as a predictor, preferably as a natural spline or something else that allows a non-linear relationship.
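A minimal sketch of that logistic alternative, assuming the lme4 package is available and using simulated data rather than the asker's dataset. The column names admitted and days_at_risk are hypothetical; MPR and MemberID mirror the names used in the example above.

```r
# Hypothetical sketch on simulated data, not the asker's dataset
library(lme4)      # glmer() for mixed effects logistic regression
library(splines)   # ns() for natural cubic splines

set.seed(1)
n_patients <- 200
periods <- 3
sim <- data.frame(
  MemberID = factor(rep(seq_len(n_patients), each = periods)),
  MPR = runif(n_patients * periods, 0.3, 1),
  days_at_risk = sample(30:730, n_patients * periods, replace = TRUE)
)

# A patient-level random intercept induces within-patient correlation
u <- rnorm(n_patients, sd = 0.5)[as.integer(sim$MemberID)]
lin_pred <- -1 - 1.5 * sim$MPR + 0.5 * log(sim$days_at_risk / 365) + u
sim$admitted <- rbinom(nrow(sim), 1, plogis(lin_pred))

# Odds of admission, with days at risk entering through a natural spline
fit <- glmer(
  admitted ~ MPR + ns(days_at_risk, df = 3) + (1 | MemberID),
  data = sim, family = binomial
)
summary(fit)
```

The choice of df = 3 for the spline is arbitrary here; in a real analysis it would be guided by the data and the subject matter.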
Minor update from the discussion
When it comes to time-dependent covariates, I have found them a little tricky to deploy. I had a CV-question a while ago on this subject that you may want to look into. As I wrote in the comments, in the end the time dependence was a little more than I could conveniently display and explain to my colleagues. Furthermore, the model was not strongly affected by this effect, so I dropped it and switched to an early and a late dataset. I recommend you consider who your audience is and whether the time-varying coefficients will add that much to the model.
You have a potentially very serious problem where some patients start their period discharged from the hospital while some are untainted. I think you need to think about possible effect modification between these two groups - do they belong to the same population or not? It is easy to make a case that medication-compliance has a much bigger admission-avoidance impact in the discharged population. I think you at least should have a variable indicating if the patient has started a period straight after hospitalization or not (I've added an example in the code).
I have recently done a medication adherence study; if you haven't read this article I strongly recommend it. In my study I was also able to interpret the prescription text in 94 % of the cases using Python's very powerful regular expressions. I'm planning on doing a post on my blog once the article gets published. The text interpretation is in Swedish, but you can very easily reuse the structure as most prescriptions follow a similar pattern (let me know if this would be useful and I can write up the post a little earlier). This matters because you want to identify exactly when a patient is expected to be without medication, since that will probably be very closely related to readmission.
In a survival analysis, the time at risk is determined by the hypothesis under investigation. You can ask yourself, "when does my experiment start?" For a trial, time 0 is the start of the trial; all prior survival is ignored. For an observational study, time 0 may be clearly defined (e.g., after a surgery) or can be less clear (e.g., appearance in a clinic). Simply having lived prior to the study is not a sufficient reason to include that time as part of the study.
Best Answer
Klein and Moeschberger is still a very useful text for explaining different types of censoring and truncation in survival analysis.
With the date of the triggering autoimmune episode as the reference time = 0 for each individual and the "event" being a yes/no return to function thereafter, the solution is to treat the cases you describe as having left-censored event times. You know that the event occurred* but only have an upper limit to the time-to-event.
For those cases you do not record the time value as 0. You instead record the value of the elapsed time between the triggering autoimmune episode (the time = 0 for that individual) and the study entry date. You then indicate that as a left-censored value, however your software wants that represented.
This type of study also leads to possible left truncation. Someone who enters the study 2 weeks after the time = 0 of the triggering autoimmune episode without yet experiencing the "event" of return-to-function provides no information about members of the population who might have had return-to-function event times prior to 2 weeks after time = 0. The elapsed time between the autoimmune episode and study entry should then be noted as a left-truncated value. See Section 3.7.3 of Therneau and Grambsch or Section 3.4 of Klein and Moeschberger. That can be done via the counting-process data format.
Depending on how you are evaluating the time of return-to-function, you might need to consider interval censoring. If you only evaluate individuals once a month, an evaluation visit that shows return-to-function just means that the "event" occurred within the previous month. If you want to model actual event times then you should code those as interval censored. Otherwise you are modeling the time to observation of the event. That might be good enough, depending on your understanding of the subject matter.
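As a rough illustration of how the three situations above can be encoded with Surv() from the survival package, here is a small sketch. All numbers are made up, with time in weeks measured from the triggering episode.

```r
library(survival)

# Left censoring: the second individual had already recovered at study
# entry (6 weeks after the episode), so only an upper limit is known.
# With type = "left", event = 1 is an exact time and 0 is left-censored.
s_left <- Surv(time = c(10, 6, 8), event = c(1, 0, 1), type = "left")

# Left truncation via counting-process format: each individual is at
# risk only from study entry (time) until recovery or last contact (time2).
s_trunc <- Surv(time = c(2, 1, 4), time2 = c(10, 8, 12), event = c(1, 1, 0))

# Interval censoring with type = "interval2": NA marks an open end.
# (NA, 6] left-censored, (4, 8] interval, 8 exact, (12, NA) right-censored.
s_int <- Surv(time = c(NA, 4, 8, 12), time2 = c(6, 8, 8, NA),
              type = "interval2")

print(s_left)
print(s_trunc)
print(s_int)
```

Note that each object carries a single censoring type, which becomes relevant when trying to combine censoring with truncation in one model.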
In principle, all combinations of truncation and censoring can be incorporated into likelihood calculations, as outlined on this page. In practice it's not that easy. For example, the basic survival-modeling functions in R seem to require all Surv() outcome objects to be the same type. So you can't combine interval censoring with left truncation. The usual coxph() function can't be used with interval-censored data. The icenReg package allows for right, left and interval censoring, but I don't think that it handles left truncation. For parametric modeling, the eha package (unlike the standard survreg() function in R) can handle left truncation via counting-process data, but it assumes that the covariate values in place at the study-entry time were in place since time = 0 and I don't think that it handles left-censored observations.
I suspect that some type of Bayesian survival model could handle this situation, but that's beyond my expertise.
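To make the limitation concrete, here is a sketch on simulated data (the covariate x and all rates are hypothetical). It fits coxph() to left-truncated data in counting-process format, and separately fits survreg() to an interval-censored version of the same outcome. Neither fit handles both features at once, which is the difficulty described above.

```r
library(survival)

set.seed(42)
n <- 500
x <- rbinom(n, 1, 0.5)                         # hypothetical binary covariate
t_event <- rexp(n, rate = 0.1 * exp(0.7 * x))  # weeks from episode to recovery
entry <- runif(n, 0, 8)                        # weeks from episode to study entry

# Left truncation: only those not yet recovered at entry are enrolled
keep <- t_event > entry
d <- data.frame(x = x[keep], entry = entry[keep],
                exit = t_event[keep], event = 1)

# coxph() accepts counting-process (entry, exit] data, so delayed entry
# is accounted for, but it cannot take interval-censored outcomes
fit_trunc <- coxph(Surv(entry, exit, event) ~ x, data = d)

# Suppose recovery is only assessed every 4 weeks, so each event time is
# known only to lie in (lower, upper]. survreg() accepts this coding, but
# this fit ignores the left truncation and is biased for that reason.
d$lower <- floor(d$exit / 4) * 4
d$upper <- d$lower + 4
d$lower[d$lower == 0] <- NA   # NA lower bound means left-censored at upper
fit_int <- survreg(Surv(lower, upper, type = "interval2") ~ x,
                   data = d, dist = "weibull")

coef(fit_trunc)
coef(fit_int)
```

The two sets of coefficients are on different scales (log hazard ratio versus the accelerated failure time scale used by survreg()), so they are not directly comparable.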
Many of the above difficulties might be minimized if you modeled the actual neurological function over time instead of using a binary recovered/not breakdown. Consider that as an alternative. As many posts on this site indicate, it's seldom a good idea to break down a continuous variable into bins.
In response to comment:
Handling someone who enters the study without a known date of the triggering autoimmune episode but with the neurological impairment is tricky. You have a lower-limit (right-censored) estimate of the time to event for that individual, but it's then not clear to me how to deal with the delayed entry bias arising from the (now unknown magnitude of) left-truncation. Getting at least a rough estimate of the date of the triggering episode would be very helpful. For example, the date of the last clinic visit before the triggering episode might be considered.
With your data you need to think carefully about all the different transitions among states: triggering episode to neurological impairment (if any), triggering episode without impairment to study entry, triggering episode with impairment to study entry, study entry to recovery from impairment. It might be best to model all of those transitions together. See this page for an outline of such multi-state modeling.
It's possible that some of those transitions can't be evaluated with the data you have. For example, if people have the triggering episode but don't develop the neurological impairment and don't come in to the clinic, data based on your enrolled subjects alone will overestimate the probability of developing the impairment. So you might need an independent estimate of that probability.
You might need to restrict your study consciously to a defined population, for example "those who came to the clinic within N weeks of a triggering episode," and discuss the limitations of that restriction with respect to your findings. Selection bias can be a big problem with a delayed-entry study like yours. In your case, the recovery characteristics of those who show up in your clinic might be quite different from those who just stayed at home.
I would suggest working closely with an experienced local statistician who can engage in a back-and-forth discussion with you and your colleagues about just what you want to model and what you can expect to accomplish with this type of data.
*This approach assumes that all those with the triggering autoimmune episode develop the neurological impairment. If some don't, then left censoring models them as nevertheless having the "event" of return-to-function at some early time. Apply your understanding of the subject matter to decide if this makes sense. Perhaps you should use a joint model of whether the impairment develops at all and, if it does, how long it takes to recover.