Solved – How to handle irregular interval censoring in a Cox model in R or Stata

cox-model, interval-censoring, r, stata, time series

I'm trying to model a weekly adoption process (adoption events could only occur on Fridays) using the coxph function, but a large number of observations are missing for the first 6 years, leaving me with pseudo-annual data at irregular intervals.

The problem is that R's method for handling interval censoring appears to assume regular time intervals. As I understand this vignette, interval-censored data are represented by three numbers (the corresponding medical analogy is in parentheses):

  1. First non-adopted observation (or first "well" observation)
  2. Last non-adopted observation (or last "well" observation)
  3. First adopted observation (first infected or death observation)

What I would like to do is elide the last-adopted observation and instead include a list of missing observation times. In my case, since my basic time unit is a week, specifying that observations were missing between weeks 5 and 20, 27 and 34, etc. would be much more appropriate. Otherwise, it just appears as though large clusters of events happened very irregularly, and the Cox model does not take into account the fact that events could have happened during those missing weeks.
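One way to make the missing weeks explicit is to bracket each subject's adoption time between the last week it was seen not adopted and the first week it was seen adopted. The following is only a minimal sketch in Python, not the method of any particular package: `bracket_adoption`, the week-0 study start, and the sample data are all hypothetical, and it assumes an adoption is never reversed between observations.

```python
from typing import Optional, Sequence, Tuple


def bracket_adoption(
    weeks: Sequence[int], adopted: Sequence[bool]
) -> Tuple[int, Optional[int]]:
    """Return (left, right) such that adoption happened in (left, right].

    weeks   -- weeks at which the subject was actually observed, ascending
    adopted -- adoption status seen at each of those weeks
    right is None when adoption was never observed (right-censored).
    Assumption: adoption is not reversed between observations.
    """
    left = 0  # hypothetical study start: nobody has adopted at week 0
    for week, status in zip(weeks, adopted):
        if status:
            return left, week
        left = week
    return left, None


# Observed at weeks 1-5, then nothing until week 20 (weeks 6-19 are missing).
weeks = [1, 2, 3, 4, 5, 20, 21]
adopted = [False, False, False, False, False, True, True]
interval = bracket_adoption(weeks, adopted)
print(interval)  # (5, 20): adoption occurred somewhere in weeks 6..20
```

The width of each bracket then carries the information that events could have happened anywhere in the unobserved weeks. Pairs of this form are, as far as I know, what interval-censored routines expect, for example `Surv(left, right, type = "interval2")` in R's survival package, with `NA` for an open right end.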

Another potential problem is that an adoption event could occur during the censored interval and be followed by an "un-adoption" event before the next observation. I think the medical analogy normally gets around this problem because events like infection and death are unlikely to have reversed by the time of the next observation (though it is presumably a problem for infection as well).

My hope is that the trick John Fox uses to handle time-dependent covariates will allow me to deal with this problem. Any suggestions are welcome (Stata would also be an option). Thanks very much.

Best Answer

I don't know if this is going to help much, but economists sometimes treat data like these as discrete-time event history data and fit (strings of) logistic regression models to them. See, e.g., http://www.ats.ucla.edu/stat/stata/library/survival2.htm. Apparently you can also add random effects to control for correlation within the same subject: http://www.jstor.org/stable/3068299.
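To make the discrete-time idea concrete, here is a rough sketch (in Python, purely for illustration) of the "person-period" layout such models are usually fitted to: one row per subject per observation interval while the subject is still at risk. The function names, the week-0 start, and the toy panel are hypothetical, and the sketch assumes every subject shares the same observation schedule.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def person_period_rows(subject: str, observations: List[Tuple[int, bool]]) -> List[dict]:
    """Expand one subject's (week, adopted) observations into at-risk rows."""
    rows = []
    start = 0  # hypothetical study start
    for week, adopted in observations:
        rows.append({"id": subject, "start": start, "stop": week,
                     "width": week - start, "event": int(adopted)})
        if adopted:
            break  # the subject leaves the risk set after adopting
        start = week
    return rows


def interval_hazards(rows: List[dict]) -> Dict[Tuple[int, int], float]:
    """Share of at-risk subjects that adopt in each observation interval."""
    at_risk = defaultdict(int)
    events = defaultdict(int)
    for r in rows:
        key = (r["start"], r["stop"])
        at_risk[key] += 1
        events[key] += r["event"]
    return {k: events[k] / at_risk[k] for k in sorted(at_risk)}


# Toy panel: observations exist only at weeks 1, 5, 20 and 27.
panel = {
    "a": [(1, False), (5, False), (20, True)],
    "b": [(1, False), (5, False), (20, False), (27, True)],
    "c": [(1, False), (5, True)],
}
rows = [r for sid, obs in panel.items() for r in person_period_rows(sid, obs)]
hazards = interval_hazards(rows)
for interval, h in hazards.items():
    print(interval, round(h, 3))
```

The `event` column would be the outcome of the logistic regression, with covariates merged in per row. The `width` column records how many weeks each interval spans, so the unequal gaps can be handled explicitly, for instance with one dummy per interval or with the width as a covariate. The proportions printed here are just the covariate-free version of that model.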
