Discrete-Time Survival Models – Why No Random Effect Is Required

binary-data, correlation, mixed-model, survival

I learned that you can use random effects as a way to induce correlation in regression models. Say we measure cholesterol over a 12-week period. The random effect "tells" the regression model that the data aren't independent: a person's response in week 1 should be correlated with their response in week 2, and so on. Of course, you can specify more complicated correlation structures (I believe a simple random intercept induces a compound symmetry structure).

When you fit a discrete-time survival model, you first transform the time-to-event data into counting-process format (sometimes called person-period data), where each row is a pseudo-observation for a person in one time period. Then you fit a model for binary data, usually logistic regression, to all the pseudo-observations.
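The expansion step can be sketched as follows. This is a minimal illustration assuming discrete follow-up times 1, 2, 3, … and at most one event per person; the function name and toy data are hypothetical.

```python
# Person-period (counting-process) expansion for discrete-time survival:
# each person contributes one row per period at risk, and the binary
# outcome is 1 only in the period of the event (never for censoring).

def to_person_period(subjects):
    """subjects: list of (id, time, event) with event = 1 (event) or 0 (censored)."""
    rows = []
    for sid, time, event in subjects:
        for t in range(1, time + 1):
            y = 1 if (event == 1 and t == time) else 0
            rows.append({"id": sid, "period": t, "y": y})
    return rows

data = [("A", 3, 1),   # event at period 3 -> rows with y = 0, 0, 1
        ("B", 2, 0)]   # censored at period 2 -> rows with y = 0, 0
pp = to_person_period(data)
print(len(pp))  # 5 pseudo-observations
```

A logistic regression on `y` with period indicators (and covariates) would then be fit to these 5 rows exactly as if they were ordinary binary data.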

I don't see people adding random effects or specifying a correlation structure in discrete-time survival models. What "tells" the model that the multiple rows (pseudo-observations) for each person are not independent? Isn't the model treating the pseudo-observations as uncorrelated when they should have some correlation?

Best Answer

As Richard Hardy notes, discrete-time survival models can incorporate random effects. In many typical use cases, however, they aren't necessary. The same issue arises with counting-process data in both continuous-time and discrete-time survival models.

If an individual can have at most one event and the modeling only involves covariate values in place at event times (as in Cox models or in many discrete-time models), then there is no intra-individual correlation to contend with. Therneau and Grambsch discuss this aspect of counting-process data in Section 3.7.1, page 70:

One concern that often arises is that observations [on the same individual] are "correlated," and would thus not be handled by standard methods. This is not actually an issue. The internal computations for a Cox model have a term for each unique death or event time; a given term involves sums over those observations that are available or "at risk" at the select event date. Since the intervals for a particular subject, "Jones" say, do not overlap (assuming of course that Jones does not have a time machine, and could meet himself on the street), any given internal sum will involve at most one of the observations that represent Mr. Jones; that is, the sum will still be over a set of independent observations. For time-dependent covariates, the use of (start, stop] intervals is just a mechanism, a trick almost, that allows the program to select the correct x values for Jones at a given time. (When there are multiple events per subject the situation does become more complicated...).
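The non-overlap argument in the quote is easy to verify mechanically. This sketch (the intervals for "Jones" are made up) checks that with (start, stop] intervals, a subject is at risk at time t only when start < t ≤ stop, so at most one of their rows can enter any given risk-set sum.

```python
# With (start, stop] intervals in counting-process format, a subject's
# row is "at risk" at event time t only if start < t <= stop. Because
# one subject's intervals never overlap, at most one of their rows
# appears in any risk set -- the point of the Therneau & Grambsch quote.

def n_rows_at_risk(intervals, t):
    """Count how many of a subject's (start, stop] rows are at risk at time t."""
    return sum(1 for start, stop in intervals if start < t <= stop)

# Hypothetical intervals for "Jones", split by a time-varying covariate:
jones = [(0, 4), (4, 9), (9, 12)]

# At every candidate event time, Jones contributes at most one row.
assert all(n_rows_at_risk(jones, t) <= 1 for t in range(1, 15))
```

Each risk-set sum therefore still runs over (at most) one observation per subject, which is why no correlation adjustment is needed in this setting.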

If an individual can have more than one event, or for types of parametric modeling that involve the entire survival function over time, then you would need to deal with intra-individual correlations. But simple binomial modeling, when there can be at most one event per individual, doesn't require that.