It seems that time = 0 for each individual is the birth date, so that the event time is age at the event. You don't seem to have time-varying covariates. With at most 1 event per individual, you don't need to take repeated events into account, although you do need to take correlations among members of the same family into account. In those respects, this is a straightforward survival model.
Left censoring
The practical problem with a Cox model here is that the times to events prior to 2009 are left censored. It's probably best to call them "left censored," even though left censoring is a limiting case of interval censoring with a lower limit of $t=0$ (equivalently, $-\infty$ on the log-time scale). I think that the term "interval censored" is more typically used in situations where individuals are followed up at intervals (e.g., regular visits after cancer treatment) and the event time is known only to lie between two follow-up visits. For a standard Cox model based on risk sets at event times, you don't know which risk sets should include the left-censored cases, so you need another approach to proportional-hazards modeling.
Family clusters
In terms of the clustering, you say:
"Events are observed in strata (children in the same family), and the goal is to estimate the effect of within-strata variation of an exposure on occurrence of events with a fixed-effects model. To implement a fixed-effects cox model I need to use stratified baseline hazards."
There's a big practical problem in stratifying with separate baselines for each family. That will provide only a few cases within each stratum, leading to difficulty in identifying family-specific baseline hazards and a big loss of power overall.
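One way to see the loss of power: in a stratified partial likelihood, the term for an event is exp(beta*x_i) divided by the sum of exp(beta*x_j) over the members of the same stratum still at risk. If everyone in that risk set has the same exposure, the term reduces to 1/(risk-set size) and carries no information about beta. The Python sketch below counts which families would contribute at all. It assumes fully observed or right-censored times (so it ignores the left-censoring problem above), and the record layout and function name are made up for illustration.

```python
def informative_families(records):
    """Return the families whose stratified partial-likelihood terms depend on beta.

    records: iterable of (family, time, event, exposure) tuples, where
    event is 1 for an observed event and 0 for right censoring.
    (Hypothetical layout, chosen only for this sketch.)
    """
    by_family = {}
    for family, time, event, exposure in records:
        by_family.setdefault(family, []).append((time, event, exposure))

    informative = set()
    for family, members in by_family.items():
        for time, event, _ in members:
            if not event:
                continue
            # Exposure values among family members still at risk at this event time.
            exposures_at_risk = {x for t, _, x in members if t >= time}
            if len(exposures_at_risk) > 1:
                informative.add(family)
                break
    return informative


example = [
    ("A", 5, 1, 1), ("A", 8, 0, 0),   # event while a sibling with different exposure is at risk
    ("B", 4, 1, 1), ("B", 6, 1, 1),   # no within-family variation in exposure
    ("C", 3, 1, 0),                   # only child
    ("D", 9, 1, 1), ("D", 2, 0, 0),   # sibling censored before the event
]
print(sorted(informative_families(example)))
```

In this toy data only one of the four families would tell you anything about the exposure effect under per-family stratification.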
You might be better off with an unstratified model, using some other way to take within-family correlations into account. Standard approaches are to estimate robust standard errors around the coefficient point estimates (cluster() term in the survival package) or to include a frailty/random-effect term in the model for families.
Parametric versus semi-parametric models
Left-censored data are reasonably easy to handle with parametric models, as the contribution of a case left-censored at time $t_c$ to the likelihood is simply the cumulative distribution through $t_c$, $F(t_c) = 1 - S(t_c)$, the complement of the survival function through that time. With a semi-parametric model and left/interval censoring, you instead need to estimate a baseline hazard (via the Turnbull extension to Kaplan-Meier curves for interval-censored data) jointly with the regression coefficients. That computationally intensive task furthermore requires bootstrapping to get confidence intervals on the coefficient estimates. See the icenReg package documentation.
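To make the parametric likelihood contributions concrete, here is a minimal Python sketch of a Weibull log-likelihood with exact, right-censored, and left-censored observations. It is not how the survival package implements things; the function names and status labels are invented for this example, and covariates are left out (in a regression, the scale would depend on them).

```python
import math


def weibull_log_density(t, shape, scale):
    """log f(t) for a Weibull distribution with S(t) = exp(-(t/scale)**shape)."""
    z = t / scale
    return math.log(shape / scale) + (shape - 1.0) * math.log(z) - z ** shape


def weibull_log_survival(t, shape, scale):
    """log S(t)."""
    return -((t / scale) ** shape)


def weibull_log_cdf(t, shape, scale):
    """log F(t) = log(1 - S(t)); expm1 avoids cancellation for small t."""
    return math.log(-math.expm1(-((t / scale) ** shape)))


def log_likelihood(observations, shape, scale):
    """Sum the log-likelihood contributions.

    observations: list of (time, status) with status one of
      "exact": event observed at `time`            -> log f(time)
      "right": event known to be after `time`      -> log S(time)
      "left":  event known to be before `time`     -> log F(time)
    """
    contribution = {
        "exact": weibull_log_density,
        "right": weibull_log_survival,
        "left": weibull_log_cdf,
    }
    total = 0.0
    for time, status in observations:
        if status not in contribution:
            raise ValueError(f"unknown status: {status!r}")
        total += contribution[status](time, shape, scale)
    return total


# Toy data: ages in years (made up for illustration).
data = [(4.0, "exact"), (7.5, "right"), (3.0, "left"), (6.0, "left")]
print(log_likelihood(data, shape=1.5, scale=5.0))
```

Maximizing this function over the shape and scale (and regression coefficients, once added) is all a parametric fit needs; no risk sets are involved, which is why left censoring causes no special difficulty here.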
Although I haven't tried this myself, the icenReg package seems capable of handling a semi-parametric proportional-hazards model along with clustering to account for within-family correlations. You would use id values to represent families rather than individuals. The package has an ir_clustBoot() function that seems to do the bootstrapping equivalent of the cluster() adjustment in survival package models, evidently with cluster-based bootstrapping instead of the default case-based bootstrapping. To test that, you could try a small test data set with only right censoring and compare the results with icenReg functions against what you get with the survival package and cluster() terms.
It might be simplest to use the standard survreg() function in the survival package to try parametric models. Those models directly handle clustered data with a "cluster" argument provided to the function. Although survreg() also will accept a frailty term instead, the main documentation (Section 5.5.3) indicates that frailty terms might not behave properly outside of the Cox regression context.
Best Answer
I don't know if this is going to help much, but economists sometimes think about data like these as discrete time event history data, and fit (strings of) logistic regression models to them. See e.g. http://www.ats.ucla.edu/stat/stata/library/survival2.htm. And apparently you can add random effects to that to control for correlation within the same subject: http://www.jstor.org/stable/3068299.
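For what it's worth, the data preparation behind that discrete-time approach can be sketched as follows: each person is expanded into one row per period at risk, with a 0/1 indicator for whether the event happened in that period. This is only an illustration in Python with made-up field names (id, event_age, censor_age, exposure), it assumes ages are recorded in whole years, and it does not deal with left-censored event times.

```python
def to_person_periods(subjects):
    """Expand one record per subject into one row per year at risk.

    subjects: list of dicts with keys
      "id", "exposure", "event_age" (int, or None if no event observed),
      and "censor_age" (int, used only when event_age is None).
    (Hypothetical layout, chosen only for this sketch.)
    """
    rows = []
    for s in subjects:
        event_age = s.get("event_age")
        last_age = event_age if event_age is not None else s["censor_age"]
        for age in range(1, last_age + 1):
            rows.append({
                "id": s["id"],
                "age": age,
                "exposure": s["exposure"],
                # 1 only in the period where the event occurs.
                "event": 1 if event_age == age else 0,
            })
    return rows


subjects = [
    {"id": "a", "event_age": 3, "exposure": 1},
    {"id": "b", "event_age": None, "censor_age": 2, "exposure": 0},
]
for row in to_person_periods(subjects):
    print(row)
```

A logistic regression of event on age (e.g., as indicator variables) and exposure, fitted to these rows, then estimates the discrete-time hazard; a random effect would be added on top of that to handle the within-cluster correlation.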