Solved – Discrete time hazard models (cloglog) in R

r, survival

The survival package in R appears to focus on continuous-time survival models. I am interested in estimating a discrete-time version of a proportional hazards model, the complementary log-log model. My survival model is fairly straightforward, with simple right censoring.

I know that one way to estimate this model is to create a data set that has a separate row for each observation for each period in which it isn't "dead." Then, a glm model with the cloglog link can be used.

This approach seems very memory inefficient; indeed, it would likely produce a data set that is too large for the memory on my machine.

A second approach would be to code up the MLE myself. That would be simple enough, but I am hoping that there is a package with this survival model already implemented. Using a package would make collaboration easier and reduce the risk of coding errors.

Does anyone know of such a package?

Best Answer

Having several rows for each observation may seem redundant, but it usually is not. If the model includes any time-varying covariates, each observation-period needs its own row. Elapsed time is itself a time-varying covariate, and since it should almost certainly be in the model (in a discrete-time model it defines the baseline hazard), a separate row for each observation-period is the natural layout. The first approach suggested is therefore likely the best one.
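For illustration, here is a minimal sketch of that first approach on simulated data. Everything in it is an assumption made for the example rather than something from the question: the column names (`id`, `time`, `event`, `x`), the five-period horizon, and the true parameter values. It expands one row per subject into one row per subject-period and then fits a binomial `glm` with the cloglog link, using period dummies as the baseline hazard.

```r
set.seed(1)

# Simulate discrete-time data (hypothetical setup): hazard in period t is
# h = 1 - exp(-exp(alpha[t] + beta * x)); subjects still alive after the
# last period are right-censored there.
n          <- 500
max_period <- 5
alpha      <- c(-2.0, -1.8, -1.6, -1.4, -1.2)
beta       <- 0.5
x          <- rnorm(n)
time       <- rep(max_period, n)
event      <- rep(0L, n)
for (i in seq_len(n)) {
  for (t in seq_len(max_period)) {
    h <- 1 - exp(-exp(alpha[t] + beta * x[i]))
    if (runif(1) < h) {
      time[i]  <- t
      event[i] <- 1L
      break
    }
  }
}
subjects <- data.frame(id = seq_len(n), time = time, event = event, x = x)

# Expand to person-period format: subject i contributes rows for periods 1..time[i].
rows      <- rep(seq_len(nrow(subjects)), times = subjects$time)
pp        <- subjects[rows, c("id", "x")]
pp$period <- sequence(subjects$time)
# The outcome is 1 only in a subject's final period, and only if the event occurred.
pp$y <- as.integer(pp$period == subjects$time[rows] & subjects$event[rows] == 1)
rownames(pp) <- NULL

# Discrete-time proportional hazards model: one dummy per period (no intercept)
# plays the role of the baseline hazard; x enters proportionally on the cloglog scale.
fit <- glm(y ~ 0 + factor(period) + x,
           family = binomial(link = "cloglog"), data = pp)
print(coef(fit))
```

The expanded data set has `sum(subjects$time)` rows, so its size grows with the number of periods each subject survives rather than with the number of subjects times the maximum horizon.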

Note that this is different from a continuous-time proportional hazards model with a Weibull distribution. There, the data can be reduced to a single row for each observation if elapsed time is the only time-varying covariate (see here, for example). A similar result holds for the Cox proportional hazards model.
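The contrast can be sketched with the two likelihoods. The notation is introduced here for illustration and is not taken from the linked reference: $t_i$ is the observed time, $d_i$ the event indicator, $x_i$ the covariates, and $\lambda$, $k$, $\alpha_s$, $\beta$ the parameters. Assuming the usual Weibull proportional hazards parametrization, the hazard and cumulative hazard are

$$h(t \mid x_i) = \lambda k t^{k-1} e^{x_i^\top \beta}, \qquad H(t \mid x_i) = \lambda t^{k} e^{x_i^\top \beta},$$

and the log-likelihood contribution of observation $i$ is

$$\ell_i = d_i \log h(t_i \mid x_i) - H(t_i \mid x_i),$$

which depends only on $(t_i, d_i, x_i)$, so one row per observation is enough. In the discrete-time cloglog model with a period-specific baseline $\alpha_s$, the hazard is $h_{is} = 1 - \exp\{-\exp(\alpha_s + x_i^\top \beta)\}$ and the contribution is a sum over the periods at risk,

$$\ell_i = \sum_{s=1}^{t_i} \left[ y_{is} \log h_{is} + (1 - y_{is}) \log(1 - h_{is}) \right], \qquad y_{is} = d_i \,\mathbb{1}\{s = t_i\},$$

where each term corresponds to one observation-period row of a binomial model.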
