The survival package in R appears to focus on continuous-time survival models. I am interested in estimating a discrete-time version of a proportional hazards model, the complementary log-log model. I have a fairly straightforward survival model with simple right censoring.
I know that one way to estimate this model is to create a data set that has a separate row for each observation for each period in which it isn't "dead." Then a glm model with the cloglog link can be used.
This approach seems very memory inefficient; indeed, it would likely produce a data set that is too large for the memory on my machine.
A second approach would be to code up the MLE myself. That would be simple enough, but I am hoping that there is a package with this survival model already implemented. Using a package would make collaboration easier and reduce the risk of coding errors.
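For reference, a sketch of the log-likelihood such a hand-coded MLE would maximize. The notation is introduced only for illustration: period-specific intercepts $\alpha_t$, time-constant covariates $x_i$, last observed period $T_i$, and an indicator $y_{it}$ equal to 1 only if observation $i$ dies in period $t$ (so a right-censored observation has $y_{it} = 0$ throughout).

$$
h_{it} = 1 - \exp\{-\exp(\alpha_t + x_i'\beta)\}, \qquad
\ell(\alpha, \beta) = \sum_i \sum_{t=1}^{T_i} \Big[ y_{it} \log h_{it} + (1 - y_{it}) \log(1 - h_{it}) \Big]
$$

This is the Bernoulli log-likelihood of the one-row-per-period data set with a complementary log-log link, so under these assumptions the two approaches target the same estimates.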
Does anyone know of such a package?
Best Answer
Having several rows for each observation may seem redundant, but it most likely is not. If there are any time-varying covariates in the model, then each observation-period will certainly need its own row. One particular example of a time-varying covariate is the elapsed time. Since this variable should almost certainly be included in the model (it defines the baseline hazard), it makes sense to have a separate row for each observation-period. Thus, the first approach suggested is likely the best one.
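A minimal sketch of that first approach in base R, to make the layout concrete. The data are simulated and every name here (`subjects`, `pp`, `x`, `period`, `y`, the 6-period follow-up, the coefficient values) is a hypothetical choice for illustration, not something taken from the question.

```r
# Simulate toy data from a discrete-time cloglog hazard (assumed values).
set.seed(1)
n <- 500
max_period <- 6                          # follow-up ends after period 6
x <- rnorm(n)                            # one time-constant covariate
hazard <- 1 - exp(-exp(-1.5 + 0.5 * x))  # per-period hazard, constant baseline
t_event <- rgeom(n, hazard) + 1          # period in which death would occur

# One row per subject: last observed period and event indicator.
subjects <- data.frame(
  id    = seq_len(n),
  x     = x,
  time  = pmin(t_event, max_period),
  event = as.integer(t_event <= max_period)  # 0 = right-censored
)

# Expand to one row per subject per period at risk.
pp <- subjects[rep(seq_len(n), times = subjects$time), ]
pp$period <- sequence(subjects$time)     # 1, 2, ..., time within each subject
pp$y <- as.integer(pp$period == pp$time & pp$event == 1)

# Binomial GLM with a complementary log-log link;
# factor(period) lets the baseline hazard differ by period.
fit <- glm(y ~ factor(period) + x,
           family = binomial(link = "cloglog"),
           data = pp)
print(coef(fit))
```

Here `factor(period)` is the elapsed-time covariate discussed above, entered without restrictions. If memory is the binding constraint, replacing it with a smoother function of `period` reduces the number of parameters but not the number of rows.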
Note that this is different from a continuous-time proportional hazards model with a Weibull distribution. There, the survival model can be simplified to a single line for each observation if time elapsed is the only time-varying covariate (see here, for example). A similar result holds for the Cox proportional hazards model.