You can do this with the static_glm function in the dynamichazard package, which I wrote. The model you get is exactly the multiperiod logit model used in
Shumway, T. (2001). Forecasting bankruptcy more accurately: A simple hazard model. The Journal of Business, 74(1), 101-124.
This is a common method in the literature. The R code for your data would be
library(survival) # provides Surv()
fit <- dynamichazard::static_glm(
  formula = Surv(tstart, tstop, EVENT) ~ x1 + x2 + x3 + x4 + x5,
  data = the_data_frame_you_used, # you have to change this
  max_T = 12, # the last time you observe
  by = 1) # bin into periods of one year
You will first have to transform your data into the start-stop format, though. This is easily done with the tmerge function from the survival package. See the "Using Time Dependent Covariates and Time Dependent Coefficients in the Cox Model" vignette in the survival package for examples of how to use tmerge.
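As a rough illustration of that transformation, here is a minimal sketch with tmerge. The toy panel and the column names (id, TIME, start, x1, EVENT, last_time, defaulted) are assumptions made up for the example, not the OP's actual data, and it assumes each company-year row describes the interval from TIME - 1 to TIME.

```r
library(survival)

# Hypothetical panel: one row per company (id) and year (TIME)
panel <- data.frame(
  id    = c(1, 1, 1, 2, 2),
  TIME  = c(1, 2, 3, 1, 2),
  x1    = c(0.5, 0.7, 0.9, 0.2, 0.1),
  EVENT = c(0, 0, 1, 0, 0)
)
# Assumption: the covariate observed for year TIME applies from TIME - 1
panel$start <- panel$TIME - 1

# One row per company: last observed time and whether it ever defaulted
base <- aggregate(cbind(TIME, EVENT) ~ id, data = panel, FUN = max)
names(base) <- c("id", "last_time", "defaulted")

# First call sets each company's follow-up range and the event indicator
ss <- tmerge(base, base, id = id, EVENT = event(last_time, defaulted))
# Second call splits the follow-up wherever the covariate changes
ss <- tmerge(ss, panel, id = id, x1 = tdc(start, x1))

ss[, c("id", "tstart", "tstop", "x1", "EVENT")]
```

The result has one row per company and interval, with tstart and tstop columns that can be passed to Surv(tstart, tstop, EVENT).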
Of course, you can use any other survival method that supports time-varying covariates once you have your data.frame in the start-stop format. There is a long list of options in R; e.g., see the Survival Analysis task view.
One issue, though, is that your companies (likely) do not default exactly at TIME but somewhere between TIME - 1 and TIME. I.e., you are dealing with interval censoring, which you may want to account for in the survival model you choose.
Your question is related to this one. In particular, you can include TIME as a random effect, much like the answer there, as follows
library(lme4)
ans <- glmer(
  EVENT ~ x1 + x2 + x3 + x4 + x5 + (1 | TIME),
  data = your_initial_data_frame, # data.frame as you posted it
  family = binomial)
Update to OP's further questions
Could you please let me know if it is possible to use "cloglog" for your both methods?
You cannot get an interval-censored model (i.e., a cloglog link function) with static_glm. However, you can use the get_survival_case_weights_and_data function in the same package, as I show in the "Comparing methods for time varying logistic models" vignette, and then use whatever classifier you want, such as glm with a cloglog link function.
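To make the last step concrete, here is a minimal sketch of a cloglog fit with glm. It assumes you already have person-period data (one row per company and year at risk); the data here are simulated, and the names pp, id, TIME, x1 and EVENT as well as the true coefficients are made up for the example. It does not use dynamichazard at all.

```r
set.seed(1)
n_firms <- 500
n_years <- 5
x1 <- rnorm(n_firms)

# Full company-by-year grid (TIME varies fastest, so rows are sorted by id)
pp <- expand.grid(TIME = 1:n_years, id = 1:n_firms)
pp$x1 <- x1[pp$id]

# Simulate defaults from a cloglog hazard with a true x1 coefficient of 0.5
hazard <- 1 - exp(-exp(-2 + 0.5 * pp$x1))
pp$EVENT <- rbinom(nrow(pp), 1, hazard)

# Keep a company only while it is at risk: drop rows after its first default
prior_events <- ave(pp$EVENT, pp$id, FUN = function(e) cumsum(e) - e)
pp <- pp[prior_events == 0, ]

# Discrete-time hazard model: one intercept per period plus covariates
fit <- glm(EVENT ~ factor(TIME) + x1,
           family = binomial(link = "cloglog"), data = pp)
exp(coef(fit)["x1"]) # interpretable as a hazard ratio under this link
```

Swapping binomial(link = "cloglog") for binomial() gives the logit version of the same model.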
Is it allowed to use your suggestions If some companies enter the
study in time 4, some others in time 7 and etc.?
This is called delayed entry. It should not be a problem in a discrete-time default model if your time scale is the calendar date/year.
Really, I want to predict bankruptcy using survival analysis so my
covariates should be lagged for example 1 year lag.
Yes, you need to lag your covariates.
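A minimal base R sketch of a one-year lag within each company is shown below. The column names (id, TIME, x1, x1_lag1) are hypothetical, and the sketch assumes each company has one row per consecutive year with no gaps; with gaps you would need to match on TIME - 1 instead of shifting rows.

```r
# Hypothetical panel; company 2 enters late (delayed entry)
panel <- data.frame(
  id   = c(1, 1, 1, 2, 2),
  TIME = c(1, 2, 3, 4, 5),
  x1   = c(0.5, 0.7, 0.9, 0.2, 0.1)
)

# Sort so that shifting by one row means shifting by one year
panel <- panel[order(panel$id, panel$TIME), ]

# Shift values down by one position; the first year of a company has no lag
lag1 <- function(v) c(NA, head(v, -1))
panel$x1_lag1 <- ave(panel$x1, panel$id, FUN = lag1)

panel
```

Rows with a missing lag (the first observed year of each company) have to be dropped or handled before fitting.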
As I tried logistic regression in Python - sklearn, the solver "sag"
had a better performance. Is it allowed to use this solver in your
suggestions? Thanks a lot.
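As far as I can tell, "sag" is just the optimizer (stochastic average gradient) used to fit the logistic regression in scikit-learn, which is L2-penalized by default. It should not be a problem if you set up your data correctly.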
Best Answer
With only 5 time points and at most one event per individual, a discrete-time model would be the most natural. Interval-censored Cox regression is possible, but that's probably better reserved for situations where the time intervals differ among individuals. When the time intervals are the same for everyone, using binomial regression with a complementary log-log link provides a time-grouped proportional hazards model, which is what you would get from an interval-censored Cox model. See this page. Other links in the binomial regression are possible, but wouldn't have the same proportional hazards interpretation.
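The reason the complementary log-log link has this interpretation can be sketched in a few lines. The notation ($S_0$, $H_0$, interval endpoints $a_t$, discrete hazard $h_t$) is introduced here for illustration and is not taken from the linked page. Under proportional hazards, $S(t \mid x) = S_0(t)^{\exp(x^\top\beta)}$, so the probability of an event in interval $(a_{t-1}, a_t]$ given survival to $a_{t-1}$ is

```latex
\begin{aligned}
h_t(x) &= 1 - \frac{S(a_t \mid x)}{S(a_{t-1} \mid x)}
        = 1 - \left[\frac{S_0(a_t)}{S_0(a_{t-1})}\right]^{\exp(x^\top\beta)}, \\
\log\bigl(-\log(1 - h_t(x))\bigr) &= \alpha_t + x^\top\beta,
\qquad
\alpha_t = \log\bigl(H_0(a_t) - H_0(a_{t-1})\bigr),
\end{aligned}
```

where $H_0 = -\log S_0$ is the baseline cumulative hazard. The $\beta$ in the binomial model with a cloglog link is therefore the same log hazard ratio as in the continuous-time model, and the period-specific intercepts $\alpha_t$ absorb the baseline hazard and the interval lengths.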
The different lengths of the time periods don't really matter, unless you explicitly model time as other than categorical in the binomial discrete-time survival model. A Cox model per se doesn't directly evaluate event times at all. It only uses the order of events in time. The survival curves you can generate from a Cox model simply re-express the ordered events in terms of the times at which they occurred.
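The point that a Cox model only uses the ordering of events can be checked numerically. The sketch below uses simulated data (all names and parameter values are made up for the example) and refits the model after a strictly increasing transformation of the event times.

```r
library(survival)

set.seed(42)
n <- 200
x <- rnorm(n)
time <- rexp(n, rate = exp(0.5 * x)) # continuous times, so no ties
status <- rbinom(n, 1, 0.8)          # roughly 20% random censoring

# sqrt() is strictly increasing, so it changes the times but not their order
warped <- sqrt(time)

fit_raw  <- coxph(Surv(time, status) ~ x)
fit_warp <- coxph(Surv(warped, status) ~ x)

# The estimated coefficients agree
all.equal(coef(fit_raw), coef(fit_warp))
```

Only the time axis of the fitted survival curves differs between the two fits.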