I am looking to fit a Cox proportional hazard survival model. Looking at the K-M curve (below) for one variable (with 2 categories) it appears there is a change in hazard ratios at around day 110. I was thinking of modeling this with a change-point model.
I'm having trouble implementing it. I have defined days_ind as 1 if days>=110 and 0 otherwise. Then I run the model:
coxph(Surv(time=days,event=event2)~x*days_ind)
I get several warning messages about convergence and the results don't seem to make any sense.
Am I approaching this in the correct way? I thought of bringing in the interaction of x*days instead but this too does not converge and also leads to strange estimates.
Best Answer
You need to make days_ind a time-dependent variable. The way you have it coded right now, everybody whose observation time (whether event or censoring) is after 110 days is treated as having experienced a different hazard throughout their entire follow-up than those whose observation time is before 110 days. What you want is for the hazard to "jump" at 110 days.
It is not completely straightforward to set up this analysis in the survival package. You have to split the follow-up period of each person into two periods: up to 110 days and after that. Anybody surviving beyond 110 days would have two observations: one right-censored at 110, and the other left-truncated at 110 and ending at the actual event or censoring time. Fortunately, there is a function to do exactly that: survSplit.
Here is a quick example with a built-in dataset:
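A minimal sketch of such a split, assuming the aml data that ships with the survival package and a cut point of 10 days (the names aml2, id and period are chosen here for illustration):

```r
library(survival)

# Assumption: built-in aml data, split at day 10 purely for illustration.
# survSplit adds the columns tstart, id and period to the data.
aml2 <- survSplit(Surv(time, status) ~ ., data = aml, cut = 10,
                  episode = "period", id = "id")

# Subjects followed past day 10 now contribute two rows:
#   (tstart = 0,  time = 10,          status = 0)
#   (tstart = 10, time = actual time, status = actual status)
head(aml2[order(aml2$id, aml2$tstart), ])
```

For your data the same call would use cut = 110 and your own time and event variables.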
You can see that there are now two observations for ids 2 and 3. The period variable corresponds to your days_ind.
From here you can build the model you want, but you have to code the effects carefully, because the main effect of period cannot be estimated: it refers to different times, so everybody at risk at a given time has the same value of period and it is absorbed into the baseline hazard.
Here the two coefficients measure the effect of Maintained vs Non-maintained before 10 days and after 10 days, respectively.
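One way to code a model with that interpretation is sketched below, again assuming the aml data split at 10 days. The indicator columns maint_early and maint_late are hypothetical names introduced for this sketch; they are built from tstart rather than from the numeric value of period, so the code does not depend on how the episodes are numbered.

```r
library(survival)

# Assumption: same illustrative setup, aml split at day 10.
aml2 <- survSplit(Surv(time, status) ~ ., data = aml, cut = 10,
                  episode = "period", id = "id")

# Treatment indicator, then one copy of it per follow-up period.
maint <- as.numeric(aml2$x == "Maintained")
aml2$maint_early <- maint * (aml2$tstart < 10)   # effect up to day 10
aml2$maint_late  <- maint * (aml2$tstart >= 10)  # effect after day 10

# Counting-process form: each row is at risk on (tstart, time].
# No main effect of period is included.
fit <- coxph(Surv(tstart, time, status) ~ maint_early + maint_late,
             data = aml2)
summary(fit)
```

Each coefficient is a log hazard ratio for Maintained vs Non-maintained within its own period.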
You could also consider using the cmprsk package. It is designed for the analysis of competing risks, but there is nothing stopping you from using it with only one outcome. The benefit is that it has an easier way of defining time-dependent covariates (though a really awkward syntax overall).
Note that with the different coding, the meaning of the coefficients is not exactly the same as above.
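A rough sketch of what this could look like with crr from cmprsk, assuming the unsplit aml data and the same 10-day change point. In crr, each column of cov2 is multiplied by the matching column of tf(t), so here the second term is the treatment indicator switched on after day 10 (maint and fit_crr are names made up for this sketch):

```r
library(survival)  # only needed here for the aml data
library(cmprsk)

# Treatment indicator on the original, unsplit data.
maint <- as.numeric(aml$x == "Maintained")

fit_crr <- crr(ftime = aml$time, fstatus = aml$status,
               cov1 = cbind(maint = maint),        # constant effect
               cov2 = cbind(maint_late = maint),   # multiplied by tf(t)
               tf = function(t) cbind(as.numeric(t > 10)),
               failcode = 1, cencode = 0)
fit_crr
```

With this coding the first coefficient is the log hazard ratio before day 10 and the second is the change in the log hazard ratio after day 10, so the effect after day 10 is the sum of the two.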