Solved – Change-point in Cox survival model

Tags: change point, cox-model, survival

I am looking to fit a Cox proportional hazard survival model. Looking at the K-M curve (below) for one variable (with 2 categories) it appears there is a change in hazard ratios at around day 110. I was thinking of modeling this with a change-point model.

[Figure: K-M curve]

I'm having trouble implementing it. I have defined days_ind as 1 if days>=110 and 0 otherwise. Then I run the model:

coxph(Surv(time=days,event=event2)~x*days_ind)

I get several warning messages about convergence and the results don't seem to make any sense.

Am I approaching this in the correct way? I thought of bringing in the interaction of x*days instead but this too does not converge and also leads to strange estimates.

Best Answer

You need to make days_ind a time-dependent variable. The way you have it coded right now, everybody whose observation (whether event or censoring time) was after 110 days will have experienced a different hazard throughout their entire follow-up than those whose observation is before 110 days. What you want is for the hazard to "jump" at 110 days.

It is not completely straightforward to set up this analysis in the survival package. You have to split the follow-up period of each person into two periods: up to 110 days and after that. Anybody surviving beyond 110 days would have two observations: one right-censored at 110, and the other left-truncated at 110 and having the actual event on the right side. Fortunately, there is a function to do exactly that: survSplit.

Here is a quick example with a built-in dataset:

> library(survival)
> aml$id <- 1:nrow(aml)  # add a subject ID variable
> aml2 <- survSplit(aml,cut=10,end="time",start="start", event="status", episode="period")
> 
> subset(aml, id<=3)
  time status          x id
1    9      1 Maintained  1
2   13      1 Maintained  2
3   13      0 Maintained  3
> subset(aml2, id<=3)
   time status          x id start period
1     9      1 Maintained  1     0      0
2    10      0 Maintained  2     0      0
3    10      0 Maintained  3     0      0
25   13      1 Maintained  2    10      1
26   13      0 Maintained  3    10      1

You can see that there are now two observations for id's 2 and 3. The period variable corresponds to your days_ind.

From here you can build the model you want, but you have to code the effects carefully: the main effect of period cannot be estimated, because at any given event time everyone still at risk has the same value of period.

> fit <- coxph(Surv(start, time, status) ~ 
   I((x=="Maintained")&(period==0)) + I((x=="Maintained")&(period==1)), data=aml2)
> fit
Call:
coxph(formula = Surv(start, time, status) ~ I((x == "Maintained") & 
    (period == 0)) + I((x == "Maintained") & (period == 1)), 
    data = aml2)


                                             coef exp(coef) se(coef)     z    p
I((x == "Maintained") & (period == 0))TRUE -1.498     0.224    1.120 -1.34 0.18
I((x == "Maintained") & (period == 1))TRUE -0.722     0.486    0.591 -1.22 0.22

Likelihood ratio test=3.79  on 2 df, p=0.150  n= 41, number of events= 18

Here the two coefficients measure the effect of Maintained vs Non-maintained before 10 days and after 10 days, respectively.
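Applied to the variables in the question, the same steps might look like the sketch below. The data frame dat is simulated purely as a stand-in (the real data are not shown), the level names "A" and "B" and the helper columns late and xB are hypothetical, and the formula interface of survSplit is used, which names the interval start tstart by default.

```r
library(survival)

# Hypothetical stand-in for the asker's data: days, event2 and a two-level x.
set.seed(1)
n <- 200
dat <- data.frame(
  x      = factor(rep(c("A", "B"), each = n / 2)),
  days   = round(rexp(n, rate = 1 / 150)) + 1,
  event2 = rbinom(n, 1, 0.7)
)

# Split follow-up at day 110; subjects followed past day 110 get two rows.
dat2 <- survSplit(Surv(days, event2) ~ ., data = dat, cut = 110,
                  episode = "period")

# Derive the indicator from the interval start, so the code does not depend
# on how survSplit numbers the episodes.
dat2$late <- as.numeric(dat2$tstart >= 110)
dat2$xB   <- as.numeric(dat2$x == "B")

# One log hazard ratio for x before day 110 and one after.
fit <- coxph(Surv(tstart, days, event2) ~ I(xB * (1 - late)) + I(xB * late),
             data = dat2)
print(fit)
```

The counting-process form Surv(tstart, days, event2) is what tells coxph that the second row of a split subject only enters the risk set at day 110.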

You could also consider using the cmprsk package. It is designed for analysis of competing risks, but there is nothing stopping you from using it for only one outcome. The benefit is that it has an easier way of defining time-dependent covariates (though a really awkward syntax overall):

> library(cmprsk)
> fit1 <- with(aml, crr(time, status, cov1=I(x=="Maintained"), cov2=I(x=="Maintained"), 
+                      tf=function(t)I(t<=10)))
> fit1
convergence:  TRUE 
coefficients:
    I(x == "Maintained")1 I(x == "Maintained")1*tf1 
                  -0.7213                   -0.7387 
standard errors:
[1] 0.5259 1.1500
two-sided p-values:
    I(x == "Maintained")1 I(x == "Maintained")1*tf1 
                     0.17                      0.52

Note that with the different coding, the meaning of the coefficients is not exactly the same as above.
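To make the difference in coding concrete: the crr call uses a main effect plus an interaction with the indicator t<=10, so its first coefficient is the log hazard ratio after day 10 and its second is the extra effect before day 10. A rough sketch of the same parameterisation in coxph follows; the helper columns maint and early are hypothetical names, and the estimates need not match the crr output to the last digit.

```r
library(survival)

# Split the built-in aml data at day 10 (formula interface, start = tstart).
aml2 <- survSplit(Surv(time, status) ~ ., data = aml, cut = 10,
                  episode = "period")
aml2$maint <- as.numeric(aml2$x == "Maintained")
aml2$early <- as.numeric(aml2$tstart < 10)

# Separate-effects coding: one coefficient before day 10, one after.
fit_sep <- coxph(Surv(tstart, time, status) ~
                   I(maint * early) + I(maint * (1 - early)), data = aml2)

# Main effect plus interaction: late effect, then (early minus late).
fit_int <- coxph(Surv(tstart, time, status) ~
                   maint + I(maint * early), data = aml2)

b_sep <- unname(coef(fit_sep))
b_int <- unname(coef(fit_int))

# The two codings describe the same model, so these should agree.
print(c(late_effect = b_int[1], from_separate_coding = b_sep[2]))
print(c(early_minus_late = b_int[2], from_separate_coding = b_sep[1] - b_sep[2]))
```

Under this coding the effect before day 10 is the sum of the two coefficients, not the second coefficient alone.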

Related Solutions

Solved – High concordance in cox PH model even though PH assumption is violated

In reverse order:

3) The concordance is simply the proportion of pairs of cases in which the case with the higher-risk predictor had an event before the case with the lower-risk predictor. With a single numeric predictor, the concordance will be the same for any monotone transformation of the predictor even though the Cox model fits may be substantially different. Crudely put, concordance shows your ability to predict who of a pair will die sooner, but not necessarily how much sooner or what proportion of the variance of event times is explained by the model.

Concordance for a multivariate model uses the combined linear predictor from the Cox regression as the numeric predictor for each case. So if variables with non-proportional hazards have small-magnitude coefficients compared with other variables, or if their relations to outcome are strong enough despite non-proportionality, the rankings of combined linear predictors may be well correlated with the rankings of event times--which is all that concordance tells you.

2) Absent the PH assumption, HRs aren't strictly valid and can be highly misleading. Think about the corresponding case of a linear-regression fit of data that are not linearly related.

1) The main consequence is that you should examine variables that don't meet the PH assumption in more detail. Consider stratifying by those variables, or devising time-dependent models.
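Two of these points can be checked directly in R. The sketch below uses the lung data shipped with the survival package and the variable age purely as an illustration; neither comes from the original question. It compares the concordance of two fits that differ only by a monotone transformation of the predictor (assuming both fitted coefficients have the same sign), then runs cox.zph as one way of examining the PH assumption.

```r
library(survival)

# Same predictor on two scales; the Cox fits differ, the rankings do not.
fit_age    <- coxph(Surv(time, status) ~ age,      data = lung)
fit_logage <- coxph(Surv(time, status) ~ log(age), data = lung)

# First element of the concordance component is the concordance itself.
c_age    <- unname(summary(fit_age)$concordance[1])
c_logage <- unname(summary(fit_logage)$concordance[1])
print(c(age = c_age, log_age = c_logage))

# Test of proportional hazards based on scaled Schoenfeld residuals.
zp <- cox.zph(fit_age)
print(zp)
```

A small p-value from cox.zph for a variable is a prompt to look at that variable more closely, for example with plot(zp), before deciding between stratification and a time-dependent effect.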

Solved – Survival analysis / cox-regression of periodically recurring events

Looks like you have a predictor that varies cyclically. My impression from very limited reading of the botanical literature is that something along the lines of cumsum(degree_days) (where the sums are calculated from a point in the middle of winter and a "degree-day" is the number of degrees above a threshold temperature) is used to predict budding and perhaps flowering. If that is the case, you would first process the temperatures so they are cumulative in whatever sense is appropriate to your specialized domain, and then use that as the time-dependent variable. The multiple years should be handled with a strata(year) term in the formula. See ?coxph for examples of each of those. Perhaps:

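library(survival)

# Cumulative degrees above crit_temp within each year.
# FUN must be named: unnamed arguments to ave() are treated as grouping variables.
dat$cumtemp <- with(dat, ave( (temp > crit_temp)*(temp-crit_temp), year, FUN = cumsum) )

surv_cox = coxph( Surv( tstart, tstop, flowering) ~ cumtemp + strata(year), data = dat)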
summary(surv_cox)
