Time Dependent Coefficients – How to Set Time Dependent Coefficients in R

cox-modelrregressionsurvival

Update: Sorry for another update but I've found some possible solutions with fractional polynomials and the competing risk-package that I need some help with.


The problem

I can't find an easy way to do a time dependent coefficient analysis is in R. I want to be able to take my variables coefficient and do it into a time dependent coefficient (not variable) and then plot the variation against time:

$\beta_{my\_variable}=\beta_0+\beta_1*t+\beta_2*t^2…$

Possible solutions

1) Splitting the dataset

I've looked at this example (Se part 2 of the lab session) but the creation of a separate dataset seems complicated, computationally costly and not very intuitive…

2) Reduced Rank models – The coxvc package

The coxvc package provides an elegant way of dealing with the problem – here's a manual. The problem is that the author is no longer developing the package (last version is since 05/23/2007), after some e-mail conversation I've gotten the package to work but one run took 5 hours on my dataset (140 000 entries) and gives extreme estimates at the end of the period. You can find a slightly updated package here – I've mostly just updated the plot function.

It might be just a question of tweaking but since the software doesn't easily provide confidence intervals and the process is so time consuming I'm looking right now at other solutions.

3) The timereg package

The impressive timereg package also addresses the problem but I'm not certain of how to use it and it doesn't give me a smooth plot.

4) Fractional Polynomial Time (FPT) model

I found Anika Buchholz' excellent dissertation on "Assessment of time–varying long–term effects of therapies and prognostic factors" that does an excellent job covering different models. She concludes that Sauerbrei et al's proposed FPT seems to be the most appropriate for time-dependent coefficients:

FPT is very good at detecting time-varying effects, while the Reduced Rank approach results in far too complex models, as it does not include selection of time-varying effects.

The research seems very complete but it's slightly out of reach for me. I'm also a little wondering since she happens to work with Sauerbrei. It seems sound though and I guess the analysis could be done with the mfp package but I'm not sure how.

5) The cmprsk package

I've been thinking of doing my competing risk analysis but the calculations have been to time-consuming so I switched to the regular cox regression. The crr has thoug an option for time dependent covariates:

....
cov2        matrix of covariates that will be multiplied 
            by functions of time; if used, often these 
            covariates would also appear in cov1 to give 
            a prop hazards effect plus a time interaction
....

There is the quadratic example but I'm don't quite follow where the time actually appears and I'm not sure of how to display it. I've also looked at the test.R file but the example there is basically the same…

My example code

Here's an example that I use to test the different possibilities

library("survival")
library("timereg")
data(sTRACE)

# Basic cox regression    
surv <- with(sTRACE, Surv(time/365,status==9))
fit1 <- coxph(surv~age+sex+diabetes+chf+vf, data=sTRACE)
check <- cox.zph(fit1)
print(check)
plot(check, resid=F)
# vf seems to be the most time varying

######################################
# Do the analysis with the code from #
# the example that I've found        #
######################################

# Split the dataset according to the splitSurv() from prof. Wesley O. Johnson
# http://anson.ucdavis.edu/~johnson/st222/lab8/splitSurv.ssc
new_split_dataset = splitSuv(sTRACE$time/365, sTRACE$status==9, sTRACE[, grep("(age|sex|diabetes|chf|vf)", names(sTRACE))])

surv2 <- with(new_split_dataset, Surv(start, stop, event))
fit2 <- coxph(surv2~age+sex+diabetes+chf+I(pspline(stop)*vf), data=new_split_dataset)
print(fit2)

######################################
# Do the analysis by just straifying #
######################################
fit3 <- coxph(surv~age+sex+diabetes+chf+strata(vf), data=sTRACE)
print(fit3)

# High computational cost!
# The price for 259 events
sum((sTRACE$status==9)*1)
# ~240 times larger dataset!
NROW(new_split_dataset)/NROW(sTRACE)

########################################
# Do the analysis with the coxvc and   #
# the timecox from the timereg library #
########################################
Ft_1 <- cbind(rep(1,nrow(sTRACE)),bs(sTRACE$time/365,df=3))
fit_coxvc1 <- coxvc(surv~vf+sex, Ft_1, rank=2, data=sTRACE)

fit_coxvc2 <- coxvc(surv~vf+sex, Ft_1, rank=1, data=sTRACE)

Ft_3 <- cbind(rep(1,nrow(sTRACE)),bs(sTRACE$time/365,df=5))
fit_coxvc3 <- coxvc(surv~vf+sex, Ft_3, rank=2, data=sTRACE)

layout(matrix(1:3, ncol=1))
my_plotcoxvc <- function(fit, fun="effects"){
    plotcoxvc(fit,fun=fun,xlab='time in years', ylim=c(-1,1), legend_x=.010)
    abline(0,0, lty=2, col=rgb(.5,.5,.5,.5))
    title(paste("B-spline =", NCOL(fit$Ftime)-1, "df and rank =", fit$rank))
}
my_plotcoxvc(fit_coxvc1)
my_plotcoxvc(fit_coxvc2)
my_plotcoxvc(fit_coxvc3)

# Next group
my_plotcoxvc(fit_coxvc1)

fit_timecox1<-timecox(surv~sex + vf, data=sTRACE)
plot(fit_timecox1, xlab="time in years", specific.comps=c(2,3))

The code results in these graphs: Comparison of different settings for coxvc and of the
coxvc and the timecox plots. I guess the results are ok but I don't think I'll be able to explain the timecox graph – it seems to complex…

My (current) questions

  • How do I do the FPT analysis in R?
  • How do I use the time covariate in cmprsk?
  • How do I plot the result (preferably with confidence intervals)?

Best Answer

@mpiktas came close in offering a feasible model, however the term that needs to be used for the quadratic in time=t would be I(t^2)) . This is so because in R the formula interpretation of "^" creates interactions and does not perform exponentiation, so the interaction of "t" with "t" is just "t". (Shouldn't this be migrated to SO with an [r] tag?)

For alternatives to this process, which looks to me somewhat dubious for inference purposes, and one which probably fits your interest in using the supportive functions in Harrell's rms/Hmisc packages, see Harrell's "Regression Modeling Strategies". He mentions (but only in passing although he does cite some of his own papers) constructing spline fits to the time scale to model bathtub-shaped hazards. His chapter on parametric survival models describes a variety of plotting techniques for checking proportional hazards assumptions and for examining the linearity of estimated log-hazard effects on the time scale.

Edit: An additional option is to use coxph's 'tt' parameter described as an "optional list of time-transform functions."

Related Question