Solved – Cox regression with time varying covariates

cox-modelregressionsurvival

I am looking for resources on Cox proportional hazards model with time varying covariates. I'm new to survival analysis so I'm looking for something not overly mathematical. I would also appreciate any information about software implementations that can deal with this problem.

Best Answer

I might be able to give you some tips:

Recommended books...

Kleinbaum, Klein: Survival analysis - A Self-Learning text http://www.springer.com/statistics/life+sciences,+medicine+%26+health/book/978-1-4419-6645-2 In my opinion the best book on this matter and it includes time-varying covariates and also how to program the computations in SAS and R.

Also look at: http://cran.r-project.org/doc/contrib/Fox-Companion/appendix-cox-regression.pdf Which is John Fox explanation and he uses R to calculate, which is great.

Second: what type of time-varying covariates do you have? - Multiple observations per individual? - Multiple endpoints per individual?

In general, if you have 1 endpoint of interest and multiple observations per individual, you usually set up the data frame in a format which means that each observation corresponds to one row (therefore one individual may have several rows of data) and you create a start variable and a stop variable, which is simply the start and stop intervals for each observation.

The usual Cox model:

coxph(Surv(survival, event) ~ predictors, data = df)

The time-dependent Cox model (if data is set up as described above):

coxph(Surv(star, stopp, event) ~ predictors, data = df)

A very well written manual can be found here: http://cran.r-project.org/web/packages/survival/vignettes/timedep.pdf

Related Solutions

Solved – Can Cox models be used with time-varying covariates

First of - Yes, an extended Cox model can handle time dependent covariates (and coefficients) with ease, and with no change to the underlying model. Mind you that the data set needs to reflect this time dependency. For example:

subject time1 time2 event  var1  var2
   1      0     15    0     25    1.3
   1     15     46    0     25    1.5
   1     46     73    0     25    1.4
   1     73    100    1     25    1.6

var2 is a time dependent covariate.

coxph(Surv(time1, time2, event) ~ var1 + var2, data = data)

        coef exp(coef) se(coef)    z  p
var1  -1.072  0.342     0.262  -3.44  0.00058
var2  -0.0574 0.944     0.022  -2.61  0.009

So in this made up example, var2 can be said that holding var1 constant each additional unit of var2 reduces the weekly hazard ratio by 5.6% on average.

The interpretation seems similar to non-time varying covariates, yet there is an underlying difference - that is that the time is taken into account. In a dummy variable that is time dependent it is simpler to understand - The time while the variable is not existent (0) is taken into account when calculating the effect on the hazard ratio.

You can see Using Time Dependent Covariates and Time Dependent Coefficients in the Cox Model by Therneau (creator of the survival package) for further explanations.

Cox Model – How to Analyze Cox Survival Data with Time-Varying Continuous Variables and Fixed Covariates in R

I figure you are looking for the model in equation 4 in this article. Then this code will do

> library(survival)
> set.seed(747)
> 
> N <- 1e4 # number of subjects
> k <- 0.3 # assuming the time varying covariates is proporitonal to time x2=k*t
> lambda <- 0.01
> betat <- 0.3 
> beta <- -0.6
> rateC <- 0.01
> 
> #####
> # simulate 
> x1 <- as.numeric(runif(N) > .5) # covariate
> 
> # simulate outcome (NB: OP forgot parentheses)
> u <- runif(n = N)
> Tlat <- log(1 + betat*k*(-log(u))/(lambda*exp(beta*x1))) / (betat * k)
> 
> # censoring and status indicator
> Cen<- rexp(n = N, rate = rateC)
> 
> # make data.frame
> df <- data.frame(
+   time = pmin(Tlat, Cen), 
+   status = Tlat <= Cen, 
+   id = 1:N, 
+   x1 = x1, 
+   x2 = rep(k, N))
> 
> # fit model -- cannot use coxph for x2 due to the non-parametric intercept
> fit <- coxph(Surv(time, status) ~ x1, data = df)
> fit # coefficient for x1 is right
Call:
coxph(formula = Surv(time, status) ~ x1, data = df)

      coef exp(coef) se(coef)   z      p
x1 -0.6083    0.5442   0.0234 -26 <2e-16

Likelihood ratio test=675  on 1 df, p=0
n= 10000, number of events= 7891 
> 
> # # won't work as we try to estimate a term that is equal for all individuals 
> # # at each time t
> # fit <- coxph(Surv(time, status) ~ x1 + tt(x2), data = df, 
> #              tt = function(x, t, ...) x * t)
> 
> # The non-parametric cumulative hazard though match with what we expect
> base <- basehaz(fit, centered = FALSE)
> plot(base$time, base$hazard, type = "l")
> lines(
+   base$time, 
+   lambda / (betat * k) * (exp(betat * k * base$time) - 1), col = "red")

> #####
> # estimate parametric model
> neg_loglike_func <- function(b){
+   # compute intermediates
+   lp1 <- exp(b[1] + b[2] * df$x1)    # time-invariant terms
+   lp2 <- exp(b[3] * df$x2 * df$time) # time-varying terms
+   log_haz <- log(lp1) + log(lp2)                  # instant hazard
+   cumhaz <- lp1 * (lp2 - 1)/(b[3] * df$x2 + 1e-8) # cumhaz
+   
+   # compute log likelihood terms
+   ll_terms <- ifelse(df$status == 1,  log_haz - cumhaz, -cumhaz)
+   -sum(ll_terms) # neg log likelihood
+ }
> 
> # fit model
> b <- c(0.01, -0.02, 0.01) # starting valus
> fit <- optim(b, neg_loglike_func)
> 
> exp(fit$par[1]) # lambda estimate
[1] 0.01035017
> fit$par[2:3]    # beta and betat estimate
[1] -0.6099467  0.2966224

You may be able to avoid the optim function above by e.g., using one of the parametric survival models in the rstpm2 package or polspline.

Before edits

I cannot see how what you do is related to the cited article. Further, I am struggling to see how x2 enters into the model. Maybe post equations for the model you are trying to simulate from?

What may help you is that all of your observation end up dying at the end it seems which seems odd given that you have # censoring times in the Cen object. I show this below

library(survival)
set.seed(747)

N=1000 # number of subjects
k=0.3 # assuming the time varying covariates is proporitonal to time x2=k*t
lambda=0.01
betat=0.3 
beta=-0.6
rateC=0.01


#####
# Time fixed variable
x1 <- sample(x=c(0, 1), size=N, replace=TRUE, prob=c(0.5, 0.5))

#Exponential latent event times
u <- runif(n=N)
Tlat <- log(1+betat*k*(-log(u)) / (lambda*exp(beta*x1))) / betat*k
Cen<- rexp(n=N, rate=rateC) #censoring times

# follow-up times and event indicators
time <- pmin(Tlat, Cen)
status <- as.numeric(Tlat <= Cen)
# data set
df.tfixed<-data.frame(id=1:N, time=time, status=status, x1=x1)


#####
# time dependent continuous variable
ntp<-sample(1:6,N,replace=T) #number of follow up time points
mat<-matrix(ncol=3,nrow=1)
i=0
for(n in ntp){
  i=i+1
  ft<-runif(n,min=0,max=df.tfixed$time[i])
  ft<-sort(ft)
  seq<-rep(ft,each=2)
  seq<-c(0,seq,df.tfixed$time[i])
  matid<-cbind(matrix(seq,ncol=2,nrow=n+1,byrow=T),i)
  mat<-rbind(mat,matid)
}
df.td<-data.frame(mat[-1,])
colnames(df.td)<-c("start","stop","id")
df.td$x2<-k*df.td$start


#combine the two data frames 
df<-merge(df.td,df.tfixed,by="id")
df$status=0
df$status[cumsum(as.vector(table(df$id)))]<-1

fit <- coxph(Surv(start,stop, status) ~ x1+x2, data=df)
fit$coef
>         x1         x2 
> -0.6040089  1.8471387 

# All observations dies exaclty once -- no censoring
all(tapply(df$status == 1, df$id, sum) == 1)
> [1] TRUE

It comes down to this line df$status[cumsum(as.vector(table(df$id)))]<-1.

Best Answer

Related Solutions

Solved – Can Cox models be used with time-varying covariates

Cox Model – How to Analyze Cox Survival Data with Time-Varying Continuous Variables and Fixed Covariates in R

Before edits

Related Question