Extended Cox model with continuous time-dependent covariate – how to structure data


I need to run an extended Cox model with a time-varying covariate in R: let’s call it the "number of doses" (X). I am interested in the hazard ratio associated with each level of X, i.e., how an additional dose affects the likelihood of recovery.

X is a discrete, positive integer. Between consecutive observation times it either stays the same or increases by 1.
status = 1 represents recovery, and d1, d2, …, d6 are indicators (created for modeling purposes) for whether the first, second, …, sixth dose has been received.

I am not sure how to set up the data. Based on my reading I've come up with two alternatives, and I've attached an excerpt of the data set-up and the code for each method. However, I'm not sure which (if either) is correct for answering this question, and they give fairly different results.

The data has been artificially split at each observation time (stop):

    patient     start      stop doses d1 d2 d3 d4 d5 d6 status
          1  0.000000  0.034000     1  1  0  0  0  0  0      0
          1  0.034000 12.949399     2  1  1  0  0  0  0      0
          1 12.949399 13.813242     3  1  1  1  0  0  0      0
          1 13.813242 30.835070     4  1  1  1  1  0  0      0
          2  0.000000  0.240000     1  1  0  0  0  0  0      0
          2  0.240000  2.984179     2  1  1  0  0  0  0      0
          2  2.984179  4.014723     3  1  1  1  0  0  0      0
          2  4.014723  5.186506     4  1  1  1  1  0  0      0
          2 20.869955 29.832999     4  1  1  1  1  0  0      0
          2 29.832999 32.063887     5  1  1  1  1  1  0      0
          2 32.063887 37.924743     6  1  1  1  1  1  1      1
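A minimal sketch of how the d1 to d6 indicators relate to the doses column, using the values from the excerpt above. It assumes each d_k is simply "dose k has been received", i.e. doses >= k, which is what the excerpt shows.

```r
# Values copied from the excerpt above
dt <- data.frame(
  patient = c(1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2),
  doses   = c(1, 2, 3, 4, 1, 2, 3, 4, 4, 5, 6)
)

# d_k is 1 once the k-th dose has been received, i.e. when doses >= k
for (k in 1:6) dt[[paste0("d", k)]] <- as.integer(dt$doses >= k)

# Summing the indicators recovers the cumulative dose count
rowSums(dt[, paste0("d", 1:6)])

# Every row has at least one dose, so d1 is 1 on every row
table(dt$d1)
```

Because d1 never varies, it carries no information in a Cox model, which has no intercept for it to be distinguished from.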

METHOD 1: treat the number of doses as a factor

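    library(survival)
    dt <- read.csv('data.csv', header = TRUE)
    # Surv() has no data= argument, so build it inside the formula where data = dt applies
    m <- coxph(Surv(start, stop, status) ~ factor(doses), data = dt)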

METHOD 2: treat each dose as a binary variable

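    library(survival)
    dt <- read.csv('data.csv', header = TRUE)
    # Surv() has no data= argument, so build it inside the formula where data = dt applies
    m <- coxph(Surv(start, stop, status) ~ d1 + d2 + d3 + d4 + d5 + d6, data = dt)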

Does either method take into account that a patient had (n-1) doses in the previous period?

Best Answer

Patients are "taken into account" for their dosage history in 2 distinct ways:

The first is the coding of dose receipt as a time-varying covariate, which determines how each patient enters the risk sets that the Cox model compares at each event time. Patient 1 is in the risk set from 0 until 0.034, at which point that row ends (is censored) because they did not recover and their covariate changed. They re-enter at 0.034 with two cumulative doses and are observed until 12.95, when the row is censored again. They receive their third dose at 12.95 and their fourth at 13.81, and are finally censored, for loss to follow-up or administratively, at time 30.84. The second patient follows the same pattern but experiences the event (recovery) at time 37.92 with six doses. Since the first patient was censored before then, the two patients are never compared at that event time. This is how the model accounts for the amount of time, and the order of dose receipt, during which each patient was at risk of the outcome.
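The risk-set logic above can be checked directly on the excerpt. This is a small sketch, not what coxph does internally: it assumes the usual counting-process convention that a row (start, stop] is at risk at time t when start < t <= stop, and at_risk() is a hypothetical helper written only for illustration.

```r
# Values copied from the excerpt in the question
dt <- data.frame(
  patient = c(1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2),
  start   = c(0, 0.034, 12.949399, 13.813242, 0, 0.24, 2.984179,
              4.014723, 20.869955, 29.832999, 32.063887),
  stop    = c(0.034, 12.949399, 13.813242, 30.83507, 0.24, 2.984179,
              4.014723, 5.186506, 29.832999, 32.063887, 37.924743),
  doses   = c(1, 2, 3, 4, 1, 2, 3, 4, 4, 5, 6),
  status  = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1)
)

# Hypothetical helper: rows whose interval (start, stop] contains time t
at_risk <- function(t) dt[dt$start < t & t <= dt$stop, ]

# At the only event time, patient 1 has already left the risk set
event_times <- dt$stop[dt$status == 1]
at_risk(event_times[1])

# At t = 20, only patient 1 (with 4 doses) contributes
at_risk(20)
```

Each patient contributes at most one row at any given time, carrying the dose count that applied at that moment.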

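The second way that "dose at n-1" is taken into account is through the coding of cumulative dose. The two modeling approaches you propose are in fact reparameterizations of the same model. Because d_k = 1 exactly when doses >= k, the indicators d2, …, d6 carry the same information as the five dummy variables behind factor(doses); d1 equals 1 on every row, so it carries no information and its coefficient cannot be estimated. The second formulation, one which adjusts for the "covariate history", is called a "distributed lag". The factor levels are merely the sum of the lags because the time-varying covariates follow a specific order (a patient doesn't, for instance, miss the first dose and then receive their second). The coefficients therefore answer different questions: the coefficient for level k of factor(doses) is the log hazard ratio for k doses versus 1 dose, while the coefficient of d_k is the log hazard ratio for k doses versus k - 1 doses. That is why the individual estimates and their p-values look fairly different, but the fitted hazards, the partial likelihood, and the global tests from the two formulations are the same.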
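The equivalence can be checked numerically. The sketch below uses simulated data, because the excerpt has only one event; simulate_patient() and its rates are hypothetical and made up for illustration (doses arrive at exponential gaps, and the recovery hazard is assumed constant within each dose level and increasing with dose count). d1 is left out of Method 2 because it is constant.

```r
library(survival)
set.seed(42)

# Hypothetical simulation in (start, stop] format: up to 6 doses per patient
simulate_patient <- function(id) {
  rows <- list()
  t0 <- 0
  for (k in 1:6) {
    gap <- if (k < 6) rexp(1, rate = 0.5) else 10  # time to next dose, or end of follow-up
    t_event <- rexp(1, rate = 0.1 * k)             # made-up recovery hazard for k doses
    ev <- as.integer(t_event < gap)
    t1 <- t0 + min(gap, t_event)
    rows[[k]] <- data.frame(patient = id, start = t0, stop = t1, doses = k, status = ev)
    if (ev == 1) break
    t0 <- t1
  }
  do.call(rbind, rows)
}
dt <- do.call(rbind, lapply(1:300, simulate_patient))
for (k in 1:6) dt[[paste0("d", k)]] <- as.integer(dt$doses >= k)

# Method 1: each coefficient compares k doses with 1 dose
m1 <- coxph(Surv(start, stop, status) ~ factor(doses), data = dt)
# Method 2 without the constant d1: each coefficient compares k doses with k - 1 doses
m2 <- coxph(Surv(start, stop, status) ~ d2 + d3 + d4 + d5 + d6, data = dt)

# Same partial log-likelihood (null and fitted), hence the same global tests
m1$loglik
m2$loglik

# The factor effects are cumulative sums of the per-step effects
cumsum(coef(m2))
coef(m1)
```

Which parameterization to report is a matter of which contrast is of interest; the fit itself does not change.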

I'll hazard a guess here: I don't think this model answers the type of question you believe you are answering. You say: "I am interested in the hazard ratio associated with each level of X, i.e., how an additional dose affects the likelihood of recovery." It sounds to me like you should code the dosage variable numerically, so that if a patient has 5 doses and recovers, the model knows they have had more exposure to the drug than one who has had 4 doses. Both formulations you propose let the effect of cumulative dose vary freely from one level to the next, so neither gives a single hazard ratio per additional dose.
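A sketch of the numeric coding, again on hypothetical simulated data (simulate_patient() and its rates are made up for illustration). With doses entered as a number, exp(coef) is the hazard ratio per additional dose. Since the linear model is nested in the factor model, a likelihood ratio test via anova() gives one way to check whether a constant per-dose effect is adequate.

```r
library(survival)
set.seed(42)

# Hypothetical simulation in (start, stop] format, as a stand-in for the real data
simulate_patient <- function(id) {
  rows <- list()
  t0 <- 0
  for (k in 1:6) {
    gap <- if (k < 6) rexp(1, rate = 0.5) else 10  # time to next dose, or end of follow-up
    t_event <- rexp(1, rate = 0.1 * k)             # made-up recovery hazard for k doses
    ev <- as.integer(t_event < gap)
    t1 <- t0 + min(gap, t_event)
    rows[[k]] <- data.frame(patient = id, start = t0, stop = t1, doses = k, status = ev)
    if (ev == 1) break
    t0 <- t1
  }
  do.call(rbind, rows)
}
dt <- do.call(rbind, lapply(1:300, simulate_patient))

# Dose count as a number: one hazard ratio per additional dose
m_lin <- coxph(Surv(start, stop, status) ~ doses, data = dt)
exp(coef(m_lin))

# Factor coding for comparison; the linear model is nested within it
m_fac <- coxph(Surv(start, stop, status) ~ factor(doses), data = dt)
anova(m_lin, m_fac)
```

If the test rejects, the per-dose effect is not constant and a single hazard ratio would be an oversimplification.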

Another issue is that there may not be enough variability to separate the effect of person-time at risk from the effect of cumulative dose. It is assumed that the longer a person is in the study, the more at risk they generally are for the outcome, and since doses only accumulate over time, cumulative dose and time on study tend to move together. This problem is mitigated by having a lot of data and a lot of variation in the timing of subsequent doses. Parametric models, such as a log-linear (Poisson) model for the life table, may help assess how sensitive the dose effect is to this.
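One way to set up such a sensitivity check is sketched below, on hypothetical simulated data (simulate_patient() and its rates are made up). It assumes a hazard that is constant within each row, fits a Poisson model for the event indicator with log(stop - start) as an offset, and then adds a natural spline of time on study (the interval midpoint, an arbitrary choice here) to see how much the per-dose rate ratio moves.

```r
library(splines)
set.seed(42)

# Hypothetical simulation in (start, stop] format, as a stand-in for the real data
simulate_patient <- function(id) {
  rows <- list()
  t0 <- 0
  for (k in 1:6) {
    gap <- if (k < 6) rexp(1, rate = 0.5) else 10  # time to next dose, or end of follow-up
    t_event <- rexp(1, rate = 0.1 * k)             # made-up recovery hazard for k doses
    ev <- as.integer(t_event < gap)
    t1 <- t0 + min(gap, t_event)
    rows[[k]] <- data.frame(patient = id, start = t0, stop = t1, doses = k, status = ev)
    if (ev == 1) break
    t0 <- t1
  }
  do.call(rbind, rows)
}
dt <- do.call(rbind, lapply(1:300, simulate_patient))
dt$mid <- (dt$start + dt$stop) / 2  # time on study, summarised by the interval midpoint

# Log-linear model for the life table: event counts with log person-time as offset
p_dose <- glm(status ~ doses + offset(log(stop - start)), family = poisson, data = dt)
# Same model, additionally adjusted for a smooth function of time on study
p_both <- glm(status ~ doses + ns(mid, df = 3) + offset(log(stop - start)),
              family = poisson, data = dt)

# A large shift in the per-dose rate ratio suggests dose and time are hard to separate
exp(c(unadjusted = coef(p_dose)[["doses"]], time_adjusted = coef(p_both)[["doses"]]))
```

The spline degrees of freedom and the choice of time scale are judgment calls; trying a few settings is itself part of the sensitivity analysis.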
