Solved – Extended Cox model with continuous time dependent covariate – how to structure data

cox-model, r, survival, time-varying-covariate

I need to run an extended Cox model with a time-varying covariate in R: let’s call it the "number of doses" (X). I am interested in the hazard ratio associated with each level of X, i.e., how an additional dose affects the likelihood of recovery.

X is a discrete, positive integer. It can either remain constant or increase by 1 during each observation time.
status=1 represents recovery, and d1, d2, …, d6 are indicators of the doses received so far (created for modeling purposes).

I am not sure how to set up the data. Based on my reading I’ve come up with two alternatives, and I am attaching an excerpt of the data set-up and the code for each method. However, I’m not sure which (if either) is correct for answering this question. They give fairly different results.

The data has been artificially parsed at each observation time (stop):

    patient     start      stop doses d1 d2 d3 d4 d5 d6 status
          1  0.000000  0.034000     1  1  0  0  0  0  0      0
          1  0.034000 12.949399     2  1  1  0  0  0  0      0
          1 12.949399 13.813242     3  1  1  1  0  0  0      0
          1 13.813242 30.835070     4  1  1  1  1  0  0      0
          2  0.000000  0.240000     1  1  0  0  0  0  0      0
          2  0.240000  2.984179     2  1  1  0  0  0  0      0
          2  2.984179  4.014723     3  1  1  1  0  0  0      0
          2  4.014723  5.186506     4  1  1  1  1  0  0      0
          2 20.869955 29.832999     4  1  1  1  1  0  0      0
          2 29.832999 32.063887     5  1  1  1  1  1  0      0
          2 32.063887 37.924743     6  1  1  1  1  1  1      1

METHOD 1: treat the number of doses as a factor

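    library(survival)
    dt <- read.csv('data.csv', header = TRUE)
    # Surv() has no data argument; build it inside the formula so coxph() looks the columns up in dt
    m <- coxph(Surv(start, stop, status) ~ factor(doses), data = dt)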

METHOD 2: treat each dose as a binary variable

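    library(survival)
    dt <- read.csv('data.csv', header = TRUE)
    # Surv() has no data argument; build it inside the formula so coxph() looks the columns up in dt
    m <- coxph(Surv(start, stop, status) ~ d1 + d2 + d3 + d4 + d5 + d6, data = dt)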

Does either method take into account that a patient had (n-1) doses in the previous period?

Best Answer

Patients are "taken into account" for their dosage history in 2 distinct ways:

The first is the coding of dose receipt as a time-varying covariate. Here is how each patient enters the risk sets that the Cox model compares. Patient 1 is in the risk set from time 0 until 0.034; that row is then censored, because they had not experienced the event (recovery, in your data) and their covariate changed. They re-enter at 0.034 with two cumulative doses and are observed until 12.95, where that row is censored in the same way. The pattern repeats for the third dose, and after the fourth dose they are censored at 30.8, either administratively or through loss to follow-up. The second patient follows the same pattern but experiences the event at time 37.92 with six doses. Because the first patient was censored before then, the two are never compared at that event time. This is how the model accounts for the amount of time at risk and for the order in which doses were received.
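To make the risk-set idea concrete, here is a small base-R sketch that rebuilds the excerpt from the question and lists who is at risk, and with how many doses, at a given time. The helper name `risk_set` is my own, and the rule used (a row is at risk at time t when start < t <= stop) is the usual counting-process convention rather than anything taken from your code.

```r
# Excerpt from the question, retyped as a data frame
dt <- data.frame(
  patient = c(1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2),
  start   = c(0, 0.034, 12.949399, 13.813242,
              0, 0.24, 2.984179, 4.014723, 20.869955, 29.832999, 32.063887),
  stop    = c(0.034, 12.949399, 13.813242, 30.835070,
              0.24, 2.984179, 4.014723, 5.186506, 29.832999, 32.063887, 37.924743),
  doses   = c(1, 2, 3, 4, 1, 2, 3, 4, 4, 5, 6),
  status  = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1)
)

# Hypothetical helper: rows whose interval (start, stop] contains time t
risk_set <- function(data, t) {
  data[data$start < t & data$stop >= t, c("patient", "doses")]
}

event_time <- dt$stop[dt$status == 1]  # the single observed event
risk_set(dt, event_time)  # who is compared at the event time
risk_set(dt, 25)          # both patients are at risk here, each with 4 doses
risk_set(dt, 10)          # patient 2 has a gap in follow-up around this time
```

Only the rows returned at an event time contribute to the partial likelihood, so a patient's earlier dose levels matter only through the event times that fall inside those earlier intervals.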

The second way that "dose at n-1" is taken into account is the coding of cumulative dose. The two modeling approaches you propose are in fact reparameterizations of one another and describe the same fitted model. The second formulation, which adjusts for the "covariate history", is called a "distributed lag". Each factor level is simply the sum of the lag terms up to that dose, because the doses arrive in a fixed order (a patient cannot, for instance, miss the first dose and then receive the second). The individual coefficients look different because they estimate different contrasts: each factor coefficient compares a dose level with the reference level, whereas each d coefficient is the increment from the previous dose. The overall fit and the global tests are the same in both formulations.
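The equivalence of the two codings can be checked on simulated data. The sketch below is only an illustration under assumptions of my own: the generator `sim_patient`, its rates, the cap of four doses and the end of follow-up at 20 are all invented, and d1 is left out because it equals 1 in every row and so carries no information.

```r
library(survival)
set.seed(1)

# Hypothetical generator: dose 1 at time 0, later doses after random gaps,
# and a constant hazard within each dose interval that rises with dose count
sim_patient <- function(id, max_doses = 4, end = 20) {
  starts <- c(0, cumsum(rexp(max_doses - 1, rate = 0.3)))
  starts <- starts[starts < end]
  stops  <- c(starts[-1], end)
  doses  <- seq_along(starts)
  # candidate event time inside each interval, given the dose-specific hazard
  event_t <- starts + rexp(length(starts), rate = 0.03 * exp(0.3 * (doses - 1)))
  hit   <- which(event_t < stops)
  last  <- if (length(hit) > 0) hit[1] else length(starts)
  keep  <- seq_len(last)
  event <- keep == last & length(hit) > 0
  data.frame(patient = id, start = starts[keep],
             stop = ifelse(event, event_t[keep], stops[keep]),
             doses = doses[keep], status = as.integer(event))
}

dt <- do.call(rbind, lapply(1:300, sim_patient))
# Cumulative indicators, as in the question: d_k = 1 once dose k has been received
for (k in 2:4) dt[[paste0("d", k)]] <- as.integer(dt$doses >= k)

m1 <- coxph(Surv(start, stop, status) ~ factor(doses), data = dt)
m2 <- coxph(Surv(start, stop, status) ~ d2 + d3 + d4, data = dt)

# Same maximized partial log-likelihood ...
c(factor = m1$loglik[2], indicators = m2$loglik[2])
# ... and each factor coefficient is the running sum of the indicator coefficients
cbind(factor = unname(coef(m1)), cumsum_d = cumsum(unname(coef(m2))))
```

So the two methods should not disagree about the fit; they report the same dose effects either relative to one dose or relative to the previous dose.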

One caution I'll venture: I don't think this model answers the question you believe you are asking. You say: "I am interested in the hazard ratio associated with each level of X, i.e., how an additional dose affects the likelihood of recovery." It sounds to me as though you should code the dose variable numerically, so that a patient who experiences the event with 5 doses is known to have had more exposure to the drug than one with 4 doses. Both formulations you propose allow the cumulative dose effect to vary from level to level in unpredictable ways, rather than following a trend.
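A numeric coding gives a single hazard ratio per additional dose, and it can be compared with the factor coding by a likelihood-ratio test. The following is a sketch on simulated data; the generator `sim_patient` and all of its settings are assumptions of mine, not your data.

```r
library(survival)
set.seed(42)

# Hypothetical generator: constant hazard within each dose interval,
# multiplied by exp(0.3) for every additional dose
sim_patient <- function(id, max_doses = 4, end = 20) {
  starts <- c(0, cumsum(rexp(max_doses - 1, rate = 0.3)))
  starts <- starts[starts < end]
  stops  <- c(starts[-1], end)
  doses  <- seq_along(starts)
  event_t <- starts + rexp(length(starts), rate = 0.03 * exp(0.3 * (doses - 1)))
  hit   <- which(event_t < stops)
  last  <- if (length(hit) > 0) hit[1] else length(starts)
  keep  <- seq_len(last)
  event <- keep == last & length(hit) > 0
  data.frame(patient = id, start = starts[keep],
             stop = ifelse(event, event_t[keep], stops[keep]),
             doses = doses[keep], status = as.integer(event))
}
dt <- do.call(rbind, lapply(1:300, sim_patient))

m_lin <- coxph(Surv(start, stop, status) ~ doses, data = dt)          # log-linear trend
m_fac <- coxph(Surv(start, stop, status) ~ factor(doses), data = dt)  # free level effects

hr_per_dose <- exp(unname(coef(m_lin)))  # hazard ratio for one additional dose

# Likelihood-ratio test of the trend model against the factor model
lrt     <- 2 * (m_fac$loglik[2] - m_lin$loglik[2])
p_value <- pchisq(lrt, df = length(coef(m_fac)) - 1, lower.tail = FALSE)
c(hr_per_dose = hr_per_dose, lrt = lrt, p_value = p_value)
```

A large p-value here would suggest the single per-dose hazard ratio is an adequate summary; a small one would suggest the effect of an extra dose depends on how many doses came before.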

Another issue is that there may not be enough variability to separate the effect of person-time at risk from the effect of cumulative dose. Doses can only accumulate, so the longer a person is in the study the more doses they tend to have, and it is generally assumed that longer follow-up also means more risk of the outcome. Having a lot of data, with a lot of variation in the timing of subsequent doses, resolves this. Fitting a parametric model, such as a log-linear model for the life table, may help assess how sensitive the results are to it.
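One way to read the log-linear suggestion is a Poisson regression on the long-format rows, with the log of person-time as an offset. The sketch below assumes a constant baseline hazard, which is the strongest version of that idea (adding a time-band factor would relax it), and it runs on simulated data from a generator, `sim_patient`, that I made up for the purpose.

```r
library(survival)
set.seed(7)

# Hypothetical generator: constant hazard within each dose interval
sim_patient <- function(id, max_doses = 4, end = 20) {
  starts <- c(0, cumsum(rexp(max_doses - 1, rate = 0.3)))
  starts <- starts[starts < end]
  stops  <- c(starts[-1], end)
  doses  <- seq_along(starts)
  event_t <- starts + rexp(length(starts), rate = 0.03 * exp(0.3 * (doses - 1)))
  hit   <- which(event_t < stops)
  last  <- if (length(hit) > 0) hit[1] else length(starts)
  keep  <- seq_len(last)
  event <- keep == last & length(hit) > 0
  data.frame(patient = id, start = starts[keep],
             stop = ifelse(event, event_t[keep], stops[keep]),
             doses = doses[keep], status = as.integer(event))
}
dt <- do.call(rbind, lapply(1:300, sim_patient))
dt$ptime <- dt$stop - dt$start  # person-time contributed by each row

# Log-linear (Poisson) model for the event rate per unit of person-time
pois <- glm(status ~ doses + offset(log(ptime)), family = poisson, data = dt)
# Cox model with the same covariate, baseline hazard left unspecified
cox  <- coxph(Surv(start, stop, status) ~ doses, data = dt)

est <- c(poisson = unname(coef(pois)["doses"]), cox = unname(coef(cox)["doses"]))
est  # log hazard ratio per additional dose under each model
```

If the two estimates differ a lot on real data, that is a hint that the dose effect is hard to separate from the effect of time on study.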

Related Solutions

Solved – Structure of data and function call for recurrent event data with time-dependent variables

Your data formatting is correct.

You have multiple records per-patient due to recurrent events and the added complexity of the drug being a time varying covariate. The output you printed using head is helpful for understanding these data.

The typical approach to analyzing recurrent events together with time-varying covariates is to put the data in "long" format, where each row represents one interval of risk with its covariate values. For instance, we see that patient 123 is on Drug 1 alone from time 0 to time 2, then switches to taking both Drug 1 and Drug 2 from time 3. At that point they had not experienced a fall, so their observation from 0 to 2 is censored, because we do not know how much later a fall would have occurred had they continued on Drug 1 alone. At time 3 they re-enter the cohort, coded as a patient taking both drugs, and 7 time units later they experience their first fall. They experience a second fall on the same drug combination only 4 time units after that.

The number of records is not a useful summary of cohort data. It is not surprising the number of rows is far larger than the number of patients. Instead, sum the times from start-to-stop and record it as an amount of person-time-at-risk. The cohort-denominator is useful for understanding incidence. It is useful also to summarize the raw number of patients, but bear in mind the data are in "long" format so that is less than the number of rows in your dataset.
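As a sketch of that summary, the rows below are hypothetical: patient 123 is typed in from the description above, and patient 456 and the column names are invented so that there is more than one patient.

```r
# Hypothetical long-format data: one row per interval of risk
falls <- data.frame(
  id    = c(123, 123, 123, 456),
  start = c(0, 3, 10, 0),
  stop  = c(2, 10, 14, 8),
  drug1 = c(1, 1, 1, 1),
  drug2 = c(0, 1, 1, 0),
  fall  = c(0, 1, 1, 0)
)

n_rows      <- nrow(falls)                     # number of records, not a cohort summary
n_patients  <- length(unique(falls$id))        # raw number of patients
person_time <- sum(falls$stop - falls$start)   # total person-time at risk
n_events    <- sum(falls$fall)

c(rows = n_rows, patients = n_patients, person_time = person_time,
  incidence_rate = n_events / person_time)
```

The incidence rate uses person-time as its denominator, which is why the row count by itself says little about the size of the cohort.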

For the error, I think you may need to add 1 unit to the "stop" date. If patient 123 takes drug 1 for days 0, 1, and 2 and then starts drug 2 on day 3, then they experienced 3 days at-risk for falls on drug 1. However, 2-0 = 2 and that is not the correct denominator.
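The arithmetic can be checked in a couple of lines. The variable names are mine, and the sketch assumes days are recorded inclusively, so that a record covers both its first and its last day.

```r
# Hypothetical inclusive day coding: Drug 1 is taken on days 0, 1 and 2
first_day <- 0
last_day  <- 2

days_observed <- length(first_day:last_day)  # days actually at risk on Drug 1
naive_time    <- last_day - first_day        # undercounts by one day

# Shift the stop time by one unit so the interval length matches the days at risk
start <- first_day
stop  <- last_day + 1
corrected_time <- stop - start

c(days_observed = days_observed, naive = naive_time, corrected = corrected_time)
```

With this shift the Drug 1 interval ends at 3, exactly where the next record starts, and a record that begins and ends on the same day no longer has zero length, which I believe Surv() would otherwise reject.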

A note on the "cluster" argument: assuming this is cluster() from R's survival package, it does not impose a frailty. It leaves the coefficient estimates unchanged and replaces the standard errors with robust (sandwich) ones that allow for correlation among rows from the same patient. A frailty, which is a type of random intercept that accounts for proportional risk differences attributable to unmeasured risk factors, is requested separately with frailty(). I do not often conduct analyses with frailties. You can omit the "cluster" term and interpret the results as incidence ratios. Alternatively, you can fit the Cox model for the time until the first fall in all patients and interpret the hazard ratios as risk ratios. I think a frailty result should fall somewhere between these two, and I've never been quite clear what its interpretation should be.

Solved – Get standard error of exponentiated coefficient in cox regression

You can do it manually with the delta method by calculating $se(sex) \cdot \exp(sex) = 0.1672 \cdot 0.5880 = 0.0983136$, where $sex$ denotes the estimated coefficient, since the derivative of $\exp$ is $\exp$ itself. Alternatively, use svycontrast() from the survey package:

    library("survival")
    library("survey")
    data("lung") # from the survival package
    res.cox <- coxph(Surv(time, status) ~ sex, data = lung)
    summary(res.cox)
    svycontrast(res.cox, quote(exp(sex)))

which yields

             nlcon     SE
    contrast 0.588 0.0983
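The manual calculation can also be written out directly from the fitted model. This is a minimal sketch using only the survival package; the variable names are mine.

```r
library(survival)

# Same model as above, fitted to the lung data shipped with survival
fit <- coxph(Surv(time, status) ~ sex, data = lung)

b    <- unname(coef(fit)["sex"])           # log hazard ratio
se_b <- sqrt(vcov(fit)["sex", "sex"])      # its standard error

# Delta method: se(exp(b)) is approximately exp(b) * se(b)
se_hr <- exp(b) * se_b
c(hr = exp(b), se_hr = se_hr)

# A confidence interval is usually built on the log scale and then exponentiated
ci <- exp(b + c(-1, 1) * qnorm(0.975) * se_b)
ci
```

The log-scale interval is asymmetric around the hazard ratio, which is one reason the standard error of the exponentiated coefficient is rarely used for inference.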
