Structure of data and function call for recurrent event data with time-dependent variables

cox-model r survival

I'm attempting to estimate the effect of 2 drugs (drug1, drug2) on the likelihood of a patient falling (event). The patients can fall more than once and can be put on or taken off the drugs at any point.

My question is how the data should be structured with regard to the time period (days), specifically whether there needs to be overlap between the days. There are two reasons why I think my structure is wrong, the first being a seemingly incorrect N. I am also getting some errors where the time period is a single day (i.e. time1=4, time2=4) and am unsure how these should be coded. Should the start time of subsequent entries be the stop time of the previous entry? I've tried it both ways (with and without overlap), and while having overlap gets rid of the warning, the N is still incorrect.

Warning message:
In Surv(time = c(0, 2, 7, 15, 20, 0, 18, 27, 32, 35, 39, 46, 53,  :
  Stop time must be > start time, NA created

Right now I have the data set up where the beginning of the next entry is the next day. Unique patients are identified by their chart numbers.

Time1    Time2    Drug1    Drug2   Event    ChartNo
    0        2        1        0       0        123
    3       10        1        1       1        123
   11       14        1        1       1        123
    0       11        0        1       0        345
    0       19        1        0       1        678
    0        4        0        1       0        900
    5       18        1        1       0        900

Patient 123 was on drug1 at the start to day 2, after which point they had drug2 added. They went from day 3 to day 10 on both drugs before falling the first time, then fell a second time on day 14 while still on both drugs. Patient 345 went 11 days on drug2 without falling (then was censored), etc.

The actual estimation looks like this:

S <- Srv(time=time1, time2=time2, event=event)
cox.rms <- cph(S ~ Drug1 + Drug2 + cluster(ChartNo), surv=T)

My main concern is that the n for my analysis is reported to be 2017 (the number of rows in the data), when in actuality I only have 314 unique patients. I am unsure if this is normal or the result of some error I've made along the way.

> cox.rms$n
Status
No Event    Event 
    1884      133 

The same is true when using coxph() from the survival package.

 n= 2017, number of events= 133

The number of events is correct however.

This Post seems to have it set up with the 'overlap' I described, but I am unsure about the N, and they don't seem to be clustering by ID.

Best Answer

Your data are formatted correctly.

You have multiple records per patient because of the recurrent events and the added complexity of the drugs being time-varying covariates. The output you printed using head is helpful for understanding these data.

The typical approach to analyzing recurrent events and time-varying covariates is to put the data in a "long" format where each row represents one interval at risk together with the covariate values that held during it. For instance, patient 123 is on Drug1 alone from time 0 to time 2, then takes both Drug1 and Drug2 from time 3. At that point they had not experienced a fall, so their observation from 0-2 is censored: we do not know how much longer it would have taken them to fall had they continued on Drug1 alone. At time 3 they re-enter the cohort coded as a patient taking both drugs, and 7 time units later they experience their first fall. They experience a second fall on the same drug combination only 4 time units after that.

The number of records is not a useful summary of cohort data, and it is not surprising that the number of rows is far larger than the number of patients. Instead, sum the times from start to stop and report the total as person-time at risk. This cohort denominator is useful for understanding incidence. It is also useful to report the raw number of patients, but bear in mind that the data are in "long" format, so that number is smaller than the number of rows in your dataset.
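As a rough illustration of these summaries, the sketch below rebuilds the seven example rows from the question as a data frame (the name `falls` is made up for this example) and computes the row count, the number of unique patients, the number of events, and the person-time at risk. It uses the intervals exactly as coded in the question, so the one-day gaps between rows are not counted.

```r
# Example rows copied from the table in the question.
# The data frame name `falls` is an assumption for illustration.
falls <- data.frame(
  Time1   = c(0, 3, 11, 0, 0, 0, 5),
  Time2   = c(2, 10, 14, 11, 19, 4, 18),
  Drug1   = c(1, 1, 1, 0, 1, 0, 1),
  Drug2   = c(0, 1, 1, 1, 0, 1, 1),
  Event   = c(0, 1, 1, 0, 1, 0, 0),
  ChartNo = c(123, 123, 123, 345, 678, 900, 900)
)

n_rows      <- nrow(falls)                      # records (what the model reports as n)
n_patients  <- length(unique(falls$ChartNo))    # unique patients
n_events    <- sum(falls$Event)                 # falls
person_time <- sum(falls$Time2 - falls$Time1)   # days at risk, as coded

# Crude incidence: falls per person-day
crude_rate <- n_events / person_time

print(c(rows = n_rows, patients = n_patients,
        events = n_events, person_days = person_time))
```

On these seven rows there are 4 patients, 3 events and 59 person-days, which shows how the row count and the patient count can legitimately differ.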

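For the warning, I think you may need to add 1 unit to the "stop" date. Surv() requires the stop time to be strictly greater than the start time, so a single-day row such as time1=4, time2=4 has zero length and is set to NA. The same coding also undercounts time at risk: if patient 123 takes drug 1 on days 0, 1, and 2 and then starts drug 2 on day 3, they experienced 3 days at risk for falls on drug 1, but 2 - 0 = 2, which is not the correct denominator. With stop = last day + 1, that row becomes (0, 3], the next row starts where it ends, and a single-day row becomes (4, 5].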
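A minimal sketch of that adjustment on the example rows from the question follows. The column name `Stop` and the data frame name `falls` are assumptions for illustration, and the contiguity check assumes the rows are already sorted by patient and start time.

```r
library(survival)

# Example rows copied from the table in the question
falls <- data.frame(
  Time1   = c(0, 3, 11, 0, 0, 0, 5),
  Time2   = c(2, 10, 14, 11, 19, 4, 18),
  Event   = c(0, 1, 1, 0, 1, 0, 0),
  ChartNo = c(123, 123, 123, 345, 678, 900, 900)
)

# Add one day so each interval covers its last day: (Time1, Time2 + 1]
falls$Stop <- falls$Time2 + 1

# Within a patient, each row should now start where the previous one stopped
n <- nrow(falls)
same_patient <- falls$ChartNo[-1] == falls$ChartNo[-n]
gaps <- falls$Time1[-1][same_patient] - falls$Stop[-n][same_patient]
contiguous <- all(gaps == 0)

# Counting-process response; no zero-length intervals remain
S <- Surv(falls$Time1, falls$Stop, falls$Event)
no_missing <- !any(is.na(S))

# A single-day row (start 4, last day 4) becomes (4, 5] and is accepted
single_day <- Surv(4, 4 + 1, 0)

person_time <- sum(falls$Stop - falls$Time1)
```

With the extra day the person-time on these rows rises from 59 to 66 days, one day per row.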

Note that in coxph() the "cluster" term does not fit a frailty: it leaves the coefficients unchanged and replaces the standard errors with robust (sandwich) estimates that allow for correlation among rows from the same patient. A frailty is a type of random intercept that accounts for what may be proportional risk differences attributable to several unmeasured risk factors, and it is requested with frailty() instead. I do not often conduct analyses with frailties. Without a frailty, you can interpret the coefficients of the recurrent-event model as incidence rate ratios. You can alternatively fit the Cox model for the time until the first fall in all patients and interpret the hazard ratios as risk ratios. I think the frailty result should fall somewhere between these two, and I have never been quite clear what its interpretation should be.
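One way to check what cluster() and frailty() do in survival::coxph() is to fit the variants side by side. The sketch below uses the `cgd` recurrent-infection dataset that ships with the survival package as a stand-in for the fall data (an assumption, since the original data are not available); its `tstart`, `tstop`, `status`, `treat` and `id` columns play the roles of Time1, Time2, Event, a drug indicator and ChartNo.

```r
library(survival)

# Stand-in data: recurrent infections in the cgd dataset from the survival package
fit_naive <- coxph(Surv(tstart, tstop, status) ~ treat, data = cgd)

# cluster(id): same model, robust (sandwich) standard errors grouped by patient
fit_robust <- coxph(Surv(tstart, tstop, status) ~ treat + cluster(id), data = cgd)

# frailty(id): adds a patient-level random effect, so the estimates can differ
fit_frailty <- coxph(Surv(tstart, tstop, status) ~ treat + frailty(id), data = cgd)

# cluster() leaves the coefficients alone; only the variance estimate changes
same_coef <- isTRUE(all.equal(coef(fit_naive), coef(fit_robust)))

# The reported n counts rows (intervals), not patients
n_reported <- fit_robust$n
n_patients <- length(unique(cgd$id))

print(summary(fit_robust)$coefficients)  # includes both se(coef) and robust se
```

Comparing `n_reported` with `n_patients` on this dataset reproduces the situation in the question: the model's n is the number of intervals, regardless of whether cluster() is used.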
