Solved – Time-dependent variable in survival analysis using R

cox-model, r, survival, time-varying-covariate

I am conducting a retrospective study of a cohort of cases who underwent the same surgical procedure. The primary outcome of the study is the recurrence incidence rate during a follow-up period of up to seven years. The risk of recurrence is known to be highest during the first year and then decreases over time.

I am investigating how a specific event during follow-up influences the risk of recurrence. I have identified 237 cases with the specific event (P) (group A), and matched this group 1:3 (based on other known risk factors) with cases without the specific event (group B).

Overall recurrence rate:

Group A: 43/237 = 18.1%
Group B: 78/711 = 11.0%

Thus, P seems to affect the recurrence rate. However, in group A, 19 of the recurrences actually happened prior to P, so these 19 recurrences cannot be attributed to the effect of P.
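The counts quoted above can be checked with a few lines of R. This is only descriptive arithmetic on the numbers given in the question; the variable names are made up for illustration.

```r
# Counts quoted in the question (variable names are hypothetical)
rec_A <- 43; n_A <- 237          # recurrences / size of group A (with P)
rec_B <- 78; n_B <- 711          # recurrences / size of group B (without P)
rec_A_before_P <- 19             # group A recurrences that preceded P

# Crude proportions with a recurrence during follow-up
crude_A <- rec_A / n_A
crude_B <- rec_B / n_B
print(round(100 * c(A = crude_A, B = crude_B), 1))

# Group A recurrences that occurred after P
rec_A_after_P <- rec_A - rec_A_before_P
print(rec_A_after_P)
```

Only the recurrences counted in rec_A_after_P happened after P, which is why the crude comparison of the two proportions is not a fair estimate of the effect of P.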

Therefore, I fitted an extended Cox proportional hazards regression using the survival package in R as follows:

data <- read.csv2(file = "Dataset.csv", header = TRUE, sep = ";", dec = ",")
sdata <- tmerge(data, data, id = 1:nrow(data),
                death = event(ftime, Recurrence),
                P = tdc(Ptime))

Here ftime = total days of follow-up, Recurrence = 0/1, P = the specific event (0/1), and Ptime = days from start to P (NA if P = 0).
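To make the effect of the tmerge call concrete, here is a minimal sketch on a three-subject toy data set. The data are invented for illustration; only the column names ftime, Recurrence and Ptime follow the question. It shows how a subject who experiences P before the end of follow-up is split into a pre-P row and a post-P row.

```r
library(survival)

# Hypothetical toy data using the question's column names
toy <- data.frame(
  id         = 1:3,
  ftime      = c(400, 900, 250),  # total days of follow-up
  Recurrence = c(1, 0, 1),        # 1 = recurrence observed at ftime
  Ptime      = c(150, NA, 300)    # days from start to P; NA if P never occurred
)

# event() marks the outcome at the end of follow-up;
# tdc() creates a 0/1 covariate that switches to 1 at Ptime
sdata <- tmerge(toy, toy, id = id,
                death = event(ftime, Recurrence),
                P = tdc(Ptime))

print(sdata[, c("id", "tstart", "tstop", "death", "P")])
```

Subject 1 contributes two intervals (before and after P), subject 2 never experiences P, and subject 3 has a Ptime later than the end of follow-up, so P stays at 0 for that subject. The last case corresponds to the recurrences that happened before P.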

Call:
coxph(formula = Surv(tstart, tstop, death) ~ P, data = sdata)

    coef exp(coef) se(coef)    z     p
P -0.552     0.576     0.23 -2.4 0.017

Likelihood ratio test=6.35 on 1 df, p=0.0118
n= 1165, number of events= 121

This model reports that those who have experienced P are less likely to have a recurrence. However, this must be because P occurs a median of about 1 year after the start of follow-up, so the model simply reports the reduced risk of recurrence among those who have not failed by then.

Is it possible to fit a model that takes this into account?

Best Answer

If I understand you correctly, the 237 cases plus 711 controls are the total sample that you use for the Cox regression? In that case, I think you might need to use a frailty model to take into account that each case and its controls might share a common susceptibility to recurrence. But even so, I'm not sure that using matched data is better than just using the whole data set as it is and including age, sex, and other known risk factors as covariates (in which case you don't need a frailty model). If you want to use a frailty model, I think you can simply add " + frailty(matching_group)" to the formula.
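As a rough sketch of what adding the frailty term could look like, the example below simulates matched sets of one case and three controls and fits the model with frailty(matching_group). Everything about the data is an assumption made for illustration: the simulated cohort, the column names is_case and set_effect, and the chosen distributions. Only the formula term itself comes from the suggestion above.

```r
library(survival)
set.seed(1)

# Hypothetical matched data: 80 sets of 1 case + 3 controls
n_sets <- 80
toy <- data.frame(
  id             = seq_len(4 * n_sets),
  matching_group = rep(seq_len(n_sets), each = 4),
  is_case        = rep(c(TRUE, FALSE, FALSE, FALSE), times = n_sets)
)

# Shared susceptibility within each matched set (simulation assumption)
set_effect <- rgamma(n_sets, shape = 2, rate = 2)
t_event <- rexp(nrow(toy), rate = set_effect[toy$matching_group] / 900)
t_cens  <- runif(nrow(toy), 365, 2555)   # censoring within seven years

toy$ftime      <- ceiling(pmin(t_event, t_cens))
toy$Recurrence <- as.integer(t_event <= t_cens)
toy$Ptime      <- ifelse(toy$is_case, round(runif(nrow(toy), 30, 700)), NA)

# Counting-process data with P as a time-dependent covariate
sdata <- tmerge(toy, toy, id = id,
                death = event(ftime, Recurrence),
                P = tdc(Ptime))

# Extended Cox model with a shared frailty per matched set
fit <- coxph(Surv(tstart, tstop, death) ~ P + frailty(matching_group),
             data = sdata)
print(fit)
```

On simulated data the estimate for P carries no meaning; the point is only the shape of the call on tmerge output.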

It seems to me that your model indicates that the occurrence of P is associated with a lower hazard rate. This cannot be explained only by the fact that P happens after ~1 year, because at each recurrence event the individual is compared with the other individuals still at risk at the same time point. So it seems "true" (subject to the limitations of the matching procedure) that P is in fact associated with a lower risk of recurrence. Perhaps the problem is that you haven't taken the matching into account as described above.
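The comparison "at the same time point" can be illustrated by listing the risk set at a given recurrence time: every interval in the counting-process data that covers that time. The sketch below uses a small hand-made data set in the (tstart, tstop] layout; the data and the helper name risk_set are hypothetical.

```r
# Hypothetical counting-process data in the layout produced by tmerge
sdata <- data.frame(
  id     = c(1, 1, 2, 3, 4),
  tstart = c(0, 150, 0, 0, 0),
  tstop  = c(150, 400, 900, 250, 600),
  death  = c(0, 1, 0, 1, 0),
  P      = c(0, 1, 0, 0, 0)
)

# Rows at risk at time t: intervals (tstart, tstop] that contain t
risk_set <- function(d, t) {
  d[d$tstart < t & t <= d$tstop, c("id", "tstart", "tstop", "death", "P")]
}

# Subject 3 recurs at day 250: compared with everyone at risk on day 250
print(risk_set(sdata, 250))

# Subject 1 recurs at day 400, after P: subject 3 has already left the risk set
print(risk_set(sdata, 400))
```

At each recurrence time the partial likelihood uses the value of P that each at-risk subject has at that moment, so a subject is only counted as exposed after P has occurred and is only compared with subjects who have survived recurrence-free to the same day.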

Try a frailty model, and also try using the entire dataset from which the cases and controls are drawn, and see what results you get. I'll be happy to discuss this further.
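For the second option, a minimal sketch of the unmatched analysis might look like the following: the full cohort, P as a time-dependent covariate, and the known risk factors as ordinary covariates. The cohort is simulated, and age and sex stand in for whatever risk factors were used for matching; none of these values come from the question's data.

```r
library(survival)
set.seed(42)

# Hypothetical full cohort (no matching)
n <- 300
cohort <- data.frame(
  id         = seq_len(n),
  age        = round(rnorm(n, mean = 60, sd = 10)),
  sex        = factor(sample(c("F", "M"), n, replace = TRUE)),
  ftime      = pmin(round(rexp(n, rate = 1 / 900)) + 1, 2555),
  Recurrence = rbinom(n, 1, 0.2),
  Ptime      = ifelse(runif(n) < 0.25, round(runif(n, 30, 700)), NA)
)

# Same tmerge construction as in the question
sdata <- tmerge(cohort, cohort, id = id,
                death = event(ftime, Recurrence),
                P = tdc(Ptime))

# Adjust for the risk factors directly instead of matching on them
fit <- coxph(Surv(tstart, tstop, death) ~ P + age + sex, data = sdata)
print(summary(fit)$coefficients)
```

With real data, the coefficient for P from this model can be compared with the one from the matched analysis to see how sensitive the result is to the matching.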

EDIT: When thinking about it again, I don't think you should match the individuals in this way at all. Somebody may correct me if I'm wrong, but you may introduce biases that are difficult to interpret and correct for in the analysis by this procedure.
