When using the counting process format in Cox PH survival analysis in R (survival::coxph), must I use a cluster term in the model formula?

correlation, cox-model, r, survival, time-varying-covariate

I am fitting a Cox proportional hazards model in R with coxph() from the survival package. Because I have time-varying covariates, my data are in counting process format: there is a separate record for each (t1, t2] time interval, so any subject i can have multiple records, each covering a different interval.

Does this mean that I must fit the model with a correlation structure, i.e. do I have to include a cluster(Id) term in the model formula? Some of the examples I found do not use cluster(Id), but I am still in doubt: how else would the model know that multiple observations belong to the same subject? If multiple observations belong to one subject, shouldn't we tell the model that the errors are not independent?

Best Answer

The short answer is no, you do not have to. The explanation is in the documentation of the survival package; see vignette("timedep", package = "survival").

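The main reason is that the rows for one individual cover disjoint time intervals, so the individual is at risk in at most one of them at any given time. The Cox log partial likelihood is a sum over event times, and at each event time an individual can contribute at most one row of the data. The multiple rows are therefore just a way of encoding how the covariates change over time, not repeated correlated observations. The vignette does note exceptions, such as subjects with multiple events, where a cluster variance is needed.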
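To see this in practice, here is a minimal sketch that fits the same model with and without cluster(id). It assumes the `heart` data set that ships with the survival package (Stanford heart transplant data, already in counting process form with columns start, stop, event and id); the choice of covariates is only for illustration and is not taken from the question.

```r
library(survival)

# 'heart' has one row per (start, stop] interval and possibly
# several rows per patient, identified by 'id'.
# Check that no patient has more than one event (death).
events_per_id <- tapply(heart$event, heart$id, sum)

# Model without any cluster term: ordinary model-based variance.
fit_plain <- coxph(Surv(start, stop, event) ~ age + surgery + transplant,
                   data = heart)

# Same model with cluster(id): requests a robust (sandwich) variance
# grouped by patient. The partial likelihood itself is unchanged.
fit_cluster <- coxph(Surv(start, stop, event) ~ age + surgery + transplant + cluster(id),
                     data = heart)

# Compare the point estimates of the two fits.
same_coef <- all.equal(coef(fit_plain), coef(fit_cluster))
print(same_coef)

# Compare model-based and robust standard errors side by side.
print(cbind(se_plain  = sqrt(diag(fit_plain$var)),
            se_robust = sqrt(diag(fit_cluster$var))))
```

The coefficients should agree between the two fits, since cluster(id) only changes how the variance is estimated. With one event per subject, the model-based standard errors from the first fit are the ones the vignette's argument supports.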
