Solved – Interaction of risk factor with age in survival model with age as time scale

ageinteractionsurvivaltime-varying-covariate

I'm currently looking into survival analysis regression models and can't quite wrap my head around the following:

Say I want to model time to occurrence of a cardiovascular event and use age (as opposed to follow-up time) as the underlying time scale. It seems clear to me that it doesn't make sense to include age as a covariate in the model, but what about (2-way) interaction terms between age and another risk factor, say smoking or blood pressure? Would it make sense to include such interaction terms as covariates (also in light of the usual advice to always include both of an interaction's constituent main effects, which in this case wouldn't seem to make much sense for the age main effect)?

Also, assuming it would be possible to include such interaction terms, should I treat age as a time-varying covariate (i.e., re-calculate the hazard in different years of the follow-up based on different values of age) or should I use the baseline age value only?

Thanks!

EDIT: As an example of the age * covariate interaction problem described above, consider the model described in this quote from the supplementary material of Hajifathalian et al. (2015) (excluding the references made in this quote):

We used sex- and cohort-stratified Cox proportional hazards regression to estimate the coefficients of the risk function, similar to previous cohort pooling projects. The risk predictors (smoking, diabetes, serum total cholesterol and systolic blood pressure) were measured at baseline. Participants’ age was used as the time scale, which allows age-specific cardiovascular disease (CVD) rates to vary across cohorts, and hence across populations to which the risk score is applied. This parameterization allows the risk equation to be recalibrated to each country, and each sex and age group within it, by replacing the age-sex-specific CVD rates from the pooled cohorts with those from the application country. We included interaction terms between all risk factors and age and, as described in the main paper, between sex and diabetes, and sex and smoking based on previous evidence.

The Cox model is given as:

$\lambda_i(t)=\lambda_{0,k}(t)exp(\sum^4_{l=1}\beta_lX_{i,l}+\sum^4_{l=1}\delta_ltX_{i,l}+\sum^4_{l=3}\gamma_lsex_iX_{i,l}$ (1)

where subscript $i$ denotes individual participants in the cohorts; $k$ denotes sex-cohort combination; $l$ denotes risk factors in the risk prediction equation (1: systolic blood pressure, 2: total cholesterol, 3: diabetes, and 4: smoking); $t$ denotes age in years; $X_{i,l}$ is the level of the risk factor $l$ for individual $i$ at baseline; $\beta_l$ is the log hazard ratio (HR) for the main effect of each risk factor, $\delta_l$ is the coefficient for linear interaction between risk factor $l$ and age, and $\gamma_l$ is the coefficient for interaction between diabetes and smoking, and sex; and $\lambda_{0,k}(t)$ is the age-specific hazard of CVD at the average level of risk factors for participants of sex-cohort $k$. $\lambda_{0,k}(t)$ is conceptually equivalent to an age-specific version of the $S_0(t)$ parameter in the specification used by D’Agostino and colleagues in the Framingham Risk Score. This formulation of the risk prediction equation does not need a coefficient for age although it includes interaction terms between age and other risk factors as described below.

Best Answer

Bendix Carstensen has a nice presentation in which he shows how this can be accomplished by splitting individual follow up time into small pieces and use a Poisson model for the rates thereby using the time scale (in your case age) as a covariate rather than part of the outcome. If you use R, the Epi package contains the tools you need.

Another option is to extend the Cox model to include time-dependent coefficients. See this vignette for the survival package in R. In that vignette it is also explained why it is not appropriate to include a covariate * age follow-up time interaction directly in the Cox model.

EDIT: Removed the bit about baseline age. That would of course not make sense with age as the time scale. Typed a bit too fast...

EDIT 2: Yes, the age * covariate interaction is an example of a time dependent coefficient and one of the ways for assessing the proportional hazards assumption.

As for the paper, it is possible that the critisism could apply. I can't find any info that the authors allowed for time-dependent coefficients. Including the age * covariate interaction is certainly possible, and R prints a warning message but estimates the coefficients. See the example below which is very inspired by the vignette mentioned above.

require(survival)
data(veteran)

## Use age as the time scale
## age * karno in the model
fit1 <- coxph(Surv(age, age + time, status == 1) ~ trt + celltype + karno + age:karno, data = veteran)
Warning message:
In coxph(Surv(age, age + time, status == 1) ~ trt + celltype + age *  :
  a variable appears on both the left and right sides of the formula

## karno as a time-dependent coefficient, simple interaction similar to fit1
fit2 <- coxph(Surv(age, age + time, status == 1) ~ trt + celltype + karno + tt(karno),
          tt = function(x, t, ...) x * t, data = veteran)

## Coefficients from fit1
> fit1
Call:
coxph(formula = Surv(age, age + time, status == 1) ~ trt + celltype + 
karno + age:karno, data = veteran)

                       coef exp(coef)  se(coef)     z       p
trt                2.59e-01  1.30e+00  2.05e-01  1.26  0.2072
celltypesmallcell  8.32e-01  2.30e+00  2.73e-01  3.05  0.0023
celltypeadeno      1.17e+00  3.21e+00  2.95e-01  3.95 7.8e-05
celltypelarge      3.95e-01  1.48e+00  2.83e-01  1.39  0.1633
karno             -3.33e-02  9.67e-01  1.07e-02 -3.10  0.0020
karno:age          4.75e-05  1.00e+00  1.77e-04  0.27  0.7890

Likelihood ratio test=62  on 6 df, p=1.75e-11
n= 137, number of events= 128

## Coefficients from fit2

> fit2
Call:
coxph(formula = Surv(age, age + time, status == 1) ~ trt + celltype + 
karno + tt(karno), data = veteran, tt = function(x, t, ...) x * 
t)

                       coef exp(coef)  se(coef)     z       p
trt                1.70e-01  1.19e+00  2.04e-01  0.84 0.40333
celltypesmallcell  9.08e-01  2.48e+00  2.75e-01  3.30 0.00097
celltypeadeno      1.22e+00  3.40e+00  3.00e-01  4.08 4.5e-05
celltypelarge      4.04e-01  1.50e+00  2.83e-01  1.42 0.15422
karno             -4.69e-02  9.54e-01  8.33e-03 -5.63 1.8e-08
tt(karno)          1.15e-04  1.00e+00  4.71e-05  2.45 0.01443

Likelihood ratio test=68.2  on 6 df, p=9.45e-13
n= 137, number of events= 128

Note the difference in the estimated coefficients. Reading the vignette again I see that section 4.2 deals with follow-up time * covariate interaction and not necessarily age * covariate interaction. But I think the example illustrates that results can be misleading when not properly allowing for time-dependent coefficients.

Hope this helps!