Solved – R: How to fit a coxph model with 2 categorical predictor variables that violate assumptions? Coefficient or Covariate

assumptionscox-modelrregression coefficientssurvival

Stat Question: I only have two predictor covariates for my cox ph model, and they are both categorical. They both violate assumptions of proportional hazards. I am confused if I can call them time-dependent covariates (as they do not depend on time, they are completely independent of time, but the survival associated with them is definitely dependent on time). Adding the interaction terms of both variables and time does not "de-violate" my assumptions. I have thought about adding time-dependent coefficients by stratifying, but then I can not get an estimate either variable. Is there a better option? I've included my study description below, and attached my data.

Study Question : What is the risk of death associated with parasitism (group) and concentration of contaminant (treatment)?

Data Information : I have 6 concentrations of contaminant. (0, 0.1,0.2,0.4,0.8,1.6). I had 10 replicates per concentration. I performed 10 trials. (600 individual organisms). I checked for mortality every 12 hours for 96 hours. About 32% of the organisms were parasitized. As parasitism did not vary across treatment or trial, I can create a parasitized and unparasitized group. Group is binary (1=parasite, 2=no parasite). Treatment values are the same as the concentrations.

I want to create a cox ph model, using group(parasitized or unparasitized) and treatment (concentration). I eliminate one concentration (0) as there was no mortality.

> coxph <- coxph(Surv(Time, Event) ~ Treatment + Parasite, data=data)
> summary(coxph)
Call:
coxph(formula = Surv(Time, Event) ~ Treatment + Parasite, data = data)

  n= 500, number of events= 317 

             coef exp(coef) se(coef)      z Pr(>|z|)    
Treatment 1.03502   2.81516  0.09443 10.961  < 2e-16 ***
Parasite  0.51782   1.67836  0.11337  4.567 4.94e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

          exp(coef) exp(-coef) lower .95 upper .95
Treatment     2.815     0.3552     2.340     3.388
Parasite      1.678     0.5958     1.344     2.096

Concordance= 0.691  (se = 0.019 )
Rsquare= 0.227   (max possible= 0.999 )
Likelihood ratio test= 128.6  on 2 df,   p=0
Wald test            = 139.1  on 2 df,   p=0
Score (logrank) test = 151.4  on 2 df,   p=0

> cox.zph(coxph)
             rho chisq        p
Treatment  0.211 13.80 2.04e-04
Parasite  -0.124  4.75 2.93e-02
GLOBAL        NA 18.86 8.02e-05

UPDATE: I changed the set up of my data to a tstart,tstop form. The headings are id, group (binary, 1-parasite, 0-no parasite), treatment (factor with 5 levels), tstart (0,12,24,36,48,60,72,84), tstop (12,24,36,48,60,72,84), and event (0,1 – death). In all, 4,000 observations.

QUESTION: Is it appropriate to time transform a categorical variable?

> treatmentgrouptt <- coxph(Surv(tstart, tstop, event) ~ tt(group) + factor(treatment))
> summary(treatmentgrouptt)
Call:
coxph(formula = Surv(tstart, tstop, event) ~ tt(group) + factor(treatment))

  n= 4000, number of events= 317 

                        coef exp(coef) se(coef)     z Pr(>|z|)    
tt(group)            0.06008   1.06192  0.02934 2.048 0.040594 *  
factor(treatment)0.2 0.44173   1.55539  0.23816 1.855 0.063627 .  
factor(treatment)0.4 0.84763   2.33411  0.22281 3.804 0.000142 ***
factor(treatment)0.8 1.03544   2.81635  0.21713 4.769 1.85e-06 ***
factor(treatment)1.6 1.25081   3.49319  0.21158 5.912 3.39e-09 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

                     exp(coef) exp(-coef) lower .95 upper .95
tt(group)                1.062     0.9417    1.0026     1.125
factor(treatment)0.2     1.555     0.6429    0.9753     2.481
factor(treatment)0.4     2.334     0.4284    1.5082     3.612
factor(treatment)0.8     2.816     0.3551    1.8402     4.310
factor(treatment)1.6     3.493     0.2863    2.3074     5.288

Concordance= 0.624  (se = 0.049 )
Rsquare= 0.014   (max possible= 0.624 )
Likelihood ratio test= 57.9  on 5 df,   p=3.295e-11
Wald test            = 51.8  on 5 df,   p=5.922e-10
Score (logrank) test = 55.61  on 5 df,   p=9.763e-11

> zptreatmentgrouptt <- cox.zph(treatmentgrouptt)
> zptreatmentgrouptt
                          rho   chisq      p
tt(group)            -0.09423 2.84387 0.0917
factor(treatment)0.2  0.04960 0.77994 0.3772
factor(treatment)0.4 -0.05060 0.81221 0.3675
factor(treatment)0.8 -0.00448 0.00636 0.9364
factor(treatment)1.6 -0.00762 0.01851 0.8918
GLOBAL                     NA 7.38434 0.1936

Schoenfield Residuals

Best Answer

It's hard to know how the data was constructed without an example, so I can't test my suggestion. Anyways:

  1. Treatment does not seem categorical, and should not be treated as one. It is continuous for all purposes to the best of my understanding.

  2. Regarding the main question - You should try a step function, creating different coefficients for the same covariates, over different intervals as can be imputed from the Schoenfeld residual graphs above. To the point, try splitting at around time 50:

    model <- survSplit(Surv(time, event) ~ Treatment + Parasite, data = data, 
                         cut = 50, episode = "tgroup", id = "id")
    
    coxph <- coxph(Surv(time, event) ~ Parasite + Treatment:strata(tgroup), data = model)
    

This might work. For more information see Therneau's Using time dependent covariates and time dependent coefficients in the cox model

By the way, having the covariates failing the proportional hazard assumption means that they DO change with time, and the strata is what we use to try and 'fix' this. The interpretation of the new coefficients will be different for the different time periods, meaning that the effect on the hazard ratio is different for the period up to time 50 (or any other you find), and over time 50.

Hope this helps.