Solved – Interpreting output of Cox regression model

Tags: cox-model, gini, interpretation, regression, survival

I am trying to use two variables, activity score (ascore, a whole number indicating the amount of activity) and gini (the Gini-Simpson index, a value between 0 and 1 indicating the diversity of activity), to predict the number of days an animal survived.

Call:
coxph(formula = Surv(NumDays, failed) ~ ascore + gini, data = records1)

  n= 47966, number of events= 39853 

             coef  exp(coef)   se(coef)      z Pr(>|z|)    
ascore -2.801e-03  9.972e-01  5.634e-05 -49.72   <2e-16 ***
gini   -2.535e-01  7.761e-01  2.229e-02 -11.38   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

   exp(coef) exp(-coef) lower .95 upper .95
ascore    0.9972      1.003    0.9971    0.9973
gini      0.7761      1.289    0.7429    0.8107

Concordance= 0.648  (se = 0.003 )
Rsquare= 0.068   (max possible= 1 )
Likelihood ratio test= 3358  on 2 df,   p=0
Wald test            = 2702  on 2 df,   p=0
Score (logrank) test = 2705  on 2 df,   p=0

If my understanding of the above model is right, it says that, holding the other covariate constant, one additional activity reduces the hazard of an animal dying by a factor of exp(coef) = 0.9972, that is, by 0.28 percent, while a one-unit increase in diversity (gini) reduces the hazard of an animal dying by a factor of exp(coef) = 0.7761, that is, by 22.39 percent.

Now a unit of activity is easily understood as 1 additional activity. But what does a unit of diversity mean when gini only ranges from 0 to 1? In other words, what does a one-unit rise in the Gini-Simpson index indicate? Also, how do I define what the original (baseline) hazard is based on my model?

Best Answer

Your problem is not actually the interpretation of the Cox model output, which you are doing correctly, but what a one-unit rise in the Gini-Simpson index means. Keep in mind that regression models are "dumb": they do not make any of their calculations with the context of the variables in mind. That's your job.

Technically, a one-unit increase here is a 0 to 1 jump, so the hazard ratio describes the change in hazard from moving from the lowest possible value of the index to the highest, which may or may not be meaningful to you.
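
For reference, the proportional-hazards form behind this reading can be written out. This is the standard Cox model rather than anything specific to this fit; the symbols $h_0$, $\beta_1$, $\beta_2$, $g$ and $\Delta$ are notation introduced here, and the numbers are taken from the summary in the question:

$$
h(t \mid \text{ascore}, \text{gini}) = h_0(t)\,\exp(\beta_1\,\text{ascore} + \beta_2\,\text{gini})
$$

$$
\frac{h(t \mid \text{ascore},\ \text{gini} = g + \Delta)}{h(t \mid \text{ascore},\ \text{gini} = g)} = \exp(\beta_2 \Delta), \qquad \Delta = 1:\ \exp(-0.2535) \approx 0.7761
$$

So the 0 to 1 jump corresponds to a hazard about 22.4 percent lower (1 - 0.7761). The baseline hazard $h_0(t)$ is the hazard for an animal with ascore = 0 and gini = 0; the Cox fit leaves its shape unspecified, which is why the summary reports only ratios.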

There are other ways to solve your problem:

  • As Dimitriy mentions in the comments, you can rescale the Gini-Simpson index to [0,100], [0,10] or some other range so that a one-unit increase or decrease is more meaningful, and rerun your analysis.
  • Similarly, if there are logical categorical breaks in the distribution of the variable, you can make the variable categorical. This has been my solution in several ecology problems where technically I had a continuous variable, but my actual data had things falling into one of two values.
  • The "one-unit" interpretation is only a convenience: it lets you exponentiate the regression coefficient directly. For any other step size, multiply the coefficient by that step before exponentiating: multiply it by 10 to get the hazard ratio of a 0 to 10 jump, or by 0.10 to get the hazard ratio of a 0 to 0.10 jump.
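
The scaling idea in the list above can be checked with a few lines of arithmetic. This is a minimal sketch in plain Python, not part of the original analysis: the coefficient and standard error are copied from the summary in the question, the helper names `hazard_ratio` and `hazard_ratio_ci` are hypothetical, and the interval assumes the usual Wald form with z = 1.96.

```python
import math

# Values copied from the coxph summary in the question (gini row).
COEF_GINI = -2.535e-01   # coef
SE_GINI = 2.229e-02      # se(coef)


def hazard_ratio(delta, coef=COEF_GINI):
    """Hazard ratio for a change of `delta` units in the covariate."""
    return math.exp(delta * coef)


def hazard_ratio_ci(delta, coef=COEF_GINI, se=SE_GINI, z=1.96):
    """Approximate 95% Wald interval for the hazard ratio of a `delta` change."""
    bounds = (
        math.exp(delta * (coef - z * se)),
        math.exp(delta * (coef + z * se)),
    )
    # min/max keeps the interval ordered even when delta is negative.
    return min(bounds), max(bounds)


# Hazard ratios for a full 0-to-1 jump and for smaller, more realistic steps.
for delta in (1.0, 0.1, 0.01):
    hr = hazard_ratio(delta)
    lo, hi = hazard_ratio_ci(delta)
    print(f"gini +{delta:g}: HR = {hr:.4f} (95% CI {lo:.4f} to {hi:.4f}), "
          f"hazard change = {100 * (hr - 1):+.2f}%")
```

With a step of 1 this should reproduce the exp(coef), lower .95 and upper .95 values in the summary. Rescaling the index to [0,100] and refitting divides the coefficient by 100, so one unit on the rescaled index gives the same hazard ratio as a step of 0.01 here.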