Since the seasonal dummy variables are static by nature, and their coefficients clearly vary with the time variable, how much does it matter?
The value that you get is a form of average over time. Unfortunately, since time-to-event analyses naturally have more subjects at risk early on, you can't simply assume that the effect is weighted evenly throughout the follow-up. In my experience the initial period has a much heavier impact on the estimate than the later period, because the partial likelihood contributions come from the risk sets at each event time and those risk sets are largest at the start.
I get that statistically it means something, but does the violation of the PH assumption invalidate the (intuitively appealing) result that non-response is more likely to happen in the summer and winter?
Again, it probably doesn't, but you can't be sure until you've checked. Something is definitely happening over time, and you should at least have a look at the residual plot (plot(cox.zph(...))). It isn't entirely surprising that you have a problem with PH, since the seasons are part of the time scale and there will be situations where early summer and late spring are similar.
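A minimal sketch of that check, using the lung data from the survival package as a stand-in (the covariates here are placeholders for your seasonal dummies):

library(survival)

# Fit the Cox model, then test the proportional hazards assumption
fit <- coxph(Surv(time, status) ~ ph.ecog + age, data = lung)
zp <- cox.zph(fit)

print(zp) # per-covariate and global tests of PH
plot(zp)  # scaled Schoenfeld residuals vs. time; a non-flat smooth
          # suggests the effect drifts over time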
If so, is there a way to handle this so that the PH assumption is not violated? I know about using the tt transform, but I can't seem to figure out the exact form for the function.
The tt transform is tricky to use with big data. It explodes the model matrix and can get a little messy, e.g. if you modify the lung example in the survival package:
library(survival)

# tt() is applied to the expanded data: one row per subject at risk
# at every event time, so length(x) reveals the blow-up
coxph(Surv(time, status) ~ ph.ecog + tt(age), data = lung,
      tt = function(x, t, ...) {
        print(length(x))
        pspline(x + t/365.25)
      })
It prints 15809, while there are only 228 rows in the original dataset. The principle of tt() is that it feeds the variables into the transformation function, where you are free to use time any way you wish. Note that you can also have a different transformation function for each variable:
library(survival)

# Supply one transformation function per tt() term, in order of appearance
coxph(Surv(time, status) ~ tt(ph.ecog) + tt(age), data = lung,
      tt = list(
        # ph.ecog: linear, time-shifted, and squared time-shifted terms
        function(x, t, ...) {
          cbind(x, x + t/365.25, (x + t/365.25)^2)
        },
        # age: penalized spline of the time-shifted value
        function(x, t, ...) {
          pspline(x + t/365.25)
        }),
      x = TRUE,
      y = TRUE) -> fit
head(fit$x)
Gives:
tt(ph.ecog)x tt(ph.ecog) tt(ph.ecog) tt(age)1 tt(age)2 tt(age)3 tt(age)4
6 1 3.4 11.7 0 0 0 0.000
3 0 2.4 5.8 0 0 0 0.020
38 1 3.4 11.7 0 0 0 0.000
5 0 2.4 5.8 0 0 0 0.000
6.1 1 3.2 10.4 0 0 0 0.000
3.1 0 2.2 5.0 0 0 0 0.026
tt(age)5 tt(age)6 tt(age)7 tt(age)8 tt(age)9 tt(age)10 tt(age)11 tt(age)12
6 0.00 0.00000 0.000 0.0052 0.359 0.58 0.053 0
3 0.48 0.48232 0.021 0.0000 0.000 0.00 0.000 0
38 0.00 0.00087 0.266 0.6393 0.094 0.00 0.000 0
5 0.03 0.51933 0.437 0.0136 0.000 0.00 0.000 0
6.1 0.00 0.00000 0.000 0.0078 0.388 0.56 0.044 0
3.1 0.50 0.45457 0.016 0.0000 0.000 0.00 0.000 0
I therefore try to avoid this solution and use the time-split approach that I wrote about here and in my answer to my own question.
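A minimal sketch of the time-split idea, assuming piecewise-constant effects are acceptable; the cut points are arbitrary and lung again stands in for your own data:

library(survival)

# lung codes status as 1 = censored, 2 = dead; recode to a 0/1 event indicator
lung$status <- lung$status - 1

# Split each subject's follow-up at the cut points (days); every row
# then covers one interval, indexed by "tgroup"
lung2 <- survSplit(Surv(time, status) ~ ., data = lung,
                   cut = c(180, 360), episode = "tgroup")

# Allow the covariate's effect to differ between the time intervals
coxph(Surv(tstart, time, status) ~ ph.ecog:strata(tgroup) + age,
      data = lung2)

This gives a separate ph.ecog coefficient per time interval instead of one time-averaged coefficient, which is often easier to interpret than a tt() term.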
You can compare Cox regression models (coxph) in R with plrtest, which is a partial likelihood ratio test for non-nested coxph models:
require("survival")
require("nonnestcox") #github.com/thomashielscher/nonnestcox
pbc <- subset(pbc, !is.na(trt))
mod1 <- coxph(Surv(time, status==2) ~ age, data=pbc, x=T)
mod2 <- coxph(Surv(time, status==2) ~ age + albumin + bili + edema + protime, data=pbc, x=T)
mod3 <- coxph(Surv(time, status==2) ~ age + log(albumin) + log(bili) + edema +
log(protime), data=pbc, x=T)
plrtest(mod3, mod2, nested=F) # non-nested models
plrtest(mod3, mod1, nested=T) # nested models
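For the nested comparison you can also run the standard partial likelihood ratio test with anova() from the survival package itself, which serves as a cross-check on plrtest:

library(survival)

# Likelihood ratio test for the nested pair fitted above
anova(mod1, mod3)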
Best Answer
Thanks for trying those functions. I believe that both metrics you mentioned are excellent in this context. The approach is useful for any model that gives rise to Wald statistics (which is virtually all models), although likelihood ratio $\chi^2$ statistics would be even better (but more tedious to compute).
You can use the bootstrap to get confidence intervals for the ranks of variables computed these ways; for example code, type ?anova.rms. All of this is related to the "adequacy index". Two papers using the approach that have appeared in the medical literature are http://www.citeulike.org/user/harrelfe/article/13265566 and http://www.citeulike.org/user/harrelfe/article/13263849 .
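A minimal sketch of that ranking with the rms package (the lung model is only a placeholder; the bootstrap rank-interval example itself lives in ?anova.rms):

library(rms)
library(survival)

dd <- datadist(lung); options(datadist = "dd")

# Cox model fit with cph() so anova() reports Wald chi-square per predictor
fit <- cph(Surv(time, status) ~ age + ph.ecog + sex, data = lung)
a <- anova(fit)

print(a) # Wald chi-square, d.f., and p for each predictor
plot(a)  # dot chart of chi-square minus d.f., a rough importance ranking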