Solved – Difference in chi-squared calculated by anova from cph and coxph

anovachi-squared-testcox-modelrregression

Is there a difference between chi-squared (from coxph -> anova) and Wald chi-squared (from cph -> anova)?

And how do I have to interpret these chi-squared values? What does a P<0.05 mean in this case? Why does the sum of chi-squared values of each variable not equal TOTAL? My idea was that each chi-squared indicates the predictive information of each variable and TOTAL that of the entire model.

> library(survival)
> library(rms)
>
> data(colon)
> d <- colon
> m1 <- cph(Surv(time, status) ~ age + sex + nodes, data=d)
> anova(m1)
                Wald Statistics          Response: Surv(time, status) 

 Factor     Chi-Square d.f. P     
 age          0.03     1    0.8612
 sex          0.93     1    0.3349
 nodes      189.79     1    <.0001
 TOTAL      192.01     3    <.0001
> 0.03+0.93+189.79 # = 190.75
[1] 190.75
> m2 <- coxph(Surv(time, status) ~ age + sex + nodes, data=d)
> anova(m2)
Analysis of Deviance Table
 Cox model: response is Surv(time, status)
Terms added sequentially (first to last)

       loglik    Chisq Df Pr(>|Chi|)    
NULL  -6424.0                           
age   -6423.6   0.7147  1     0.3979    
sex   -6423.4   0.5019  1     0.4787    
nodes -6356.9 132.8685  1     <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Best Answer

Test differences

There are two differences between the two tests used:

  1. The use of likelihood ratio tests versus Wald tests
  2. The use of a sequential tests versus tests for the effect of one variable given the other variables

Since your example data set is huge (1822 complete observations, with 897 events) the first difference doesn’t matter much, so let’s first look at the second difference.

Sequential tests versus tests of one variable given the others

Note that the output from running anova() on the coxph model says Terms added sequentially (first to last). This means that for the first variable, age, we simply test if age is a statistically significant predictor without looking at any other variables. Basically, we test if the model including age fits the data better than a model with no explanatory variables (only an intercept), using a likelihood ratio test (which we can do, since the models are nested). This should give the same result as

anova(coxph(Surv(time, status) ~ age, data=d))

(The actual results differ slightly, because of missing data in the other explanatory variables. If you remove the observations with missing data, you will get the exact same answer.)

For the second variable, sex, we test if sex is statistically significant given age; we compare a model containing only age with one containing both age and sex.

For the third variable, nodes, we test if nodes is statistically significant given both age and sex; we compare a model containing both age and sex with one containing age, sex and nodes. This is the only test where we can compare the result to the one from anova(m1).

Getting tests of one variable given the others for coxph models

For getting test results from the coxph models comparable to the ones in the cph models in general, we have several options. One simple method is to use drop1() to compare the full model (three predictors) with ones containing all predictors except one, using likelihood ratio test. First, to avoid some problems with differing number of observations depending on which variables we include, we refit the models on the complete data:

d.comp = na.omit(d[c("time","status","age","sex","nodes")])
m2.comp = update(m2, data=d.comp)

No we drop each predictor in turn:

drop1(m2.comp, test="Chisq")

and get

       Df   AIC     LRT Pr(>Chi)    
<none>    12720                     
age     1 12718   0.031   0.8611    
sex     1 12719   0.929   0.3351    
nodes   1 12851 132.868   <2e-16 ***

As you see, the results are very similar to the ones from the Wald tests from cph.

Wald tests?

So what are the Wald tests? Basically, since all predictors are continuous, they’re just normal, asymptotic z-tests, but with squared test statistics. That is, each test statistic is the square of the $z$ statistic from summary(m2.comp) (and the $z$ statistic is the estimated coefficient divided by its standard error). Example:

summary(m2.comp)

            coef  exp(coef)   se(coef)      z Pr(>|z|)    
age    0.0004934  1.0004936  0.0028216  0.175    0.861    
sex   -0.0645554  0.9374842  0.0669405 -0.964    0.335    
nodes  0.0872323  1.0911501  0.0063330 13.774   <2e-16 ***

The $z$-statistic of sex is $-0.0645554/0.0669405=-0.964$, and $(-0.964)^2=0.93$, which is the chi-square statistic of the Wald test of the sex predictor from the cph model. (For factors and nonlinear variables, the calculations are slightly more complex, taking the correlation between the estimators of the (dummy/transformed) variables used to represent the factor / nonlinear effect into account.)

Which tests to use?

Both sequential tests and tests of one variable given the others makes sense, but they test different hypotheses. The former basically ask ‘if I add this new predictor, does it improve the fit?’ iteratively, for an ordered list of potential predictors. The latter asks ‘given that I include all other predictors, does adding this one improve the fit?’.

Wald tests versus likelihood ratio tests

The other difference between the two tests, i.e., difference 1 mentioned above, is the difference between asymptotic Wald tests (basically relying on the central limit theorem – that you have enough observations that test statistics are approximately normally distributed) and (partial-)likelihood ratio tests (LRTs). For small data sets, the results can differ somewhat. (And even here, the test statistic for the nodes variable is quite different.) Usually, likelihood ratio tests are preferred.

And if you want to compare the Wald and the LRT tests on the same model fitted using ‘coxph()’ (or other normal regression functions), it’s very easy to do using the car package:

library(car)
Anova(m2.comp, test.statistic="Wald") # Equal to anova(m1)
Anova(m2.comp, test.statistic="LR")   # Equal to drop1(m2.comp, test="Chisq")

which gives us:

# LR
      LR Chisq Df Pr(>Chisq)    
age        0.0  1       0.86    
sex        0.9  1       0.34    
nodes    132.9  1     <2e-16 ***

# Wald
            Df  Chisq Pr(>Chisq)    
age          1   0.03       0.86    
sex          1   0.93       0.33    
nodes        1 189.73     <2e-16 ***

Not surprisingly, the $p$-values are (for any practical use) identical.