Is there a difference between chi-squared (from coxph
-> anova
) and Wald chi-squared (from cph
-> anova
)?
And how do I have to interpret these chi-squared values? What does a P<0.05 mean in this case? Why does the sum of chi-squared values of each variable not equal TOTAL? My idea was that each chi-squared indicates the predictive information of each variable and TOTAL that of the entire model.
> library(survival)
> library(rms)
>
> data(colon)
> d <- colon
> m1 <- cph(Surv(time, status) ~ age + sex + nodes, data=d)
> anova(m1)
Wald Statistics Response: Surv(time, status)
Factor Chi-Square d.f. P
age 0.03 1 0.8612
sex 0.93 1 0.3349
nodes 189.79 1 <.0001
TOTAL 192.01 3 <.0001
> 0.03+0.93+189.79 # = 190.75
[1] 190.75
> m2 <- coxph(Surv(time, status) ~ age + sex + nodes, data=d)
> anova(m2)
Analysis of Deviance Table
Cox model: response is Surv(time, status)
Terms added sequentially (first to last)
loglik Chisq Df Pr(>|Chi|)
NULL -6424.0
age -6423.6 0.7147 1 0.3979
sex -6423.4 0.5019 1 0.4787
nodes -6356.9 132.8685 1 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Best Answer
Test differences
There are two differences between the two tests used:
Since your example data set is huge (1822 complete observations, with 897 events) the first difference doesn’t matter much, so let’s first look at the second difference.
Sequential tests versus tests of one variable given the others
Note that the output from running
anova()
on thecoxph
model saysTerms added sequentially (first to last)
. This means that for the first variable,age
, we simply test ifage
is a statistically significant predictor without looking at any other variables. Basically, we test if the model includingage
fits the data better than a model with no explanatory variables (only an intercept), using a likelihood ratio test (which we can do, since the models are nested). This should give the same result as(The actual results differ slightly, because of missing data in the other explanatory variables. If you remove the observations with missing data, you will get the exact same answer.)
For the second variable,
sex
, we test ifsex
is statistically significant givenage
; we compare a model containing onlyage
with one containing bothage
andsex
.For the third variable,
nodes
, we test ifnodes
is statistically significant given bothage
andsex
; we compare a model containing bothage
andsex
with one containingage
,sex
andnodes
. This is the only test where we can compare the result to the one fromanova(m1)
.Getting tests of one variable given the others for
coxph
modelsFor getting test results from the
coxph
models comparable to the ones in thecph
models in general, we have several options. One simple method is to usedrop1()
to compare the full model (three predictors) with ones containing all predictors except one, using likelihood ratio test. First, to avoid some problems with differing number of observations depending on which variables we include, we refit the models on the complete data:No we drop each predictor in turn:
and get
As you see, the results are very similar to the ones from the Wald tests from
cph
.Wald tests?
So what are the Wald tests? Basically, since all predictors are continuous, they’re just normal, asymptotic z-tests, but with squared test statistics. That is, each test statistic is the square of the $z$ statistic from
summary(m2.comp)
(and the $z$ statistic is the estimated coefficient divided by its standard error). Example:The $z$-statistic of
sex
is $-0.0645554/0.0669405=-0.964$, and $(-0.964)^2=0.93$, which is the chi-square statistic of the Wald test of thesex
predictor from thecph
model. (For factors and nonlinear variables, the calculations are slightly more complex, taking the correlation between the estimators of the (dummy/transformed) variables used to represent the factor / nonlinear effect into account.)Which tests to use?
Both sequential tests and tests of one variable given the others makes sense, but they test different hypotheses. The former basically ask ‘if I add this new predictor, does it improve the fit?’ iteratively, for an ordered list of potential predictors. The latter asks ‘given that I include all other predictors, does adding this one improve the fit?’.
Wald tests versus likelihood ratio tests
The other difference between the two tests, i.e., difference 1 mentioned above, is the difference between asymptotic Wald tests (basically relying on the central limit theorem – that you have enough observations that test statistics are approximately normally distributed) and (partial-)likelihood ratio tests (LRTs). For small data sets, the results can differ somewhat. (And even here, the test statistic for the
nodes
variable is quite different.) Usually, likelihood ratio tests are preferred.And if you want to compare the Wald and the LRT tests on the same model fitted using ‘coxph()’ (or other normal regression functions), it’s very easy to do using the
car
package:which gives us:
Not surprisingly, the $p$-values are (for any practical use) identical.