Statistical Significance – ANOVA Model Comparison vs. Concordance Index for Survival Prediction

cox-model, method-comparison, modeling, statistical-significance, survival

I have developed a Cox model in my research, and I want to compare it with the conventional staging system for prognostic evaluation. The traditional staging system categorizes patients into risk groups with different prognoses. To compare my new model with the conventional model, I have categorized my model's predictions into risk groups of the same size (n) as those of the traditional model. Kaplan-Meier analysis of the traditional and new models shows that both are significant predictors of prognostic outcome (log-rank test, p < 0.001), but the new model is better at placing patients with poor prognoses in the high-risk groups. My question is: what is the best statistical approach to compare these models?

My solutions so far have been to:

  1. fit each model's risk group/predictor in a Cox model and compare the two fits with an ANOVA test (this yields a p-value < 0.001);
  2. compare the concordance indexes of the two models' predictors: compareC(time, event, new.model.predictor, conventional.model.predictor), which gives concordance indexes of 0.86 vs. 0.75 with p-value = 0.02.

I guess approach 1 compares model fit, while approach 2 compares the discriminatory ability of the two models.
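For reference, the quantity behind approach 2 can be sketched in plain Python. This is only an illustration of what a concordance index measures, not the compareC implementation: the function name harrell_c and the toy data are made up, pairs with tied survival times are simply skipped (a simplification), and the significance test for the difference between two indexes is not reproduced.

```python
def harrell_c(times, events, risk_scores):
    """Concordance index for right-censored data (simplified sketch).

    A pair (i, j) is usable when subject i had an observed event and
    a strictly shorter time than subject j. The pair is concordant when
    the subject who failed earlier also has the higher risk score.
    Tied risk scores count as half-concordant; tied times are skipped.
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                usable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


# Hypothetical toy data: follow-up time, event indicator (1 = event), risk score.
times = [5, 3, 8, 2]
events = [1, 0, 1, 1]
risk = [0.4, 0.9, 0.1, 0.8]
print(harrell_c(times, events, risk))
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of the usable pairs.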

Input on which approach would be the most informative and appropriate would be greatly appreciated.

Best Answer

To start, don't destroy the quality of your new model by arbitrarily categorizing its survival predictions. Keep it continuous.

For your ANOVA test, the models need to be nested. A good way to proceed is to add the conventional staging as a predictor to your model and then use ANOVA to compare (your model + conventional staging) against conventional staging alone. Be careful, because an anova() function doesn't always perform the same test from package to package. You want a likelihood-ratio test.
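To make the likelihood-ratio comparison concrete, here is a minimal sketch of the arithmetic in plain Python (not the anova() call itself). It assumes you have already extracted the maximized log partial likelihoods of the two nested Cox fits from your software; the function names and the numbers are hypothetical, and the chi-square tail is computed from closed forms that only cover integer degrees of freedom.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable, integer df >= 1."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    df = int(df)
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2))
    #         + sqrt(2x/pi) * exp(-x/2) * sum_{k=0}^{(df-3)/2} x^k / (1*3*...*(2k+1))
    term, total = 1.0, 0.0
    for k in range((df - 1) // 2):
        if k > 0:
            term *= x / (2 * k + 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2.0 * x / math.pi) * math.exp(-half) * total


def likelihood_ratio_test(loglik_reduced, loglik_full, df_diff):
    """LR statistic and p-value for a reduced model nested in a full model."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    return stat, chi2_sf(stat, df_diff)


# Hypothetical log partial likelihoods:
#   reduced = conventional staging only
#   full    = conventional staging + continuous prediction from the new model
stat, p_value = likelihood_ratio_test(-412.7, -398.2, df_diff=1)
print(stat, p_value)
```

Here df_diff is the number of extra parameters in the full model, which is 1 when the new model enters as a single continuous linear predictor.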

Although the C-index is useful for evaluating the discrimination capacity of a single model, even Frank Harrell doesn't think that it provides a sensitive test for comparing models. This blog post discusses ways to compare model performance for logistic and survival regression. In particular, the Adequacy Index, calculated from likelihood-ratio statistics, provides a measure of "the adequacy of the model that ignores the new predictors."
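As a rough sketch of how the Adequacy Index is obtained, assuming it is defined as the likelihood-ratio chi-square of the base model divided by that of the full model, and assuming you have the log partial likelihoods of the null, base (conventional staging) and full (staging + new model) Cox fits. The function names and numbers below are hypothetical.

```python
def lr_chi2(loglik_null, loglik_model):
    """Likelihood-ratio chi-square of a model against the null (no-predictor) model."""
    return 2.0 * (loglik_model - loglik_null)


def adequacy_index(loglik_null, loglik_base, loglik_full):
    """LR chi-square of the base model as a fraction of that of the full model.

    Values near 1 mean the base model is nearly adequate on its own;
    1 minus this value is the fraction of explained information that
    comes from the predictors added in the full model.
    """
    lr_base = lr_chi2(loglik_null, loglik_base)
    lr_full = lr_chi2(loglik_null, loglik_full)
    if lr_full <= 0:
        raise ValueError("full model does not improve on the null model")
    return lr_base / lr_full


# Hypothetical log partial likelihoods for the null, base and full fits.
a = adequacy_index(loglik_null=-450.0, loglik_base=-412.7, loglik_full=-398.2)
print(a, 1.0 - a)
```

With these made-up numbers, the second printed value is the share of the full model's predictive information that conventional staging alone would miss.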