Logistic Regression – How to Compare Predictive Ability of Models

Tags: logistic, model-evaluation, predictive-models, regression-strategies, scoring-rules

Some well-known measures are the $c$ statistic and the Kolmogorov-Smirnov $D$ statistic. However, as far as I know, these statistics take into account only the rank order of the observations, so they are invariant to changes in the intercept of the logistic regression model (e.g., in an oversampling-correction exercise).
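To make the rank-order point concrete, here is a minimal Python sketch (not from the original post) that computes the $c$ statistic as the concordance probability over (event, non-event) pairs. Shifting every linear predictor by a constant, as an oversampling correction would, changes every predicted probability but leaves the $c$ statistic untouched. The linear predictors, outcomes, and the shift of 1.5 are hypothetical values chosen for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def c_statistic(probs, outcomes):
    """Concordance (c) statistic: fraction of (event, non-event) pairs in which
    the event has the higher predicted probability; ties count as 1/2."""
    events = [p for p, y in zip(probs, outcomes) if y == 1]
    nonevents = [p for p, y in zip(probs, outcomes) if y == 0]
    score = 0.0
    for pe in events:
        for pn in nonevents:
            if pe > pn:
                score += 1.0
            elif pe == pn:
                score += 0.5
    return score / (len(events) * len(nonevents))

# Hypothetical linear predictors and 0/1 outcomes, for illustration only.
eta = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
y = [0, 0, 1, 0, 1, 0, 1, 1]

p_original = [sigmoid(e) for e in eta]
# Assumed intercept shift of -1.5 on the logit scale (e.g. an oversampling correction).
p_shifted = [sigmoid(e - 1.5) for e in eta]

print(c_statistic(p_original, y))  # → 0.8125
print(c_statistic(p_shifted, y))   # → 0.8125, although every probability has dropped
```

Because the logistic function is strictly increasing, any intercept shift preserves the ordering of the predictions, which is exactly why rank-based measures cannot detect an intercept error.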

In my current application, I rely on the logistic regression to predict the probability of the event accurately. The only way I know of assessing a model's probability-prediction ability is qualitative, namely plotting a "QQ-plot" of the actual vs. predicted probability of the event:

  1. Score the validation dataset using the developed model.
  2. Rank the observations by predicted probability and group them into $n$ buckets according to that rank. (The first $1/n$ of the observations go to the first bucket, the next $1/n$ to the next bucket, …)
  3. Calculate the average predicted probability and the actual event rate for each bucket.
  4. Create a scatter plot of Predicted vs Actual – one point for each bucket.
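The four steps above can be sketched in Python as follows. This is only an illustration: the function name `calibration_table` and the example probabilities and outcomes are hypothetical, and plotting is left out. Each returned pair `(mean predicted, observed rate)` is one point of the scatter plot, which would ideally fall on the 45-degree line.

```python
def calibration_table(probs, outcomes, n_buckets):
    """Sort observations by predicted probability, split them into n_buckets
    groups of (roughly) equal size, and return one
    (mean predicted probability, observed event rate) pair per group."""
    order = sorted(range(len(probs)), key=lambda i: probs[i])
    n = len(order)
    table = []
    for b in range(n_buckets):
        idx = order[b * n // n_buckets:(b + 1) * n // n_buckets]
        if not idx:  # more buckets than observations
            continue
        mean_pred = sum(probs[i] for i in idx) / len(idx)
        obs_rate = sum(outcomes[i] for i in idx) / len(idx)
        table.append((mean_pred, obs_rate))
    return table

# Hypothetical scored validation data.
probs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
outcomes = [0, 0, 1, 0, 1, 0, 1, 1]

table = calibration_table(probs, outcomes, 4)
for mean_pred, obs_rate in table:
    print(f"predicted {mean_pred:.3f}   observed {obs_rate:.3f}")
```

The choice of $n$ matters: few buckets hide miscalibration inside each bucket, while many buckets make each observed rate noisy.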

I am wondering:

  1. Is the "QQ-plot" I described above a legitimate way to assess the predictive performance of models developed with logistic regression? If so, where can I find references on it?
  2. Is there any known quantitative way to assess the probability prediction ability of this kind of model?

Best Answer

There are many good ways to do it. Here are some examples. These methods are implemented in the R rms package (functions val.prob, calibrate, validate):

  1. loess nonparametric full-resolution calibration curve (no binning)
  2. Spiegelhalter's test
  3. Brier score (a proper accuracy score: the quadratic score)
  4. Generalized $R^2$ (a proper accuracy score related to deviance)
  5. Calibration slope and intercept
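For readers without R, here is a rough Python sketch of four of the quantities listed above. It is not the `rms` implementation, and the function names are hypothetical. Assumptions: Spiegelhalter's statistic is taken in its commonly quoted form $z = \sum (y_i - p_i)(1 - 2p_i) / \sqrt{\sum (1 - 2p_i)^2 p_i (1 - p_i)}$; the calibration intercept and slope come from a logistic regression of the outcome on $\text{logit}(p)$ (ideal values 0 and 1); and the generalized $R^2$ is Nagelkerke's. The loess calibration curve is not reproduced. The data are simulated.

```python
import math
import random

def logit(p):
    return math.log(p / (1.0 - p))

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

def brier_score(probs, outcomes):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)

def spiegelhalter_z(probs, outcomes):
    """Spiegelhalter's z statistic and its two-sided normal p-value."""
    num = sum((y - p) * (1 - 2 * p) for p, y in zip(probs, outcomes))
    var = sum((1 - 2 * p) ** 2 * p * (1 - p) for p in probs)
    z = num / math.sqrt(var)
    return z, math.erfc(abs(z) / math.sqrt(2))

def calibration_intercept_slope(probs, outcomes, iters=50):
    """Fit outcome ~ a + b * logit(p) by Newton-Raphson; ideal is a=0, b=1."""
    x = [logit(p) for p in probs]
    a, b = 0.0, 1.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, outcomes):
            mu = expit(a + b * xi)
            w = mu * (1.0 - mu)
            g0 += yi - mu            # score for the intercept
            g1 += (yi - mu) * xi     # score for the slope
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: add (information matrix)^-1 times the score vector.
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < 1e-10:
            break
    return a, b

def generalized_r2(probs, outcomes):
    """Nagelkerke R^2 from the log-likelihood of the predictions
    relative to an intercept-only model (the overall event rate)."""
    n = len(probs)
    ll = sum(y * math.log(p) + (1 - y) * math.log(1 - p)
             for p, y in zip(probs, outcomes))
    rate = sum(outcomes) / n
    ll0 = n * (rate * math.log(rate) + (1 - rate) * math.log(1 - rate))
    lr = 2.0 * (ll - ll0)
    return (1.0 - math.exp(-lr / n)) / (1.0 - math.exp(2.0 * ll0 / n))

# Simulated validation data: true linear predictors eta, outcomes drawn from them.
random.seed(1)
eta = [random.gauss(-1.0, 1.5) for _ in range(5000)]
y = [1 if random.random() < expit(e) else 0 for e in eta]
good = [expit(e) for e in eta]                 # well calibrated
overconfident = [expit(2.0 * e) for e in eta]  # predictions too extreme

for name, probs in [("good", good), ("overconfident", overconfident)]:
    print(name, brier_score(probs, y), spiegelhalter_z(probs, y),
          calibration_intercept_slope(probs, y), generalized_r2(probs, y))
```

With the overconfident predictions, the calibration slope should come out near 0.5 and Spiegelhalter's $z$ should be large, while the well-calibrated predictions should give a slope near 1 and an intercept near 0.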

For comparing two models with regard to discrimination, the likelihood ratio $\chi^2$ test is the gold standard.
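A minimal Python sketch of the likelihood ratio $\chi^2$ test for two nested logistic models fitted to the same data, assuming the larger model adds a single parameter (so the test has 1 degree of freedom and the upper tail is $\text{erfc}(\sqrt{\chi^2/2})$). The Newton-Raphson fitter, the helper names, and the simulated covariates `x1` and `x2` are illustrative stand-ins, not any package's API.

```python
import math
import random

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def fit_logistic(X, y, iters=50):
    """Maximum-likelihood logistic regression by Newton-Raphson.
    Each row of X must include a leading 1.0 for the intercept.
    Returns (coefficients, maximized log-likelihood)."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for xi, yi in zip(X, y):
            mu = expit(sum(c * v for c, v in zip(beta, xi)))
            w = mu * (1.0 - mu)
            for j in range(k):
                grad[j] += (yi - mu) * xi[j]
                for l in range(k):
                    info[j][l] += w * xi[j] * xi[l]
        step = solve(info, grad)
        beta = [c + s for c, s in zip(beta, step)]
        if max(abs(s) for s in step) < 1e-10:
            break
    ll = 0.0
    for xi, yi in zip(X, y):
        p = expit(sum(c * v for c, v in zip(beta, xi)))
        ll += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return beta, ll

def lr_test_one_df(ll_small, ll_big):
    """LR chi-square and p-value for nested models differing by one parameter."""
    chi2 = 2.0 * (ll_big - ll_small)
    # Upper tail of chi-square with 1 df: P(Z^2 > chi2) = erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(max(chi2, 0.0) / 2.0))

# Simulated data in which x2 carries real predictive information.
random.seed(7)
n = 1000
x1 = [random.gauss(0.0, 1.0) for _ in range(n)]
x2 = [random.gauss(0.0, 1.0) for _ in range(n)]
y = [1 if random.random() < expit(-0.5 + 1.0 * u + 0.7 * v) else 0
     for u, v in zip(x1, x2)]

_, ll_a = fit_logistic([[1.0, u] for u in x1], y)                  # model A: x1
_, ll_b = fit_logistic([[1.0, u, v] for u, v in zip(x1, x2)], y)   # model B: x1 + x2
chi2, p_value = lr_test_one_df(ll_a, ll_b)
print(chi2, p_value)
```

For models differing by more than one parameter, the same $\chi^2$ statistic would be referred to a $\chi^2$ distribution with that many degrees of freedom.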

Four of the above approaches, along with others, are covered in the 2nd edition of my book Regression Modeling Strategies (coming in 2015-09) and in my course notes that go along with the book, available from the handouts link at https://biostat.app.vumc.org/wiki/Main/RmS .

The Brier score can be decomposed into discrimination and calibration components. Along with the Brier score and Spiegelhalter's test, the nonparametric calibration curve can detect errors in the intercept.
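One well-known decomposition is Murphy's: Brier = reliability − resolution + uncertainty, where reliability measures calibration and resolution measures discrimination. Whether this is the decomposition the answer has in mind is an assumption; the Python sketch below uses it because the identity is exact when observations are grouped by distinct forecast value. The forecasts and outcomes are hypothetical. Shifting the intercept on the logit scale leaves resolution unchanged but makes reliability positive, illustrating how an intercept error shows up in the calibration component.

```python
import math
from collections import defaultdict

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    return math.log(p / (1.0 - p))

def brier(probs, outcomes):
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)

def murphy_decomposition(probs, outcomes):
    """Return (reliability, resolution, uncertainty), grouping observations by
    distinct forecast value, so that Brier = reliability - resolution + uncertainty."""
    n = len(probs)
    groups = defaultdict(list)
    for p, y in zip(probs, outcomes):
        groups[p].append(y)
    base_rate = sum(outcomes) / n
    rel = res = 0.0
    for f, ys in groups.items():
        o = sum(ys) / len(ys)                   # observed event rate in the group
        rel += len(ys) * (f - o) ** 2           # calibration error
        res += len(ys) * (o - base_rate) ** 2   # spread of observed rates
    return rel / n, res / n, base_rate * (1 - base_rate)

# Hypothetical forecasts that happen to be perfectly calibrated.
forecasts = [0.2] * 5 + [0.5] * 4 + [0.8] * 5
outcomes = [1, 0, 0, 0, 0] + [1, 1, 0, 0] + [1, 1, 1, 1, 0]
# The same forecasts with the intercept shifted by +1 on the logit scale.
shifted = [expit(logit(p) + 1.0) for p in forecasts]

rel, res, unc = murphy_decomposition(forecasts, outcomes)
rel_s, res_s, unc_s = murphy_decomposition(shifted, outcomes)
print(rel, res, unc)        # reliability is 0.0 for these calibrated forecasts
print(rel_s, res_s, unc_s)  # same resolution, positive reliability
```

Grouping by distinct forecast values is practical only when forecasts take few values; with continuous predictions one would typically bin them, and the identity then holds only approximately.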
