Logistic Regression – Why Logistic Regression Is Well-Calibrated and How to Ruin Its Calibration

Tags: calibration, logistic-regression

The scikit-learn documentation on probability calibration compares logistic regression with other methods and remarks that random forests are less well calibrated than logistic regression.

Why is logistic regression well calibrated? How could one ruin the calibration of a logistic regression (not that one would ever want to – just as an exercise)?
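To make the comparison concrete, here is a minimal sketch of that kind of calibration check, assuming scikit-learn and a synthetic dataset of my own choosing (the data and model settings are illustrative, not taken from the docs):

```python
# Sketch: compare calibration of logistic regression vs. random forest
# on a synthetic binary classification problem (assumes scikit-learn).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.calibration import calibration_curve

X, y = make_classification(n_samples=20_000, n_features=20,
                           n_informative=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)

for model in (LogisticRegression(max_iter=1000),
              RandomForestClassifier(n_estimators=200, random_state=0)):
    model.fit(X_train, y_train)
    prob = model.predict_proba(X_test)[:, 1]
    # Fraction of positives per bin vs. mean predicted probability;
    # a well-calibrated model tracks the diagonal.
    frac_pos, mean_pred = calibration_curve(y_test, prob, n_bins=10)
    print(type(model).__name__)
    for fp, mp in zip(frac_pos, mean_pred):
        print(f"  predicted {mp:.2f}  observed {fp:.2f}")
```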

Best Answer

Although this question and its first answer seem to be focused on theoretical issues of logistic regression model calibration, the question of:

How could one ruin the calibration of a logistic regression...?

deserves some attention with respect to real-world applications, for future readers of this page. We shouldn't forget that a logistic regression model has to be well specified, and that misspecification can be particularly troublesome for logistic regression.

First, if the log-odds of class membership are not linearly related to the predictors included in the model, then the model will not be well calibrated. Chapter 10 of Harrell's Regression Modeling Strategies, on Binary Logistic Regression, devotes about 20 pages to "Assessment of Model Fit" so that one can take advantage of the "asymptotic unbiasedness of the maximum likelihood estimator," as @whuber put it, in practice.
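As an illustration (my own sketch with simulated data, not from Harrell), fitting a logistic regression that is linear in x when the true log-odds also contain a quadratic term produces visibly miscalibrated probabilities, while adding the squared term restores calibration:

```python
# Sketch: ruin calibration by omitting a needed non-linear term.
# The true log-odds contain a quadratic term; one model is linear in x.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.calibration import calibration_curve

rng = np.random.default_rng(0)
x = rng.normal(size=(50_000, 1))
log_odds = -1.0 + x[:, 0] + x[:, 0] ** 2   # true relationship is quadratic
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-log_odds)))

X_quad = np.hstack([x, x ** 2])
linear = LogisticRegression().fit(x, y)          # misspecified
quadratic = LogisticRegression().fit(X_quad, y)  # well specified

for name, X_feat, model in [("linear", x, linear),
                            ("quadratic", X_quad, quadratic)]:
    prob = model.predict_proba(X_feat)[:, 1]
    frac_pos, mean_pred = calibration_curve(y, prob, n_bins=10,
                                            strategy="quantile")
    # A well-calibrated model keeps the observed frequency close to the
    # predicted probability in every bin.
    print(name, "max gap:",
          round(float(np.abs(frac_pos - mean_pred).max()), 3))
```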

Second, model specification is a particular issue in logistic regression, which has an inherent omitted-variable bias that can surprise those with a background in ordinary linear regression. As that page puts it:

Omitted variables will bias the coefficients on included variables even if the omitted variables are uncorrelated with the included variables.

That page also explains why this behavior is to be expected, with a theoretical treatment of the related, analytically tractable probit model. So unless you know that you have included all predictors related to class membership, you risk misspecification and poor calibration in practice.
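A small simulation makes that concrete (my own sketch, not from the linked page): even when an omitted predictor is independent of the included one, the coefficient on the included predictor is attenuated toward zero:

```python
# Sketch: omitted-variable attenuation in logistic regression.
# x1 and x2 are independent; both enter the true log-odds with
# coefficient 1, yet dropping x2 shrinks the estimate for x1.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 200_000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)                  # independent of x1
p = 1.0 / (1.0 + np.exp(-(x1 + x2)))     # true model: logit = x1 + x2
y = rng.binomial(1, p)

# Large C approximates unpenalized maximum likelihood.
full = LogisticRegression(C=1e6).fit(np.column_stack([x1, x2]), y)
reduced = LogisticRegression(C=1e6).fit(x1.reshape(-1, 1), y)

print("full model coef for x1:   ", round(full.coef_[0][0], 3))     # close to 1.0
print("reduced model coef for x1:", round(reduced.coef_[0][0], 3))  # noticeably below 1.0
```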

With respect to model specification, tree-based methods like random forest do not assume linearity over the entire range of predictor values, and they inherently provide the possibility of finding and including interactions among predictors. It is therefore quite possible that they will end up better calibrated in practice than a logistic regression model that does not take interaction terms or non-linearity sufficiently into account. With respect to omitted-variable bias, it's not clear to me whether any method for evaluating class-membership probabilities can deal with that issue adequately.
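As a rough way to probe the first claim, one could simulate data whose log-odds contain a strong interaction and compare the calibration of an additive logistic regression against a random forest. A sketch under those assumptions follows; which model actually wins will depend on the data and on tuning:

```python
# Sketch: the true log-odds contain a strong interaction, so the
# additive logistic regression is misspecified while the forest can
# in principle learn the interaction surface.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.calibration import calibration_curve

rng = np.random.default_rng(0)
X = rng.normal(size=(50_000, 2))
log_odds = X[:, 0] + X[:, 1] + 3.0 * X[:, 0] * X[:, 1]
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-log_odds)))
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for model in (LogisticRegression(max_iter=1000),   # additive, no interaction
              RandomForestClassifier(n_estimators=300, min_samples_leaf=50,
                                     random_state=0)):
    prob = model.fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
    frac_pos, mean_pred = calibration_curve(y_te, prob, n_bins=10,
                                            strategy="quantile")
    # Compare the per-bin calibration gaps of the two models.
    print(type(model).__name__, "max gap:",
          round(float(np.abs(frac_pos - mean_pred).max()), 3))
```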