Logistic Regression – Detailed Walkthrough of `rms::calibrate` Function

Tags: calibration, logistic, r, regression, regression-strategies

The calibrate function in the rms R package lets us compare the probability values predicted by a logistic regression model to the true probability values. This is easy enough: just plot them and check that they fall near the line $y=x$. However, if we knew the true probability values, there would be no need for statistical inference! We have to estimate what the true values are, which is a major part of what rms::calibrate does and what confuses me.
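(In a simulation where the true probabilities are known, the check is direct. A minimal sketch, using the pr and the fit L defined in the EXAMPLE below:

p_hat <- predict(L, type = "fitted")   # model-predicted P(Y = 1)
plot(pr, p_hat); abline(0, 1)          # points should fall near the line y = x

With real data we only observe the 0/1 outcomes, not pr, which is the whole difficulty.)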

I have figured out a bit of how rms::calibrate works.

  1. It estimates the true probability values using the observed $0/1$ labels. This makes sense. If we have a lot of $1$s, we should have a lot of high values of $P(Y=1)$.

  2. It takes bootstrap samples of the predicted probability values paired with the observed outcomes.

  3. It fits some kind of model to the bootstrap samples of predicted probability values paired with observed outcomes.

Step $3$ is where I start to get lost. What kind of model is fit? Is it, indeed, fit to the bootstrap samples of predicted probability values and observed outcomes, or does it fit a new logistic regression each time? (I don't think this is how it works, but I want to make sure.) How does it go from multiple models of bootstrap data to the "apparent" and "bias-corrected" values displayed on the graph?

EXAMPLE

library(rms)
set.seed(2022)
N <- 1000
x <- runif(N, -3, 3)
z <- x                       # true linear predictor
pr <- 1 / (1 + exp(-z))      # true P(Y = 1)
y <- rbinom(N, 1, pr)        # observed 0/1 outcomes
plot(x, y)
L <- rms::lrm(y ~ x, x = TRUE, y = TRUE)  # keep design matrix and response for resampling
cal <- rms::calibrate(L)
plot(cal)

[Calibration plot from plot(cal), showing the Apparent, Bias-corrected, and Ideal curves]

Best Answer

Both calibrate() and validate() in rms use the optimism bootstrap by default, explained for example in Chapter 7 of Elements of Statistical Learning. And yes, in your example the logistic-regression modeling is repeated on each of multiple bootstrap samples of the data.

In general with an optimism bootstrap, you develop a new model on a bootstrap sample. You find how much better that model performs (by some measure) on its bootstrap sample than it does on the full data set. That's the "optimism bias" of that model for that measure.

By the bootstrap principle, the average "optimism" over multiple bootstrap-based models estimates the optimism of the original full model when applied to the population from which the original data were sampled. You can then correct the apparent performance of the original model by subtracting that average optimism.
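As a concrete sketch (not the actual rms code), here is the optimism bootstrap for a single scalar measure, using a hand-rolled c-index on the simulated data from your example; the cindex() helper and the 200 replicates are illustrative choices of mine:

cindex <- function(p, y) {
  # Mann-Whitney form of the c-statistic: probability that a random
  # event gets a higher predicted p than a random non-event
  r <- rank(p)
  n1 <- sum(y == 1); n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

apparent <- cindex(predict(L, type = "fitted"), y)    # performance on the training data
opt <- replicate(200, {
  i  <- sample(N, replace = TRUE)                     # draw a bootstrap sample
  fb <- glm(y[i] ~ x[i], family = binomial)           # refit the model on it
  cindex(fitted(fb), y[i]) -                          # performance on its own sample...
    cindex(plogis(coef(fb)[1] + coef(fb)[2] * x), y)  # ...minus performance on the full data
})
apparent - mean(opt)                                  # bias-corrected c-index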

For calibrate(), the "optimism" is in the calibration error as a function of predicted probability values. You first calculate the calibration error of the full model over a range of predicted probability values. You then estimate the "optimism" in those calibration error estimates by the optimism bootstrap, and subtract the averaged "optimism" from the original calibration of the full model.
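A minimal sketch of that idea, assuming a lowess smooth of outcome against predicted probability as the calibration estimate (roughly what calibrate() does with its default lowess smoother); the cal_curve() helper and the grid ps are mine, not part of the rms API:

cal_curve <- function(p, y, ps) approx(lowess(p, y, iter = 0), xout = ps)$y

ps  <- seq(0.1, 0.9, by = 0.05)                       # predicted-probability grid
app <- cal_curve(predict(L, type = "fitted"), y, ps)  # apparent calibration curve

opt <- replicate(200, {
  i  <- sample(N, replace = TRUE)
  fb <- glm(y[i] ~ x[i], family = binomial)           # model built on the bootstrap sample
  pb <- fitted(fb)                                    # its predictions on that sample
  pf <- plogis(coef(fb)[1] + coef(fb)[2] * x)         # its predictions on the full data
  cal_curve(pb, y[i], ps) - cal_curve(pf, y, ps)      # pointwise optimism at each grid value
})
corrected <- app - rowMeans(opt)                      # bias-corrected calibration at ps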

The plot then shows smoothed curves for the original ("Apparent") calibration and the optimism-corrected ("Bias-corrected") calibration. In the plot you show, the original model deviates somewhat from the "Ideal" calibration, but there doesn't seem to be much optimism (the difference between "Apparent" and "Bias-corrected") in the calibration.

You can see the details in the code: for logistic regression (lrm objects), type rms:::calibrate.default at the R prompt. That code includes an internal cal.error() function for evaluating calibration and a call to predab.resample(), the function that runs the optimism bootstrap.
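For example, at the prompt:

rms:::calibrate.default          # method dispatched for lrm fits; defines cal.error()
getAnywhere("predab.resample")   # the bootstrap engine that calibrate.default calls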
