Relationship between pseudo-$R^2$ and area under the ROC curve

predictive-models, regression, roc

The $R^2$ of a model measures how well the model fits the data and is a measure of the shared variation between two (or more) variables. Its analogue for logistic regression is the pseudo-$R^2$. A pseudo-$R^2$ is sometimes presented alongside the area under the receiver operating characteristic (ROC) curve as a measure of a model's predictive accuracy.

I'm curious as to whether there is any straightforward relationship between these two metrics. Does a model with a higher pseudo-$R^2$ necessarily have a larger AUC ROC? Are there any situations where a model can have a low pseudo-$R^2$ but a high AUC ROC? It seems intuitive that the two measures are necessarily correlated, but I've been wrong many times in the past.

Best Answer

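The AUC is scale-independent: it is based solely on the ranks of the predicted probabilities. If you multiply all the probabilities output by your logistic regression by the same factor $\lambda\in(0,1]$, the ranking is unchanged, so the AUC will remain the same. Note that as $\lambda\rightarrow0$ the pseudo-$R^2$ will decrease (possibly becoming negative): for likelihood-based versions such as McFadden's, the scaled probabilities assign ever lower likelihood to the observed positives.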

So you can have a low pseudo $R^2$ but a large AUC.
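
The scaling argument can be checked numerically. The sketch below is only an illustration under stated assumptions: the data are simulated, McFadden's pseudo-$R^2$ is picked as one common definition (the answer does not say which pseudo-$R^2$ is meant), and the helper names `auc` and `mcfadden_r2` are hypothetical, written in plain Python rather than taken from any library.

```python
import math
import random


def auc(labels, scores):
    """Rank-based AUC: P(score of a positive > score of a negative); ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mcfadden_r2(labels, probs):
    """McFadden pseudo-R^2: 1 - LL(model) / LL(intercept-only model)."""
    ll_model = sum(math.log(p) if y == 1 else math.log(1.0 - p)
                   for y, p in zip(labels, probs))
    base = sum(labels) / len(labels)  # intercept-only prediction: the base rate
    ll_null = sum(math.log(base) if y == 1 else math.log(1.0 - base)
                  for y in labels)
    return 1.0 - ll_model / ll_null


rng = random.Random(0)
# Simulated data (an assumption for illustration): a latent score x drives
# the outcome through a logistic link, so `probs` plays the role of the
# fitted probabilities from a logistic regression.
xs = [rng.gauss(0.0, 1.0) for _ in range(500)]
probs = [1.0 / (1.0 + math.exp(-2.0 * x)) for x in xs]
labels = [1 if rng.random() < p else 0 for p in probs]

results = {}
for lam in (1.0, 0.5, 0.1, 0.01):
    scaled = [lam * p for p in probs]  # same ranking, different scale
    results[lam] = (auc(labels, scaled), mcfadden_r2(labels, scaled))
    print(f"lambda={lam:<5} AUC={results[lam][0]:.4f}  "
          f"McFadden R2={results[lam][1]:.4f}")
```

On data like this, the AUC column should stay the same for every $\lambda$ while the pseudo-$R^2$ column falls and turns negative, which is the low pseudo-$R^2$, high AUC situation described above.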
