Logistic – Understanding Training AUC Below 0.5

I've trained a logistic regression using a small number of predictors – pseudo R-squared is only 0.1 but I have significant terms and a nice low p value for the model. However, even on its own training data, the AUC for the model is only 0.28:

I thought this was impossible, and my only intuition for what's going on is that the class imbalance (only 5% of the observations are in the positive class) means that the no information rate is pretty high. I still think my model should beat random guessing, at least on its own training data.

I assumed this was a coding error but I think I've ruled that out now, so can anyone explain to me how this is possible?!

I've looked for other discussions on here, most seem to focus on AUC < 0.5 on the test set which I can can more easily understand (e.g. here and here). This one came the closest, demonstrating that a predictor that's really just noise can come out below 0.5 – but I think my model has found some real relationships in the data…

Best Answer

AUC-ROC can be below 0.5 but when it is substantially below 0.5 as in the case shown here (~0.28) there is a good chance that the labels are flipped/reversed at some point in our modelling pipeline. Such a low AUC-ROC score would suggest we are consistently bad, it can happen, but usually we are just bad!

Best Answer

Related Solutions

Solved – pattern of ROC curve and choice of AUC

Solved – Training AUC and CV AUC in Boosted Regression Tree

Related Question