Solved – How to distinguish overfitting and underfitting from the ROC AUC curve

Tags: auc, overfitting, roc

For model selection, one of the metrics is the AUC (Area Under the ROC Curve), which tells us how well each model performs; based on the AUC value, we can choose the best model.

But how can we distinguish whether a model is overfitting or underfitting from the ROC curves, or from the training, test, and desired AUC values?

Best Answer

On its own, the ROC curve (or AUC) of the final model on the training set provides no information (unless you know something about the performance of the optimal classifier). By definition, the training set cannot be used to evaluate overfitting or underfitting, because it cannot measure the generalization performance of the model. However, comparing the ROC curves of the training set and the validation set can help. The size of the gap between the training and validation metrics is the indicator: a large gap points to overfitting, while little or no gap (with both curves showing mediocre performance) points to underfitting. Everything in between is subject to interpretation, but a good model should produce a small gap.
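
As a rough illustration of that comparison, here is a minimal sketch using scikit-learn. The synthetic dataset, the random-forest classifier, and the 70/30 split are arbitrary assumptions made for the example, not something prescribed by the answer above.

```python
# Minimal sketch: compare training vs. validation ROC AUC for one model.
# The dataset, model, and split below are illustrative assumptions only.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0
)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# Scores (probability of the positive class) on both sets
p_train = model.predict_proba(X_train)[:, 1]
p_val = model.predict_proba(X_val)[:, 1]

auc_train = roc_auc_score(y_train, p_train)
auc_val = roc_auc_score(y_val, p_val)
print(f"train AUC = {auc_train:.3f}, "
      f"validation AUC = {auc_val:.3f}, "
      f"gap = {auc_train - auc_val:.3f}")
```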

The gap between the training and validation ROC curves should be measured as the area between the two curves. Keep in mind that the difference between the AUCs does not compute the same quantity: the AUC difference integrates the signed gap between the curves, so deviations in opposite directions can cancel out, whereas the area between the curves cannot.
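
A sketch of that distinction, assuming the scores p_train, p_val and the labels from the previous snippet: the area between the curves integrates the absolute difference of the TPRs over a common FPR grid, while the AUC difference integrates the signed difference, so the two values can disagree whenever the curves cross.

```python
# Sketch: area between the train/validation ROC curves vs. difference of AUCs.
# Assumes p_train, p_val, y_train, y_val from the previous snippet.
import numpy as np
from sklearn.metrics import auc, roc_curve

fpr_tr, tpr_tr, _ = roc_curve(y_train, p_train)
fpr_va, tpr_va, _ = roc_curve(y_val, p_val)

# Interpolate both TPR curves onto a common FPR grid
grid = np.linspace(0.0, 1.0, 1001)
tpr_tr_i = np.interp(grid, fpr_tr, tpr_tr)
tpr_va_i = np.interp(grid, fpr_va, tpr_va)

# Area between the curves: integral of the absolute TPR difference
area_between = auc(grid, np.abs(tpr_tr_i - tpr_va_i))
# Difference of AUCs: integral of the signed TPR difference (can cancel out)
auc_difference = auc(grid, tpr_tr_i) - auc(grid, tpr_va_i)

print(f"area between ROC curves = {area_between:.3f}, "
      f"AUC difference = {auc_difference:.3f}")
```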

Monitoring the ROC curves (and the gap) during the learning phase can provide additional information, since you can see how the gap evolves as training progresses.
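
One way to do this, sketched under the assumption that the data split from the first snippet is available, is to use a model that exposes per-iteration scores. Scikit-learn's GradientBoostingClassifier does so via staged_predict_proba, which lets you track the train/validation AUC gap as boosting proceeds; the model choice here is illustrative, not part of the original answer.

```python
# Sketch: track the train/validation AUC gap during training.
# Assumes X_train, X_val, y_train, y_val from the first snippet;
# the gradient-boosting model is an illustrative choice.
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score

gbm = GradientBoostingClassifier(n_estimators=300, random_state=0)
gbm.fit(X_train, y_train)

# staged_predict_proba yields the predicted probabilities after each iteration
stages = zip(gbm.staged_predict_proba(X_train), gbm.staged_predict_proba(X_val))
for i, (s_tr, s_va) in enumerate(stages, start=1):
    if i % 50 == 0:  # report every 50 boosting iterations
        gap = (roc_auc_score(y_train, s_tr[:, 1])
               - roc_auc_score(y_val, s_va[:, 1]))
        print(f"iteration {i:3d}: train - validation AUC gap = {gap:.3f}")
```

A widening gap over iterations is the overfitting signal described above; a gap that stays near zero while both AUCs remain low is the underfitting pattern.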