Solved – the effect of training a model on an imbalanced dataset & using it on a balanced dataset

auc, precision-recall, unbalanced-classes

When evaluating a model, for example a binary classifier, should the train and test sets have a 50% + and 50% – label distribution, or can the distribution be arbitrary?

If the distribution in the train/test sets is skewed, e.g., 80% + and 20% –, the precision/recall scores may not be representative. For example, the model may classify positive points well but misclassify many negative points. Its recall is high, and its precision can still be high, because there are few false positives in absolute terms: there are simply fewer negative points in the dataset.
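To make this concrete, here is a small arithmetic sketch (the operating point, recall = 0.90 and specificity = 0.60, is hypothetical and chosen by me, not taken from the post). It holds the classifier's behavior fixed and recomputes precision as only the class balance changes:

```python
# Precision at a fixed operating point, varying only the class balance.
# Hypothetical classifier: recall (TPR) = 0.90, specificity (TNR) = 0.60,
# i.e. it misclassifies 40% of the negatives it sees.

def precision(recall, specificity, n_pos, n_neg):
    tp = recall * n_pos                # true positives
    fp = (1 - specificity) * n_neg     # false positives
    return tp / (tp + fp)

for n_pos, n_neg in [(800, 200), (500, 500)]:
    p = precision(0.90, 0.60, n_pos, n_neg)
    print(f"{n_pos}+ / {n_neg}-: precision = {p:.3f}")

# 800+ / 200-: precision = 0.900  -> few negatives, so few false positives
# 500+ / 500-: precision = 0.692  -> the same model looks much worse balanced
```

The model itself never changes; only the test-set prevalence does, and precision moves from 0.90 to about 0.69.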

Is AUC a robust metric against such imbalanced distributions? Or is it best to balance the distribution in the train/test data in order to compute more accurate precision and recall values?
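One way to probe this empirically is the following sketch (my own setup, not from the post: a synthetic make_classification problem with an ~80/20 split and a logistic regression). It scores the same fitted model on a balanced subsample of the test set versus the natural imbalanced test set, comparing ROC AUC with precision-recall AUC (average precision):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced problem: roughly 80% of one class, 20% of the other.
X, y = make_classification(n_samples=20000, weights=[0.8, 0.2], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
scores = model.predict_proba(X_te)[:, 1]

# Score the SAME model on a balanced subsample vs. the natural test set.
rng = np.random.default_rng(0)
pos = np.where(y_te == 1)[0]
neg = np.where(y_te == 0)[0]

for label, n_neg in [("balanced", len(pos)), ("imbalanced", len(neg))]:
    idx = np.concatenate([pos, rng.choice(neg, size=n_neg, replace=False)])
    print(f"{label:10s} ROC AUC = {roc_auc_score(y_te[idx], scores[idx]):.3f}  "
          f"PR AUC = {average_precision_score(y_te[idx], scores[idx]):.3f}")
```

In runs like this, ROC AUC stays nearly unchanged between the two test sets (it depends only on how well the scores rank positives above negatives), while PR AUC shifts substantially with the prevalence, mirroring the precision effect described above.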

I read this Kaggle forum post: Precision-recall AUC vs ROC AUC for class imbalance problems,
but it doesn't discuss the issue I'm raising about dataset distribution.

Best Answer

It is important to understand that improper accuracy scoring rules lead to bogus models and are affected by imbalance. The AUROC (c-index) is one improper scoring rule that is an exception: the c-index is independent of outcome prevalence, but it is not sensitive enough to be used as an optimality criterion. Using full probability estimators and proper scoring rules is a recipe for success and handles extreme imbalance. These issues are expanded upon here and here.
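As an illustration of this recommendation, here is a sketch (my own example, not the answerer's code; the dataset is synthetic and logistic regression stands in for "a full probability estimator"). It fits the model on an imbalanced problem with no resampling or threshold tuning and evaluates the predicted probabilities with two proper scoring rules, the Brier score and the log loss, alongside the c-index:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss, log_loss, roc_auc_score
from sklearn.model_selection import train_test_split

# Imbalanced problem: roughly 90% negative, 10% positive.
X, y = make_classification(n_samples=20000, weights=[0.9, 0.1], random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=1)

# Full probability estimator: fit on the data as-is, no rebalancing.
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
p = model.predict_proba(X_te)[:, 1]

# Proper scoring rules evaluate the probabilities themselves; in expectation
# they are optimized only by the true class probabilities, so they reward
# calibrated estimates rather than any particular classification threshold.
print(f"Brier score: {brier_score_loss(y_te, p):.4f}")
print(f"Log loss:    {log_loss(y_te, p):.4f}")

# The c-index (AUROC) measures discrimination only: prevalence-independent,
# but too insensitive to serve as an optimality criterion.
print(f"c-index:     {roc_auc_score(y_te, p):.4f}")
```

The design point is that nothing here depends on choosing a classification cutoff or on rebalancing the data: the probabilities are estimated from the imbalanced sample directly, and the scoring rules judge their quality.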