For binary classification problems where the data is highly imbalanced, i.e. there are many more negative samples than positive ones, it is often recommended to evaluate a classifier's performance using the ROC curve, because it does not depend on the actual ratio between the positive and negative classes (see, e.g., He et al.). Yet I recently came across an article stating that rare events are usually "better predicted" when looking at the ROC curve. Unfortunately, I did not save the article and have so far been unable to find it again.
Therefore, I decided to ask here whether someone can point me to a paper that demonstrates this behaviour, or can explain where it comes from. My follow-up question would then be: what is the preferred way to evaluate a classifier under these circumstances?
Best Answer
Let us try it out. Generate a quantitative classifier variable positively correlated with a binary state variable (0 = "negative", 1 = "positive"), and supply three weighting variables. Weight1 makes the 0/1 distribution 45/45 (balanced). Weight2 makes it 15/75 (i.e. the positive event is frequent). Weight3 makes it 75/15 (i.e. the positive event is rare).
Weight the data with each weighting variable in turn and perform a ROC analysis (I did it in SPSS). Below are the statistics for the area under the curve.
You may notice that the Area is the same whether the positive event is rare, frequent, or in between. However, the standard error of the Area and the other statistics around it are affected by whether the positive event is rare, frequent, or in between. The shape of the curve itself (shown below) is likewise unaffected. So the background "rareness" of the positive event has no impact on the choice of the optimal classification cut-point on the classifier variable.
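The same invariance can be reproduced outside SPSS. The sketch below is a hypothetical re-creation of the setup (the variable names and the choice of score distribution are my assumptions, not the original SPSS data): it generates a score positively correlated with a binary state, then computes the AUC as the weighted probability that a randomly chosen positive outscores a randomly chosen negative, under the three weighting schemes. Because each weighting assigns one constant weight per class, the weights cancel in the ratio and the AUC is identical in all three cases.

```python
import numpy as np

rng = np.random.default_rng(0)

# Binary state (0 = negative, 1 = positive) and a classifier score
# positively correlated with it (positives score higher on average).
n = 90
state = rng.integers(0, 2, size=n)
score = state + rng.normal(0.0, 1.0, size=n)

def weighted_auc(score, state, w):
    """AUC as the weighted probability that a random positive
    outscores a random negative (ties count as half)."""
    s_pos, w_pos = score[state == 1], w[state == 1]
    s_neg, w_neg = score[state == 0], w[state == 0]
    gt = (s_pos[:, None] > s_neg[None, :]).astype(float)
    eq = (s_pos[:, None] == s_neg[None, :]).astype(float)
    ww = w_pos[:, None] * w_neg[None, :]          # pairwise weights
    return ((gt + 0.5 * eq) * ww).sum() / ww.sum()

# Three schemes analogous to Weight1/2/3: balanced 45/45,
# positives frequent 15/75, positives rare 75/15.
for w0, w1 in [(45, 45), (15, 75), (75, 15)]:
    w = np.where(state == 1, w1, w0).astype(float)
    print(f"neg/pos weight {w0}/{w1}: AUC = {weighted_auc(score, state, w):.6f}")
```

All three printed AUC values coincide exactly, mirroring the SPSS result: prevalence weighting moves the class sizes, not the ranking of positives against negatives.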