Solved – ROC curves for unbalanced datasets

Tags: classification, roc, unbalanced-classes

Consider an input matrix $X$ and a binary output $y$.

A common way to measure the performance of a classifier is to use ROC curves.

In an ROC plot, the diagonal is the result that would be obtained from a random classifier. In the case of an unbalanced output $y$, the performance of a random classifier can be improved by choosing $0$ or $1$ with different probabilities.
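For instance, assuming "performance" here means accuracy and writing $\pi = P(y = 1)$ for the share of positives, a classifier that ignores $X$ and predicts $1$ with probability $p$ has expected accuracy

$$
\text{acc}(p) = p\,\pi + (1 - p)(1 - \pi),
$$

which depends on $p$ whenever $\pi \neq 1/2$. With $\pi = 0.9$, always predicting $1$ ($p = 1$) gives $0.9$, while a fair coin ($p = 0.5$) gives $0.5$.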

How can the performance of such a classifier be represented in an ROC plot?
I suppose it should be a straight line with a different slope, not the diagonal anymore?

[Image: ROC curve example]

Best Answer

ROC curves are insensitive to class balance. The diagonal you obtain for a random classifier is already the result of using different probabilities of predicting the positive class: a probability of $0$ brings you to $(0, 0)$, a probability of $1$ brings you to $(1, 1)$, and any probability in between lands on the diagonal between those two points.
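One way to see this, assuming the random classifier ignores $X$ and outputs $\hat y = 1$ with a fixed probability $p$, independently of the true label $y$:

$$
\text{TPR} = P(\hat y = 1 \mid y = 1) = p, \qquad \text{FPR} = P(\hat y = 1 \mid y = 0) = p.
$$

Both rates are conditioned on the true class, so the prevalence $\pi = P(y = 1)$ never enters, and every choice of $p \in [0, 1]$ yields the point $(p, p)$ on the diagonal.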

Nothing changes in an imbalanced setting.
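A quick simulation can illustrate this. It is only a sketch under assumptions chosen for illustration: the 5% positive rate, the sample size, the seed, and the helper name `evaluate_random_guesser` are all hypothetical. The guesser ignores the inputs and predicts $1$ with probability $p$; the code reports its FPR, TPR and accuracy on heavily imbalanced labels.

```python
import random


def evaluate_random_guesser(y, p, seed=0):
    """Score a hypothetical classifier that ignores X and predicts 1 with probability p.

    Returns (FPR, TPR, accuracy) measured on the binary labels y.
    """
    rng = random.Random(seed)
    n_pos = sum(y)
    n_neg = len(y) - n_pos
    tp = fp = correct = 0
    for label in y:
        # The prediction never looks at the label or any features.
        pred = 1 if rng.random() < p else 0
        if pred == 1 and label == 1:
            tp += 1
        elif pred == 1 and label == 0:
            fp += 1
        correct += pred == label
    return fp / n_neg, tp / n_pos, correct / len(y)


# Assumed imbalanced labels: 5% positives, 95% negatives.
n = 100_000
y = [1] * (n // 20) + [0] * (n - n // 20)

for p in (0.0, 0.05, 0.3, 0.7, 1.0):
    fpr, tpr, acc = evaluate_random_guesser(y, p)
    print(f"p={p:.2f}  FPR={fpr:.3f}  TPR={tpr:.3f}  accuracy={acc:.3f}")
```

Each row should land close to $(p, p)$, i.e. on the diagonal, while the accuracy column changes with $p$: the class imbalance shows up in accuracy, not in the position on the ROC plot.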