Classification ROC – Understanding the Origin of the Gini Coefficient

Tags: classification, gini, roc

I understand what a ROC curve is. However, I do not understand the Gini coefficient in the context of binary classification.

All the resources I have checked state that $Gini = 2 \times AUC_{ROC} - 1$.

How is this equality derived? As an economist, I find it troubling to think of a Gini coefficient in this context.

Best Answer

It's complicated. The Gini coefficient based on the ROC curve appears to have been invented by analogy with the economic Gini coefficient based on the Lorenz curve. It's not a perfect analogy: for example, the Lorenz curve is necessarily convex, whereas the ROC curve is only necessarily monotone. The two measures do coincide in some situations.
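Assuming the usual conventions (the economic Gini is the area between the line of equality and the Lorenz curve, divided by the area of the triangle under the line of equality; the ROC Gini is the area between the ROC curve and the diagonal, divided by the area of the triangle above the diagonal), the two definitions line up as follows, with $B$ the area under the Lorenz curve:

$$
G_{\text{Lorenz}} = \frac{\tfrac12 - B}{\tfrac12} = 1 - 2B,
\qquad
G_{\text{ROC}} = \frac{AUC - \tfrac12}{\tfrac12} = 2\,AUC - 1.
$$

The Lorenz curve lies below its line of equality, while a better-than-random ROC curve lies above the diagonal, which is why the sign flips: plugging $AUC$ into the economic formula $1 - 2B$ would give a negative Gini for any useful classifier.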

Also, if you have fitted probabilities $\hat p_i$ for each individual and a well-calibrated model (i.e., $\hat p_i$ really does estimate $P[Y=1 \mid X=x_i]$), the two are conceptually related. With a well-calibrated model, you can summarise how well it predicts by considering the variability in the predictions, since, ex hypothesi, different predictions for different people reflect genuine discriminatory power. So you can look at the $\hat p_i$ and ask how unequal they are, with more inequality implying better discrimination.
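One way to make this precise is a short derivation under idealised assumptions: population quantities rather than a finite sample, perfect calibration so that $P(Y=1 \mid \hat p) = \hat p$, base rate $\pi = E[\hat p] = P(Y=1)$, and ties counted as one half in the AUC. Let $X$ and $X'$ be independent draws of $\hat p$, and let $\hat p_{+}$ and $\hat p_{-}$ be the scores of a random positive and a random negative. By Bayes' rule, positives' scores are distributed as $\hat p$ reweighted by $\hat p / \pi$ and negatives' scores as $\hat p$ reweighted by $(1-\hat p)/(1-\pi)$, so

$$
\begin{aligned}
G_{\text{ROC}} = 2\,AUC - 1
 &= P(\hat p_{+} > \hat p_{-}) - P(\hat p_{+} < \hat p_{-}) \\
 &= \frac{E\big[\operatorname{sign}(X - X')\, X\,(1 - X')\big]}{\pi(1-\pi)} \\
 &= \frac{E\big[\operatorname{sign}(X - X')\,(X - X')\big]}{2\pi(1-\pi)}
  = \frac{E|X - X'|}{2\pi(1-\pi)}, \\
G(\hat p) &= \frac{E|X - X'|}{2\pi}
 \quad\Longrightarrow\quad
 G_{\text{ROC}} = \frac{G(\hat p)}{1 - \pi}.
\end{aligned}
$$

The third line averages the second with the same expression with $X$ and $X'$ swapped. Under these assumptions, the ROC Gini is the economic Gini of the predictions rescaled by $1/(1-\pi)$, so the two are proportional but not equal.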

However, the simple relationship between AUC and the economic Gini coefficient doesn't hold in general.
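A small numerical sketch of the gap follows. The function names `economic_gini` and `calibrated_roc_gini` are hypothetical. The second function assumes that the list of fitted probabilities is the whole population and that the model is perfectly calibrated, so each ordered pair of individuals is weighted by the probability that the first is a positive and the second a negative. No labels are simulated.

```python
from itertools import product


def economic_gini(p):
    """Economic Gini of the values in p: mean absolute difference / (2 * mean)."""
    n = len(p)
    mean = sum(p) / n
    mad = sum(abs(a - b) for a, b in product(p, repeat=2)) / n**2
    return mad / (2 * mean)


def calibrated_roc_gini(p):
    """2*AUC - 1, assuming p is the population of scores and P(Y=1 | p_i) = p_i.

    Each ordered pair (a, b) is weighted by P(a is positive) * P(b is negative).
    Ties get sign 0, which corresponds to counting them as 1/2 in the AUC.
    """
    n = len(p)
    base_rate = sum(p) / n
    num = 0.0
    for a, b in product(p, repeat=2):
        sign = (a > b) - (a < b)  # +1, 0 or -1
        num += sign * a * (1 - b)
    return num / (n**2 * base_rate * (1 - base_rate))


p_hat = [0.1, 0.3, 0.7, 0.9]  # made-up fitted probabilities
g = economic_gini(p_hat)
g_roc = calibrated_roc_gini(p_hat)
base_rate = sum(p_hat) / len(p_hat)
print(round(g, 4), round(g_roc, 4), round(g / (1 - base_rate), 4))  # prints: 0.35 0.7 0.7
```

For these illustrative predictions, the economic Gini of $\hat p$ is 0.35, while the ROC Gini is 0.7. This matches the rescaling $G_{\text{ROC}} = G(\hat p)/(1-\bar p)$ that holds under perfect calibration, here with $\bar p = 0.5$. Without calibration, the AUC depends on the labels in ways the spread of $\hat p$ cannot capture, so not even this rescaled relation needs to hold.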
