Solved – Lorenz curve and Gini coefficient for measuring classifier performance

Tags: auc, classification, gini, lorenz-curve, roc

I often use a ROC curve and the area under that curve as a measure of classifier accuracy in two-class problems, e.g.:

#Load a dataset
library(mlbench)
data(Sonar)

#Build a model
library(caret)
model <- train(Class~., data=Sonar, method='gbm', tuneLength=1, trControl=trainControl(method='cv'))
model

#ROC curve and AUC
library(pROC)
#Column 2 holds the predicted probability of the second class level ('R')
pMal <- predict(model, newdata=Sonar, type='prob')[,2]
roc(Sonar$Class, pMal, plot=TRUE)
#> Area under the curve: 0.9705

#Lorenz curve and Gini?

In a similar manner, I would like to be able to plot the Lorenz curve and calculate the Gini coefficient for my classifier. I know Gini = 2*AUC-1, but I'm not actually sure how to calculate it on its own. Furthermore, every application of a Lorenz curve I've seen looks at univariate data (e.g. income distribution). How do I calculate a Lorenz curve when I have two variables: the predicted probability of the positive class, and the positive class itself?

Best Answer

The Lorenz curve is also known as the "lift curve" when applied to classification/ranking. For a given range of predicted probability values, the lift is the multiplicative increase in the positive class's rate, attributable to the predictive model, over a random guess.
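
To make the definition concrete, here is a minimal base R sketch of how cumulative lift could be computed by hand. The simulated `y` and `score` vectors are made up for illustration (they are not the Sonar model from the question); with a real model, `score` would be the predicted probability of the positive class and `y` a 0/1 indicator of that class.

```r
# Simulated data (an assumption for illustration): y = 1 marks the positive class
set.seed(1)
y     <- rbinom(200, 1, 0.3)
score <- plogis(2 * y - 1 + rnorm(200))

# Rank cases from highest to lowest predicted probability
ord      <- order(score, decreasing = TRUE)
y_sorted <- y[ord]

# Fraction of cases targeted, and fraction of all positives captured so far
frac_cases <- seq_along(y_sorted) / length(y_sorted)
gain       <- cumsum(y_sorted) / sum(y_sorted)

# Cumulative lift: positive rate among the targeted cases relative to the base rate
lift <- gain / frac_cases

# Lorenz-style plot: share of positives captured vs. share of cases targeted
plot(c(0, frac_cases), c(0, gain), type = 'l',
     xlab = 'Fraction of cases (ranked by score)',
     ylab = 'Fraction of positives captured')
abline(0, 1, lty = 2)  # random-guess baseline
```

Read this way, the two variables from the question collapse into one curve: the score only determines the ordering, and the class label is what gets accumulated.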

The ROCR package can calculate lift values and curves (its manual also has a concise definition of lift). The Gini index can be calculated from the area under the lift curve. I typically use the cumulative lift value at a given predicted-probability threshold instead, since it is easier to relate to business metrics.
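
A rough sketch of how this might look with ROCR, on simulated data rather than the question's model. `prediction()` and `performance()` are ROCR functions; the toy data and variable names are assumptions. I am also assuming that the "area" meant above is the area under the cumulative gains curve (true positive rate against the rate of positive predictions), normalized by the corresponding area for a perfect ranking. Under that reading it should agree with Gini = 2*AUC-1 from the question.

```r
library(ROCR)

# Simulated data (an assumption for illustration): y = 1 marks the positive class
set.seed(1)
y     <- rbinom(200, 1, 0.3)
score <- plogis(2 * y - 1 + rnorm(200))

pred <- prediction(score, y)

# Lift against the rate of positive predictions (fraction of cases targeted)
lift_perf <- performance(pred, measure = 'lift', x.measure = 'rpp')
plot(lift_perf, main = 'Lift curve')

# Cumulative gains (Lorenz-style) curve: positives captured vs. cases targeted
gain_perf <- performance(pred, measure = 'tpr', x.measure = 'rpp')
plot(gain_perf, main = 'Cumulative gains curve')
abline(0, 1, lty = 2)  # random-guess baseline

# Gini from the AUC
auc           <- performance(pred, measure = 'auc')@y.values[[1]]
gini_from_auc <- 2 * auc - 1

# Gini from the gains curve: trapezoid area above the diagonal,
# divided by the same area for a perfect ranking, 0.5 * (1 - p)
x    <- gain_perf@x.values[[1]]
g    <- gain_perf@y.values[[1]]
area <- sum(diff(x) * (head(g, -1) + tail(g, -1)) / 2)
p    <- mean(y)  # base rate of the positive class
gini_from_gains <- (area - 0.5) / (0.5 * (1 - p))

print(c(auc = auc, gini_from_auc = gini_from_auc, gini_from_gains = gini_from_gains))
```

For the cumulative lift at a single cut-off, the `lift_perf@x.values[[1]]` and `lift_perf@y.values[[1]]` vectors can be read off directly at the targeted fraction of interest.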