Solved – Lorenz curve and Gini coefficient for measuring classifier performance

Tags: auc, classification, gini, lorenz-curve, roc

I often use a ROC curve and the area under that curve as a measure of classifier accuracy in 2-class problems, e.g.:

# Load a dataset
library(mlbench)
data(Sonar)

# Build a model
library(caret)
model <- train(Class ~ ., data = Sonar, method = 'gbm', tuneLength = 1, trControl = trainControl(method = 'cv'))
model

# ROC curve and AUC
# Note: column 2 of the probability matrix is the second level of Sonar$Class ("R")
library(pROC)
pMal <- predict(model, newdata = Sonar, type = 'prob')[, 2]
roc(Sonar$Class, pMal, plot = TRUE)
#> Area under the curve: 0.9705

# Lorenz curve and Gini?

In a similar manner, I would like to be able to plot the Lorenz curve and calculate the Gini coefficient for my classifier. I know Gini = 2*AUC - 1, but I'm not actually sure how to calculate it on its own. Furthermore, every application of a Lorenz curve I've seen looks at univariate data (e.g. income distribution). How do I calculate a Lorenz curve when I have 2 variables: the predicted probability of the positive class, and the actual class label?

Best Answer

When applied to classification or ranking, the Lorenz curve is also known as the "lift curve". For a given range of predicted probability values, the lift is the multiplicative increase in the positive class's rate (due to a given predictive model) over a random guess.

The ROCR package can calculate lift values and curves (its manual also has a concise definition of lift). The Gini index can be calculated from the area under the lift curve; I typically use the cumulative lift value at a given predicted probability threshold instead, since it is easier to relate to business metrics.
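To address the "Gini on its own" part of the question, here is a minimal base-R sketch. It assumes no tied scores and uses made-up toy data; `labels`, `scores`, and the helper `trapz` are illustrative names, not from the original post. It builds the cumulative gains curve (share of positives captured versus share of cases, ranked by descending score), which is one common way to draw a Lorenz-style curve for a classifier, and normalises its area against a perfect ranking.

```r
# Hypothetical toy data: 1 = positive class, scores = predicted P(positive).
labels <- c(1, 1, 0, 1, 0, 1, 0, 0, 1, 0)
scores <- c(0.95, 0.90, 0.80, 0.75, 0.60, 0.55, 0.40, 0.30, 0.25, 0.10)

# Rank cases from highest to lowest score.
y <- labels[order(scores, decreasing = TRUE)]

# Cumulative gains curve: x = share of cases, y = share of positives captured.
frac_pop <- c(0, seq_along(y) / length(y))
frac_pos <- c(0, cumsum(y) / sum(y))

# Trapezoidal area under a piecewise-linear curve (illustrative helper).
trapz <- function(x, y) sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)

# Gini = area between model curve and diagonal, divided by the same
# area for a perfect ranking (all positives first).
area_model   <- trapz(frac_pop, frac_pos)
area_perfect <- 1 - mean(labels) / 2
gini <- (area_model - 0.5) / (area_perfect - 0.5)

# Cross-check against Gini = 2*AUC - 1, with AUC from the rank-sum formula.
n_pos <- sum(labels == 1)
n_neg <- sum(labels == 0)
auc <- (sum(rank(scores)[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

plot(frac_pop, frac_pos, type = "l",
     xlab = "Share of cases (ranked by score)", ylab = "Share of positives")
abline(0, 1, lty = 2)  # random-guess baseline
print(c(gini = gini, two_auc_minus_one = 2 * auc - 1))
```

On this toy data both quantities agree, which is the identity the question mentions. With tied scores the curve needs one point per distinct score rather than one per case.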

Related Solutions

Solved – Recall and AUC of a binary classifier

A ROC curve plots the true positive rate against the false positive rate. If you have AUC = 1, by definition you have a perfect classifier.

From an information retrieval viewpoint: if you have AUC = 1, then there is a threshold at which you have perfect recall and perfect precision. You retrieve all the documents that exist on the topic, and all the documents you retrieve are relevant to it.
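A small base-R sketch of that claim, on made-up data where every positive outscores every negative (`auc_pairs` is an illustrative helper, not a library function). It computes AUC as the share of correctly ordered positive/negative pairs, then checks precision and recall at a threshold placed between the two groups.

```r
# Hypothetical, perfectly separated scores: 1 = relevant, 0 = not relevant.
labels <- c(1, 1, 1, 0, 0, 0, 0)
scores <- c(0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1)

# AUC as the fraction of (positive, negative) pairs ranked correctly;
# ties count as half.
auc_pairs <- function(scores, labels) {
  pos <- scores[labels == 1]
  neg <- scores[labels == 0]
  mean(outer(pos, neg, ">") + 0.5 * outer(pos, neg, "=="))
}
auc <- auc_pairs(scores, labels)

# Any threshold between the lowest positive and highest negative score works.
threshold <- (min(scores[labels == 1]) + max(scores[labels == 0])) / 2
pred <- as.integer(scores > threshold)

tp <- sum(pred == 1 & labels == 1)
fp <- sum(pred == 1 & labels == 0)
fn <- sum(pred == 0 & labels == 1)

precision <- tp / (tp + fp)
recall    <- tp / (tp + fn)
print(c(auc = auc, precision = precision, recall = recall))
```

Note that AUC = 1 only guarantees that such a threshold exists; a poorly chosen threshold can still give imperfect precision or recall.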

I would like to add more information in response to a commenter.

The following graph is from Tom Fawcett's "ROC Graphs: Notes and Practical Considerations for Data Mining Researchers":

Figure 2 of ROC Graphs

A discrete classifier is one that outputs only a class label. 
Each discrete classifier produces an (FP rate,TP rate) pair, 
which corresponds to a single point in ROC space. 
...
The point (0, 1) represents perfect classification. 
D's performance is perfect as shown.
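To illustrate the quoted passage, a minimal base-R sketch with made-up labels shows how a discrete classifier's confusion matrix reduces to a single (FP rate, TP rate) point; the vectors `actual` and `predicted` are hypothetical.

```r
# Hypothetical true and predicted class labels ("P" = positive, "N" = negative).
actual    <- factor(c("P", "P", "P", "N", "N", "N", "N", "P"), levels = c("N", "P"))
predicted <- factor(c("P", "P", "N", "N", "P", "N", "N", "P"), levels = c("N", "P"))

# Confusion matrix: rows are predictions, columns are true classes.
tab <- table(predicted, actual)

# TP rate = true positives / all actual positives.
tpr <- tab["P", "P"] / sum(tab[, "P"])
# FP rate = false positives / all actual negatives.
fpr <- tab["P", "N"] / sum(tab[, "N"])

# The classifier occupies exactly one point in ROC space.
print(c(fpr = fpr, tpr = tpr))
```

A perfect discrete classifier would land on (0, 1): no false positives and every positive found.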

Solved – Area under ROC curve for random forest

Yes, but it is not relevant in practice, except in some very rare cases where the two classes are not interchangeable from the model's point of view (as in one-class SVM).

Exchanging the class order simply changes AUROC from $a$ to $1-a$, so your model is informative to the extent that its AUROC is far from 0.5. It is therefore basically safe to report $1-a$ when $a<0.5$, and many AUROC implementations will do this automatically.
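The $a$ versus $1-a$ symmetry can be checked with a short base-R sketch on made-up data with no tied scores. The helper `auc_for` and its `positive` argument are illustrative names, not part of any package.

```r
# Hypothetical labels and scores where the scores rank the classes "backwards".
labels <- c("A", "B", "A", "A", "B", "B", "A", "B")
scores <- c(0.2, 0.9, 0.4, 0.1, 0.8, 0.7, 0.6, 0.3)

# AUROC as the fraction of (positive, negative) pairs in which the
# positive case has the higher score; ties count as half.
auc_for <- function(scores, labels, positive) {
  pos <- scores[labels == positive]
  neg <- scores[labels != positive]
  mean(outer(pos, neg, ">") + 0.5 * outer(pos, neg, "=="))
}

a         <- auc_for(scores, labels, positive = "A")
a_swapped <- auc_for(scores, labels, positive = "B")

# Swapping the class order mirrors the value around 0.5.
print(c(a = a, a_swapped = a_swapped, sum = a + a_swapped))

# Reporting the value on the informative side of 0.5.
reported <- max(a, 1 - a)
```

As far as I know, pROC's `roc()` behaves this way by default through `direction = "auto"`, so setting `direction` and `levels` explicitly avoids a silent flip.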
