I often use a ROC curve and the area under that curve as a measure of classifier accuracy in 2-class problems, e.g.:
#Load a dataset
library(mlbench)
data(Sonar)
#Build a model
library(caret)
model <- train(Class~., data=Sonar, method='gbm', tuneLength=1, trControl=trainControl(method='cv'))
model
#ROC curve and AUC
library(pROC)
pMal <- predict(model, newdata=Sonar, type='prob')[,2]
roc(Sonar$Class, pMal, plot=TRUE)
#> Area under the curve: 0.9705
#Lorenz curve and Gini?
In a similar manner, I would like to plot the Lorenz curve and calculate the Gini coefficient for my classifier. I know that Gini = 2*AUC - 1, but I'm not sure how to calculate it on its own. Furthermore, every application of a Lorenz curve I've seen looks at univariate data (e.g. income distribution). How do I calculate a Lorenz curve when I have two inputs: the predicted probability of the positive class and the actual class?
Best Answer
The Lorenz curve is also known as the "lift curve" when applied to classification or ranking. For a given range of predicted probability values, the lift is the multiplicative increase in the positive class's rate (due to a given predictive model) over a random guess.
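To make the definition of lift concrete, here is a minimal base-R sketch. It assumes "lift" means the positive rate among the top-ranked fraction of cases divided by the overall positive rate. `cum_lift` and its `frac` argument are hypothetical names used for illustration; they do not come from any package.

```r
# Hypothetical helper: cumulative lift in the top `frac` of cases,
# ranked by descending predicted score.
cum_lift <- function(is_pos, score, frac = 0.1) {
  stopifnot(length(is_pos) == length(score), frac > 0, frac <= 1)
  k <- max(1, ceiling(frac * length(score)))            # number of top-ranked cases
  top <- order(score, decreasing = TRUE)[seq_len(k)]    # indices of those cases
  mean(is_pos[top]) / mean(is_pos)                      # top-k rate vs. base rate
}

# Toy data (made up for illustration): TRUE marks the positive class.
is_pos <- c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE)
score  <- c(0.9, 0.8, 0.7, 0.6, 0.4, 0.2)

lift_top_half <- cum_lift(is_pos, score, frac = 0.5)
lift_all      <- cum_lift(is_pos, score, frac = 1)
```

A lift of 1 means the model does no better than picking cases at random, which is what selecting every case (`frac = 1`) always gives.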
The ROCR package can calculate lift values and curves (its manual also has a concise definition of lift). The Gini index can be calculated from the area under the lift curve. I typically use the cumulative lift value at a given predicted probability threshold instead, since it is easier to relate to business metrics.
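As a rough sketch of how the curve and the Gini could be computed from just the class labels and the predicted scores, the base-R code below makes two assumptions. First, the Lorenz-style curve is taken to be the cumulative gains curve: the cumulative share of positives against the share of cases, ranked by descending score. Second, the Gini is taken to be the area between that curve and the diagonal, divided by the same area for a perfect ranking. `lorenz_gini` is a hypothetical helper, not a function from ROCR or pROC.

```r
# Hypothetical helper: cumulative gains (Lorenz-style) curve and Gini coefficient.
lorenz_gini <- function(is_pos, score) {
  stopifnot(length(is_pos) == length(score), any(is_pos), !all(is_pos))
  hits <- is_pos[order(score, decreasing = TRUE)]   # labels, best score first
  n <- length(hits)
  x <- c(0, seq_len(n) / n)                         # share of cases selected
  y <- c(0, cumsum(hits) / sum(hits))               # share of positives captured
  area <- sum(diff(x) * (y[-1] + y[-length(y)]) / 2)  # trapezoidal rule
  pos_rate <- mean(is_pos)
  # A perfect ranking has area 1 - pos_rate / 2; the diagonal has area 0.5.
  gini <- (area - 0.5) / (0.5 * (1 - pos_rate))
  list(x = x, y = y, gini = gini)
}

# Toy data (made up for illustration, no tied scores).
is_pos <- c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE)
score  <- c(0.9, 0.8, 0.7, 0.6, 0.4, 0.2)
res <- lorenz_gini(is_pos, score)

# Cross-check against Gini = 2*AUC - 1, with AUC from the rank-sum formula.
n_pos <- sum(is_pos)
n_neg <- sum(!is_pos)
auc <- (sum(rank(score)[is_pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
gini_from_auc <- 2 * auc - 1
```

With the objects from the question, the call would look like `res <- lorenz_gini(Sonar$Class == levels(Sonar$Class)[2], pMal)`, followed by `plot(res$x, res$y, type='l')` and `abline(0, 1, lty=2)` for the random-guess diagonal. The two Gini values agree exactly when there are no tied scores; with ties, the curve depends on how tied cases are ordered. For lift itself, ROCR's `performance(prediction(pMal, Sonar$Class), "lift", "rpp")` should give the curve directly.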