Solved – Evaluation of classifiers: learning curves vs ROC curves

accuracy, classification, machine learning, roc

I would like to compare 2 different classifiers for a multiclass text classification problem with a large training dataset. I am unsure whether I should use ROC curves or learning curves to compare the 2 classifiers.

On one hand, learning curves are useful for deciding the size of the training dataset, since you can find the size at which the classifier stops learning (and maybe degrades). So the best classifier in this case might be the one reaching the highest accuracy with the smallest training set.
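For concreteness, here is a minimal sketch of computing a learning curve with scikit-learn. The synthetic dataset and logistic-regression classifier are placeholders standing in for your real document-term matrix and models:

```python
# A minimal learning-curve sketch, assuming scikit-learn; make_classification
# is a stand-in for your real multiclass text data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

X, y = make_classification(n_samples=2000, n_features=50,
                           n_informative=10, n_classes=3, random_state=0)

# Evaluate the classifier at 10 increasing training-set sizes with 5-fold CV.
sizes, train_scores, val_scores = learning_curve(
    LogisticRegression(max_iter=1000), X, y,
    train_sizes=np.linspace(0.1, 1.0, 10), cv=5, scoring="accuracy")

# The size at which mean validation accuracy flattens out is roughly
# where the classifier stops benefiting from more data.
for n, acc in zip(sizes, val_scores.mean(axis=1)):
    print(f"n={n:5d}  val accuracy={acc:.3f}")
```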

On the other hand, ROC curves let you find a point with the right trade-off between sensitivity and specificity. The best classifier in this case is the one closest to the top-left corner, achieving the highest TPR for a given FPR.
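Here is a minimal sketch of that criterion, again assuming scikit-learn and a synthetic dataset. Note that ROC analysis is inherently binary; for a multiclass problem you would repeat this one-vs-rest per class:

```python
# A minimal ROC sketch for a binary task, assuming scikit-learn.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Confidence scores for the positive class.
scores = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
fpr, tpr, thresholds = roc_curve(y_te, scores)

# Threshold whose (FPR, TPR) point lies closest to the top-left corner (0, 1).
best = np.argmin(np.hypot(fpr - 0.0, tpr - 1.0))
print(f"AUC={roc_auc_score(y_te, scores):.3f}  "
      f"threshold={thresholds[best]:.3f}  "
      f"TPR={tpr[best]:.3f}  FPR={fpr[best]:.3f}")
```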

Should I use both evaluation methods? Is it possible for a classifier with a better learning curve to have a worse ROC curve, and vice versa?

Best Answer

A learning curve is only a diagnostic tool: it tells you how fast your model learns and whether your analysis is stuck in a quirky regime of too-small training sets (or too-small ensembles, if applicable). The only part of this plot that is interesting for model assessment is its end, i.e. the final performance -- and that does not need a plot to be reported.
Selecting a model based on a learning curve, as you sketched in your question, is a rather poor idea, because you are likely to select the model that is best at overfitting too small a sample.

About ROCs... a ROC curve is a method for assessing binary models that produce a confidence score that an object belongs to one class, and possibly also for finding the best threshold to convert such scores into an actual classifier.
What you describe is rather an idea to plot your classifiers' performance as a scatterplot of TPR/FPR in ROC space and use the closest-to-top-left-corner criterion to select the one that best balances false alarms and misses -- this particular aim can be achieved more elegantly by simply selecting the model with the best F-score (the harmonic mean of precision and recall).
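A minimal sketch of that F-score-based selection, assuming scikit-learn; the two classifiers and the macro averaging are illustrative choices, with macro averaging being one common option for multiclass problems:

```python
# Compare two classifiers by macro F1 on a held-out set (sketch, assuming
# scikit-learn; the synthetic data stands in for your real text features).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=50, n_informative=10,
                           n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for name, clf in [("logreg", LogisticRegression(max_iter=1000)),
                  ("forest", RandomForestClassifier(random_state=0))]:
    pred = clf.fit(X_tr, y_tr).predict(X_te)
    print(f"{name}: macro F1 = {f1_score(y_te, pred, average='macro'):.3f}")
```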
