Solved – Optimize a classification algorithm using mean per-class accuracy

accuracy, machine learning, r, regression, scoring-rules

I have a binary classification problem and am trying to find a way to optimize my machine learning algorithm using a performance metric based on the per-class error rate. If I'm not misinterpreting anything, frequently used measures such as sensitivity and specificity each focus on only one of the two columns in the confusion matrix (the actual "positives" for sensitivity, the actual "negatives" for specificity, if you will). What I'm looking for instead is a performance metric that doesn't care about "positives" and "negatives", but simply about the ability of the algorithm to correctly identify any data point as belonging to class 1 or class 2.
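For a binary problem, the quantity described above is usually called balanced accuracy. It is the unweighted mean of the per-class accuracies, which for two classes equals (sensitivity + specificity) / 2. Mean per-class error is one minus that value. caret's confusionMatrix() reports this figure as "Balanced Accuracy" for two-class problems. Below is a minimal, language-agnostic sketch in Python. It assumes the labels arrive as two plain lists, and the function names are illustrative only:

```python
from collections import defaultdict


def per_class_accuracy(y_true, y_pred):
    """Fraction of each true class that was predicted correctly."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return {c: correct[c] / total[c] for c in total}


def mean_per_class_accuracy(y_true, y_pred):
    """Unweighted mean of the per-class accuracies (balanced accuracy)."""
    acc = per_class_accuracy(y_true, y_pred)
    return sum(acc.values()) / len(acc)


def mean_per_class_error(y_true, y_pred):
    """Average per-class error: 1 minus mean per-class accuracy."""
    return 1.0 - mean_per_class_accuracy(y_true, y_pred)


# 8 cases of class 1, 2 of class 2; a model that always predicts class 1.
y_true = [1] * 8 + [2] * 2
y_pred = [1] * 10
print(per_class_accuracy(y_true, y_pred))       # {1: 1.0, 2: 0.0}
print(mean_per_class_accuracy(y_true, y_pred))  # 0.5
```

Plain accuracy for this constant classifier is 0.8. The mean per-class accuracy of 0.5 makes it visible that class 2 is never recognized.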

I'm being deliberately vague in the description above since I'm having a hard time finding anything useful on this using my (admittedly newbie-ish) Google-fu. Are there any strong theoretical reasons not to use this kind of heuristic to optimize a classification algorithm, or is it just that I'm using the wrong terminology when searching for this?

To be more specific, I'm trying to train a GLM in R using the caret package and want to construct a trainControl function call that would fulfill the criterion above. I am aware of the ability to call train() with the argument metric = "ROC", but I don't think that would do the trick.

It should be stressed, though, that I'm not interested in the underpinnings of any particular model; I could just as well use a GBM, a neural network or any other kind of classifier to do the heavy lifting. I'm just interested in finding out how to tell caret (or H2O, or Spark, or whatever else I can call from R) to use the "average per-class error" as the optimization metric.
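In caret, a custom metric is normally supplied as a summaryFunction in trainControl() and then selected by name through the metric argument of train(). caret keeps the tuning-parameter values with the best resampled score. The selection step itself amounts to "evaluate each candidate, keep the best". The Python sketch below shows the same idea for a decision threshold instead of a tuning parameter. The helper best_threshold and the toy scores are hypothetical:

```python
def mean_per_class_accuracy(y_true, y_pred):
    """Average, over the classes present in y_true, of the fraction predicted correctly."""
    accs = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        accs.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(accs) / len(accs)


def best_threshold(y_true, scores, candidates):
    """Return (cut-off, score) maximizing mean per-class accuracy (hypothetical helper)."""
    best_t, best_acc = None, float("-inf")
    for t in candidates:
        # Hard labels: class 1 if the model score reaches the cut-off.
        y_pred = [1 if s >= t else 0 for s in scores]
        acc = mean_per_class_accuracy(y_true, y_pred)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc


# Toy data: six class-0 cases, two class-1 cases, with made-up model scores.
y_true = [0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.55, 0.9]
print(best_threshold(y_true, scores, candidates=[0.25, 0.5, 0.7]))
```

On this toy data, plain accuracy would favour the 0.7 cut-off (7 of 8 cases correct). Mean per-class accuracy favours 0.5 instead, because it gives the two class-1 cases the same total weight as the six class-0 cases.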

Best Answer

Average per-class error is usually not a good metric. For instance, it is not a continuous function of the model parameters: it depends on them only through the predicted class labels, which change in jumps. That makes it difficult to optimize. There are also more statistical reasons, since it is not a proper scoring rule. Search this site for "proper score function" (https://stats.stackexchange.com/search?q=proper+score+function). For other reasons why it is not a good idea, see Logistic regression: maximum likelihood vs misclassification.
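The "not continuous" point can be seen on a toy one-feature logistic model, p = sigmoid(w * x + b). The data and the values of w and b are made up for illustration. Nudging the intercept b changes the log loss (the negative mean log-likelihood) at every step. Mean per-class error, by contrast, only moves when some case crosses the 0.5 threshold, so its gradient is zero almost everywhere and a gradient-based optimizer gets no signal from it. A minimal sketch:

```python
import math

# Toy one-feature data set (made up for illustration).
X = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
Y = [0, 0, 1, 0, 1, 1]


def mean_per_class_error(y_true, y_pred):
    """1 minus the average per-class accuracy."""
    accs = [
        sum(p == c for t, p in zip(y_true, y_pred) if t == c) / y_true.count(c)
        for c in sorted(set(y_true))
    ]
    return 1.0 - sum(accs) / len(accs)


def evaluate(w, b):
    """Return (mean per-class error, mean log loss) for p = sigmoid(w * x + b)."""
    probs = [1.0 / (1.0 + math.exp(-(w * x + b))) for x in X]
    preds = [1 if p >= 0.5 else 0 for p in probs]
    log_loss = -sum(
        y * math.log(p) + (1 - y) * math.log(1 - p) for y, p in zip(Y, probs)
    ) / len(Y)
    return mean_per_class_error(Y, preds), log_loss


for b in (0.0, 0.2, 0.4, 0.6):
    err, ll = evaluate(1.0, b)
    print(f"b={b:.1f}  mean per-class error={err:.3f}  log loss={ll:.4f}")
```

Here the error stays at 0.333 for b = 0.0, 0.2 and 0.4. It drops to 0.167 at b = 0.6, when the case at x = -0.5 crosses the threshold. The log loss takes a different value at each b.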

In most cases, it is more informative to treat a problem with a binary response variable as risk (that is, probability) estimation rather than as classification. For discussion, see Logistic regression: maximum likelihood vs misclassification.
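A proper scoring rule is one whose expected value is optimized by reporting the true probability. As a small illustration, suppose the true probability that Y = 1 for some case is q = 0.7 (an assumed value). The expected Brier score of a forecast p works out to (p - q)^2 + q(1 - q), so it is minimized exactly at p = q. The expected log loss is likewise minimized at p = q. A thresholded error is identical for every p above 0.5, so it cannot reward a better risk estimate. Any metric computed from hard labels alone, mean per-class error included, shares this blind spot. A minimal sketch:

```python
import math


def expected_brier(p, q):
    """Expected Brier score of forecast p when the true P(Y=1) is q."""
    return q * (1 - p) ** 2 + (1 - q) * p ** 2


def expected_log_loss(p, q):
    """Expected log loss of forecast p when the true P(Y=1) is q."""
    return -(q * math.log(p) + (1 - q) * math.log(1 - p))


def expected_zero_one(p, q, threshold=0.5):
    """Expected misclassification rate after thresholding p into a hard label."""
    predicted_one = p >= threshold
    return 1 - q if predicted_one else q


q = 0.7  # assumed true probability, for illustration only
for p in (0.55, 0.7, 0.95):
    print(
        f"p={p:.2f}  Brier={expected_brier(p, q):.4f}  "
        f"log loss={expected_log_loss(p, q):.4f}  0-1 error={expected_zero_one(p, q):.2f}"
    )
```

Brier score and log loss both come out lowest at p = 0.70. The expected 0-1 error is 0.30 for all three forecasts.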
