Solved – Classifier performance measure that combines sensitivity and specificity

classification, model-evaluation, roc, sensitivity-specificity

I have two-class labelled data, well balanced between the classes, on which I'm performing classification with multiple classifiers. When assessing a classifier's performance, I need to account for how accurately it identifies not only the true positives but also the true negatives. If I use accuracy and the classifier is biased toward positives, classifying everything as positive, I will still get around 50% accuracy even though it failed to identify a single true negative. The same weakness carries over to precision and recall, since they focus on one class only, and in turn to the F1-score. (This matches my reading of, for example, the paper "Beyond Accuracy, F-score and ROC: a Family of Discriminant Measures for Performance Evaluation".)
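To make this concrete, here is a minimal sketch (assuming scikit-learn is available) of that degenerate all-positive classifier on a balanced dataset: accuracy and F1 look passable, while the true-negative rate is zero.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, recall_score

# Balanced two-class dataset: 50 negatives (0) and 50 positives (1).
y_true = np.array([0] * 50 + [1] * 50)

# Degenerate classifier that labels every sample positive.
y_pred = np.ones_like(y_true)

print(accuracy_score(y_true, y_pred))             # 0.5
print(f1_score(y_true, y_pred))                   # ~0.667, looks acceptable
print(recall_score(y_true, y_pred, pos_label=0))  # 0.0 -- specificity (TNR)
```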

Therefore, I can use sensitivity and specificity (TPR and TNR) to see how the classifier performs on each class, and I aim to maximize both values.

My question: I am looking for a measure that combines these two values into a single meaningful one. I looked into the measures proposed in that paper, but found them non-trivial to apply. Based on my understanding, why can't we apply something like the F-score, but using sensitivity and specificity in place of precision and recall? The formula would be
$$
\text{my Performance Measure} = \frac{2 \cdot \text{sensitivity} \cdot \text{specificity}}{\text{sensitivity} + \text{specificity}}
$$
and my aim would be to maximize this measure, which I find very representative. Does a similar formula already exist? Would it make sense, and is it mathematically sound?
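For reference, here is a minimal sketch of the proposed measure (assuming scikit-learn and 0/1 labels; the function name is my own). Note that it evaluates to 0 for the all-positive classifier above, exactly the behaviour I want.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def harmonic_sens_spec(y_true, y_pred):
    """Harmonic mean of sensitivity (TPR) and specificity (TNR) --
    the proposed F-score-like measure, with 0/1 labels assumed."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    if sensitivity + specificity == 0:
        return 0.0
    return 2 * sensitivity * specificity / (sensitivity + specificity)

# The all-positive classifier from above scores 0, as intended:
y_true = np.array([0] * 50 + [1] * 50)
print(harmonic_sens_spec(y_true, np.ones_like(y_true)))  # 0.0
```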

Best Answer

I would say there is no single measure you must use; several are worth taking into account.

The last time I did probabilistic classification, I used the R package ROCR together with explicit cost values for false positives and false negatives.

I considered all cutoff points from 0 to 1 and used several measures, such as the expected cost, when selecting the cutoff point (sketched below). Of course I already had the AUC as a general measure of classification accuracy, but for me it was not the only possibility.
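The cutoff-scanning idea can be sketched without ROCR. Here is a minimal Python version; the function name, the 101-point grid, and the cost values in the example are all assumptions for illustration, not part of any library API.

```python
import numpy as np

def min_cost_cutoff(y_true, scores, cost_fp, cost_fn):
    """Scan candidate cutoffs in [0, 1] and return the cutoff that
    minimizes total misclassification cost. The cost arguments are
    hypothetical values that must come from outside the model."""
    best_cutoff, best_cost = 0.0, np.inf
    for cutoff in np.linspace(0.0, 1.0, 101):  # arbitrary grid resolution
        y_pred = (scores >= cutoff).astype(int)
        fp = int(np.sum((y_pred == 1) & (y_true == 0)))
        fn = int(np.sum((y_pred == 0) & (y_true == 1)))
        cost = cost_fp * fp + cost_fn * fn
        if cost < best_cost:
            best_cutoff, best_cost = cutoff, cost
    return best_cutoff, best_cost

# Example with made-up scores and costs (a missed positive costs 5x a false alarm):
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)
scores = np.clip(0.5 * y + rng.random(200) * 0.6, 0.0, 1.0)
print(min_cost_cutoff(y, scores, cost_fp=1.0, cost_fn=5.0))
```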

The cost values for the FP and FN cases must come from outside your particular model; perhaps they can be provided by a subject-matter expert?

For example, in customer churn analysis it may be expensive to incorrectly infer that a customer is not churning, but it is also expensive to offer a general price reduction on services without the accuracy to target it at the right groups.

-Analyst
