Solved – Harmonic mean of precision, recall and specificity

precision-recall, sensitivity-specificity

I have a system whose performance was characterized in terms of the number of true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN). There is a rate that summarizes all these quantities in one, called accuracy, which is defined as follows (according to Wikipedia):
$$
a=\frac{\text{TP}+\text{TN}}{\text{TP}+\text{TN}+\text{FP}+\text{FN}}.
$$
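For concreteness, here is a minimal Python sketch computing accuracy from confusion-matrix counts (the numbers are made up, purely for illustration):

```python
# Made-up confusion-matrix counts, for illustration only.
TP, TN, FP, FN = 90, 5, 3, 2

accuracy = (TP + TN) / (TP + TN + FP + FN)
print(accuracy)  # 0.95
```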
However, according to this Microsoft blog, the F-score is a better choice than accuracy because it works even if the class distribution is uneven. The F-score is the harmonic mean of precision $p=\text{TP}/(\text{TP}+\text{FP})$ and recall $r=\text{TP}/(\text{TP}+\text{FN})$, but it does not include the number of true negatives (TN):
$$
F = \frac{2\text{TP}}{2\text{TP}+\text{FP}+\text{FN}} = \frac{2}{\frac{1}{p}+\frac{1}{r}}.
$$
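As a quick sanity check, both forms of the F-score agree numerically, and TN indeed never appears (same made-up counts as above):

```python
TP, TN, FP, FN = 90, 5, 3, 2

p = TP / (TP + FP)  # precision
r = TP / (TP + FN)  # recall (sensitivity)

f_from_counts = 2 * TP / (2 * TP + FP + FN)
f_from_pr = 2 / (1 / p + 1 / r)
print(f_from_counts, f_from_pr)  # both ~0.973; TN is never used
```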
To take TN into account, could I compute the harmonic mean of precision, recall and specificity $s=\text{TN}/(\text{TN}+\text{FP})$, like below?
$$
u = \frac{3}{\frac{1}{p}+\frac{1}{r}+\frac{1}{s}}
$$
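Continuing the sketch from above, this three-way harmonic mean would be computed like so (specificity is what brings TN into the formula):

```python
TP, TN, FP, FN = 90, 5, 3, 2

p = TP / (TP + FP)  # precision
r = TP / (TP + FN)  # recall / sensitivity
s = TN / (TN + FP)  # specificity

u = 3 / (1 / p + 1 / r + 1 / s)
print(u)  # ~0.821
```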
What is this measure called? And why hasn't it been used before?

Note: this question is not the same as the one discussed here, because I am asking why the F-measure does not include specificity (and whether I can include it), not why accuracy is not a good metric for an uneven number of positives and negatives.


Trying to answer my own question: substituting the definitions of $p$, $r$ and $s$ above and simplifying gives
$$
u=\frac{3\,\text{TP}\,\text{TN}}{3\,\text{TP}\,\text{TN}+\text{TN}\,\text{FN}+\text{TN}\,\text{FP}+\text{TP}\,\text{FP}},
$$
which I do not know how to interpret. Perhaps I should start using the Matthews correlation coefficient instead, given that Wikipedia says it is one of the best measures for summarizing the confusion matrix (and "can be used even if the classes are of very different sizes"). After all, I was looking to include TP, TN, FP and FN in a single quantity.
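A quick numerical check confirms the algebra above, and the MCC can be computed straight from its definition (counts are still the illustrative ones from earlier):

```python
from math import sqrt

TP, TN, FP, FN = 90, 5, 3, 2

# Closed form of the three-way harmonic mean derived above.
u_closed = 3 * TP * TN / (3 * TP * TN + TN * FN + TN * FP + TP * FP)
print(u_closed)  # ~0.821, matching the harmonic-mean computation

# Matthews correlation coefficient, directly from its definition.
mcc = (TP * TN - FP * FN) / sqrt(
    (TP + FP) * (TP + FN) * (TN + FP) * (TN + FN)
)
print(mcc)  # ~0.641
```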

Best Answer

Well, I think that instead of combining precision, recall and specificity, you can just use the harmonic mean of specificity and sensitivity; that already captures all four of TP, TN, FP and FN in a single quantity, since sensitivity depends on TP and FN while specificity depends on TN and FP. (Note this is not the F-measure, which, as stated in the question, is the harmonic mean of precision and recall.)
As for interpretation, the F1 score does not have a straightforward interpretation, please correct me if I am wrong. And the MCC ranges from -1 to +1, which I think is of little use when evaluating a machine learning model during training or testing.
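As a sketch of the suggestion above, using the same illustrative counts as in the question, the harmonic mean of sensitivity and specificity does touch all four counts:

```python
TP, TN, FP, FN = 90, 5, 3, 2

sensitivity = TP / (TP + FN)  # uses TP and FN
specificity = TN / (TN + FP)  # uses TN and FP

hm = 2 / (1 / sensitivity + 1 / specificity)
print(hm)  # ~0.763
```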