Solved – How to calculate precision and recall when some of the test data remains unclassified

classification, machine learning

Consider a situation where we are running a classifier (the actual classification algorithm doesn't matter here), and the class labels are assigned based on a score. If score > 0, the data point is labeled A; if score < 0, it is labeled B.

All of the training data points have positive or negative scores. However, in my test data, a few points return score = 0. How should I measure precision and recall in this scenario, where some points can't be assigned to either class?


[additional information from a comment below] I faced this in a sentiment classification task. The normalized scores are in the range [-1,1], with 0 being the score for documents with no sentiment. It so happened that I had no neutral documents in my training data, but in the test data, some documents returned a score of 0.

Best Answer

It's useful to keep in mind that precision and recall are inherently tied to a particular state or label of interest. In information retrieval that label might be "relevant" as opposed to "not relevant," whereas in cancer diagnosis it might be "malignant" as opposed to "benign."

As @Thomas Jungblut mentions, it would be valid to treat this not as a binary classification problem ("A" or "B") but as a multiclass classification problem ("A," "B," or "Unclassified"). There are other metrics besides precision/recall that can be of interest in multiclass classification. However, if you insist on precision/recall, you must pick your label of interest, and the problem then becomes de facto binary classification once again. You have various options for how to frame it ("A" vs. "B or unclassified" is not the same as "A or unclassified" vs. "B", etc.). In effect, though, each of these framings amounts to picking a default label for the points that score 0.
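To make the framings concrete, here is a minimal sketch in plain Python. The labels, scores, and the helper names `label_from_score` and `precision_recall` are made up for illustration and are not from the question. It computes precision and recall for label "A" three ways: sending score-0 points to "B", sending them to "A", and keeping them as a separate "Unclassified" label.

```python
def label_from_score(score, default):
    """Map a score to a label; `default` is used when the score is exactly 0."""
    if score > 0:
        return "A"
    if score < 0:
        return "B"
    return default


def precision_recall(y_true, y_pred, positive):
    """Precision and recall for one label of interest (`positive`)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    fn = sum(1 for t, p in pairs if p != positive and t == positive)
    # Convention assumed here: report 0.0 when a denominator is zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical test set: true labels and classifier scores (two scores are 0).
y_true = ["A", "A", "A", "B", "B", "B"]
scores = [0.8, 0.0, -0.2, -0.5, 0.0, 0.3]

pred_default_b = [label_from_score(s, "B") for s in scores]
pred_default_a = [label_from_score(s, "A") for s in scores]
pred_unclassified = [label_from_score(s, "Unclassified") for s in scores]

for name, pred in [
    ("0 -> B", pred_default_b),
    ("0 -> A", pred_default_a),
    ("0 -> Unclassified", pred_unclassified),
]:
    p, r = precision_recall(y_true, pred, positive="A")
    print(f"{name}: precision(A)={p:.3f}, recall(A)={r:.3f}")
```

On this toy data, keeping a separate "Unclassified" label gives the same precision and recall for "A" as defaulting score-0 points to "B", because in both cases those points are simply "not A". Defaulting them to "A" instead changes the numbers.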

Since you seem to attach special meaning to a classification score of 0, it may be appropriate to apply some domain knowledge, or some knowledge of the specific classification algorithm being used. In the general case there's nothing magical about a score of 0, but perhaps you have a specific problem in mind where it does carry meaning.