Solved – Using micro average vs. macro average vs. normal versions of precision and recall for a binary classifier

I have a logistic regression recommender model built on my data where I tried to predict one of two outcomes for each row. Let's call them success and fail. I'm using the cross_val_score function of SKLearn to do so. I was planning on just using precision and recall as performance measures, but saw the "micro" and "macro" options and decided to read up more and try those as well, but I'm confused by the results. For reference, I looked at this blog post, this SO question, and this paper.

While I think I understand the general concept, I'm confused by the results I'm getting, and by which one is ideal for my use case. One important piece of background is that I have fewer success cases than fail cases: success makes up 25% of my dataset.

My understanding is that micro is a value that's closer to the performance of the model on the larger class (in this case, fail), while macro is for the smaller one (success). This makes sense. I'm trying to predict successes in this case, so the latter should be the better choice.

Here are the parts I don't get:

a) How does each of these compare to just regular precision / recall, in terms of representing the larger or smaller class? Which of the three options is best for a case like this, where I want to surface the smaller class but not the larger one?

b) In some folds of my CV, I get 0 as the regular precision value, but none are 0 for micro and macro. How is that possible? If I get some true positives in the individual classes, shouldn't I always have some in the overall dataset as well? In general the regular precision and recall end up significantly lower than both types of averages.

Best Answer

Micro and macro averaging are typically meant for situations with more than 2 classes. With two classes you would normally compute just precision, recall, F1, or whatever metric you are interested in (you have to tell us; we cannot know) for a single class. That is almost always your minority class, since it is almost always the one you're interested in.
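As a rough sketch of scoring only the minority class with `cross_val_score`: the data below is synthetic (generated with `make_classification` at roughly a 75/25 split) and the string labels `"success"` and `"fail"` are assumptions for illustration, not the asker's actual data. With string labels, the positive class has to be named explicitly through `pos_label`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, precision_score, recall_score
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the real data: about 25% "success", 75% "fail".
X, y_num = make_classification(
    n_samples=400, n_features=8, weights=[0.75, 0.25], random_state=0
)
y = np.where(y_num == 1, "success", "fail")

# Score precision and recall for the minority class only.
# zero_division=0 reports 0 (instead of warning) when a fold
# contains no predicted "success" rows.
precision_success = make_scorer(
    precision_score, pos_label="success", zero_division=0
)
recall_success = make_scorer(recall_score, pos_label="success", zero_division=0)

model = LogisticRegression(max_iter=1000)
precision_scores = cross_val_score(model, X, y, cv=5, scoring=precision_success)
recall_scores = cross_val_score(model, X, y, cv=5, scoring=recall_success)

print("precision (success) per fold:", precision_scores)
print("recall (success) per fold:", recall_scores)
```

If the labels were already encoded as 0/1 with 1 meaning success, the built-in `scoring="precision"` and `scoring="recall"` strings should give the same per-class view, since they treat 1 as the positive class by default.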

As for the zeros: if these averaging techniques use arithmetic means (on the micro or macro level), the nonzero value for the other class will make the average larger than 0, as in $(0 + 5)/2 = 2.5$. If they used geometric means (which is very uncommon), the result would still be zero: $\sqrt{0 \times 5} = 0$.
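The arithmetic can be checked by hand on a small made-up fold. The labels and predictions below are hypothetical, chosen so that the only predicted success is wrong. The sketch computes per-class precision directly, then the macro average (the unweighted mean of the per-class values) and the micro average (true and false positives pooled over both classes), which is how these averages are commonly defined.

```python
def precision_counts(label, y_true, y_pred):
    """Return (true positives, false positives, precision) for one class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == label and t == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == label and t != label)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return tp, fp, precision

# Hypothetical fold: 2 successes, 6 fails; the single "success" prediction is wrong.
y_true = ["success", "success", "fail", "fail", "fail", "fail", "fail", "fail"]
y_pred = ["fail", "fail", "success", "fail", "fail", "fail", "fail", "fail"]

tp_s, fp_s, prec_success = precision_counts("success", y_true, y_pred)
tp_f, fp_f, prec_fail = precision_counts("fail", y_true, y_pred)

# Macro: unweighted mean of the per-class precisions.
macro = (prec_success + prec_fail) / 2
# Micro: pool the counts over both classes, then compute one precision.
micro = (tp_s + tp_f) / (tp_s + fp_s + tp_f + fp_f)

print(f"precision(success) = {prec_success:.3f}")  # 0.000
print(f"precision(fail)    = {prec_fail:.3f}")     # 0.714
print(f"macro average      = {macro:.3f}")         # 0.357
print(f"micro average      = {micro:.3f}")         # 0.625
```

Precision for success is 0 because its only prediction is a false positive, yet both averages are well above 0 because the fail class contributes 5 correct predictions. In this single-label setting the micro average equals plain accuracy (5 of 8 rows correct), which is why it mostly reflects the majority class.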
