Solved – Failing to improve recall in classification

classification, cross-validation, naive bayes, precision-recall

I have a large data set with over 700,000 examples, and I tried to classify it (binary classification) with Naive Bayes and Random Forest. The task was carried out in Python with scikit-learn.

data

The data set has 3 categorical variables and 5 discrete (numeric) variables. I used a one-hot encoder on the categorical variables, so the data set consists of 18 features (13 dummy variables + 5 discrete variables).
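For reference, a minimal sketch of that encoding step, assuming the data arrives as a pandas DataFrame; the column names here are hypothetical, since the actual feature names are not given in the question:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical column names -- substitute the real ones.
categorical_cols = ["cat_a", "cat_b", "cat_c"]
numeric_cols = ["num_1", "num_2", "num_3", "num_4", "num_5"]

df = pd.read_csv("data.csv")  # assumed input format

# One-hot encode the 3 categorical columns and pass the 5 numeric columns
# through unchanged: 13 dummy columns + 5 numeric columns = 18 features.
preprocess = ColumnTransformer(
    [("onehot", OneHotEncoder(handle_unknown="ignore"), categorical_cols)],
    remainder="passthrough",
)
X = preprocess.fit_transform(df[categorical_cols + numeric_cols])
y = df["label"].values  # assumed name of the target column
```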

Overall, 20% of the examples are positive (1) and 80% are negative (0).

evaluation

I used MultinomialNB and RandomForestClassifier from scikit-learn to classify the data set, applying 20-fold cross-validation.
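A minimal sketch of that evaluation, assuming the encoded X and y from above; the metric names follow scikit-learn's built-in scorers:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import MultinomialNB

# 20-fold cross-validation, scoring several metrics at once.
# MultinomialNB expects non-negative features, which dummy variables
# and non-negative discrete counts satisfy.
scoring = ["accuracy", "precision", "recall", "f1", "roc_auc"]
for clf in (MultinomialNB(), RandomForestClassifier(n_estimators=30)):
    scores = cross_validate(clf, X, y, cv=20, scoring=scoring)
    print(type(clf).__name__,
          {m: round(scores[f"test_{m}"].mean(), 3) for m in scoring})
```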

The results of the two classifiers differed very little on all metrics: precision, recall, F-measure, accuracy and AUC.

  • AUC: both over 0.7
  • accuracy: both over 0.8
  • precision: both 0.5
  • recall: both 0.28

Since I am more interested in the positive class, I was worried about the precision and recall, which were really low.

Consequently, I tweaked the hyperparameters of the classifiers.

  • For Naive Bayes, the only hyperparameter, alpha, had no effect at all, since the number of examples is so large.
  • For Random Forest, I changed the number of estimators from 5 to 30 and varied max_features, but precision and recall did not exceed 0.28 (a grid-search version of this sweep is sketched below).
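For completeness, the sweep described in the last bullet can be run systematically as a grid search optimized for recall; a sketch, with illustrative parameter ranges rather than a recommended grid:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Grid over the two hyperparameters mentioned above; the ranges are illustrative.
param_grid = {
    "n_estimators": [5, 10, 20, 30],
    "max_features": ["sqrt", "log2", None],
}

# Optimize for recall, since that is the metric of interest here.
search = GridSearchCV(RandomForestClassifier(), param_grid, scoring="recall", cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```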

question

What is the probable reason for such low recall and precision? How could I improve recall?

Best Answer

You're looking for reasons why your precision and recall are low, but your accuracy doesn't look that great either: a classifier that labels every example as negative would achieve roughly 80% accuracy, which is close to what yours is.
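One way to make that baseline explicit is a majority-class dummy model; a quick sketch using scikit-learn's DummyClassifier, with X and y as in the question:

```python
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import cross_val_score

# A "classifier" that never predicts the positive class still scores ~0.8
# accuracy on an 80/20 split -- the baseline any real model must beat.
baseline = DummyClassifier(strategy="most_frequent")
print(cross_val_score(baseline, X, y, cv=20, scoring="accuracy").mean())
```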

I've built very few models in my professional life that were as predictive as I wanted them to be. There's a lot left unstated in the problem statement, but I'd venture that there is uncertainty that no amount of parameter tweaking is going to solve. Finding new or better features may be of more help, but that can be hard or expensive work.

Otherwise, I think you should play with the decision threshold, which lets you trade recall for precision within a single fitted model. If you're okay with "leaving money on the table," see what happens when you adjust the threshold so that recall drops to 5%; perhaps precision will shoot up to 75% and there will still be enough positively classified cases for your needs. Or perhaps the situation is reversed and you're okay with lots of false positives, in which case you'd move the threshold in the opposite direction.
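Concretely, you can take the positive-class probabilities from a fitted model and sweep the cutoff yourself instead of relying on the default 0.5; a sketch, assuming the question's encoded X and y and using a held-out split purely for illustration:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Hold out a test set to illustrate the threshold sweep.
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

clf = RandomForestClassifier(n_estimators=30).fit(X_train, y_train)
proba = clf.predict_proba(X_test)[:, 1]  # probability of the positive class

# Sweep the decision threshold instead of using the default 0.5 cutoff.
for t in np.arange(0.1, 0.9, 0.1):
    pred = (proba >= t).astype(int)
    print(f"threshold={t:.1f}  "
          f"precision={precision_score(y_test, pred, zero_division=0):.2f}  "
          f"recall={recall_score(y_test, pred):.2f}")
```

scikit-learn's precision_recall_curve gives the same trade-off over every possible threshold at once, if you prefer to inspect the whole curve.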