Solved – Why the Naive Bayes classifier is known to be a bad estimator

classification, machine learning, naive bayes

The scikit-learn documentation page for Naive Bayes states: "On the flip side, although naive Bayes is known as a decent classifier, it is known to be a bad estimator, so the probability outputs from predict_proba are not to be taken too seriously."

I want to use Naive Bayes for a classification problem on a dataset with two categorical variables, and I'm interested to know why the output of predict_proba cannot be taken as an accurate prediction. I'm asking because my ultimate goal after training on this dataset is to use the trained model to predict probabilities on a test dataset; if those probabilities are not accurate, Naive Bayes would not be a good fit for my case. I'd appreciate it if someone could explain this.

Best Answer

Because it's naive (see Wikipedia): it assumes the features are independent, and the probabilities come out wrong when that assumption does not hold. For example, suppose you are predicting mortality based on smoking and drinking. NB may well assign people who smoke and drink a higher risk (= probability) than is warranted, simply because smoking and drinking are correlated: the same underlying evidence gets counted twice. I suggest you set up a contingency table for, e.g., smoking, drinking, and dying, and compare the naive Bayes probabilities against the true probabilities.
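A small sketch of that experiment (my own synthetic example, not from the answer): `drinking` is made perfectly correlated with `smoking`, and mortality depends only on smoking. With one feature, naive Bayes recovers roughly the true risk; adding the redundant feature double-counts the evidence and pushes the probability toward the extreme, even though the classification itself may stay the same.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

rng = np.random.default_rng(0)
n = 5000
smoking = rng.integers(0, 2, size=n)
drinking = smoking.copy()  # perfectly correlated with smoking
# Mortality depends only on smoking: P(die|smoker)=0.6, P(die|non-smoker)=0.2
y = (rng.random(n) < np.where(smoking == 1, 0.6, 0.2)).astype(int)

nb_one = BernoulliNB().fit(smoking.reshape(-1, 1), y)
nb_two = BernoulliNB().fit(np.column_stack([smoking, drinking]), y)

p_one = nb_one.predict_proba([[1]])[0, 1]     # close to the true 0.6
p_two = nb_two.predict_proba([[1, 1]])[0, 1]  # overshoots: same evidence counted twice
print(p_one, p_two)
```

The predicted class for a smoker is unchanged (still the more likely class), which is why NB can remain a decent classifier while its probability estimates are unreliable.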

Why don't you use logistic regression instead?
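To illustrate that suggestion with the same synthetic setup as above (again my own sketch): logistic regression fits the feature weights jointly, so a duplicated feature is absorbed into the coefficients and the predicted probability barely moves, instead of being double-counted.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000
smoking = rng.integers(0, 2, size=n)
drinking = smoking.copy()  # perfectly correlated with smoking
# Mortality depends only on smoking: P(die|smoker)=0.6, P(die|non-smoker)=0.2
y = (rng.random(n) < np.where(smoking == 1, 0.6, 0.2)).astype(int)

lr_one = LogisticRegression().fit(smoking.reshape(-1, 1), y)
lr_two = LogisticRegression().fit(np.column_stack([smoking, drinking]), y)

p_one = lr_one.predict_proba([[1]])[0, 1]
p_two = lr_two.predict_proba([[1, 1]])[0, 1]
print(p_one, p_two)  # nearly identical: the correlation is absorbed by the weights
```

Under the hood the (slightly regularized) model simply splits the weight between the two collinear columns, so the total effect on the log-odds, and hence the probability, stays close to the single-feature fit.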