Why does nobody use the Bayesian multinomial Naive Bayes classifier?

Tags: bayesian, dirichlet distribution, multinomial-distribution, naive bayes, prior

So in (unsupervised) text modeling, Latent Dirichlet Allocation (LDA) is a Bayesian version of Probabilistic Latent Semantic Analysis (PLSA). Essentially, LDA = PLSA + Dirichlet prior over its parameters. My understanding is that LDA is now the reference algorithm and is implemented in various packages, while PLSA should not be used anymore.
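The "PLSA + Dirichlet prior" relationship can be written out side by side. The notation is introduced here for illustration: θ_d are the topic proportions of document d, φ_k is the word distribution of topic k, and K is the number of topics. In PLSA both θ_d and φ_k are fit as point estimates (typically by EM). LDA instead draws θ_d from a Dirichlet, and the common "smoothed" variant also places a Dirichlet prior on each φ_k:

```latex
\text{PLSA:}\quad p(w \mid d) = \sum_{k=1}^{K} \theta_{d,k}\,\phi_{k,w},
\qquad \theta_d,\ \phi_k \ \text{point-estimated}

\text{LDA:}\quad \theta_d \sim \operatorname{Dir}(\alpha), \quad
z_{d,i} \mid \theta_d \sim \operatorname{Cat}(\theta_d), \quad
w_{d,i} \mid z_{d,i} \sim \operatorname{Cat}(\phi_{z_{d,i}}), \quad
\phi_k \sim \operatorname{Dir}(\beta)\ \text{(smoothed variant)}
```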

In (supervised) text categorization, we could do exactly the same thing for the multinomial Naive Bayes classifier by putting a Dirichlet prior over its parameters. Yet I don't think I have ever seen anyone do that, and the "point estimate" version of multinomial Naive Bayes seems to be the one implemented in most packages. Is there any reason for that?
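To make the question concrete, here is a minimal sketch under stated assumptions: bag-of-words documents, a symmetric Dirichlet prior with a hypothetical concentration `alpha` on each class's word distribution, and a Dirichlet(1) prior on the class proportions. It contrasts the usual plug-in classifier (posterior-mean word probabilities, i.e. additive smoothing) with a fully Bayesian one that integrates the word distribution out and scores a test document with the Dirichlet-multinomial posterior predictive. All function names and the toy corpus are made up for illustration.

```python
import math
from collections import Counter

def train_counts(docs, labels):
    """Per-class word counts counts[c][w] and number of training documents per class."""
    counts, class_docs = {}, Counter(labels)
    for doc, c in zip(docs, labels):
        counts.setdefault(c, Counter()).update(doc)
    return counts, class_docs

def log_plugin(doc_counts, post):
    """Multinomial log-likelihood using the posterior-mean word probabilities
    post[w] / sum(post), i.e. ordinary additive (Laplace/Lidstone) smoothing.
    The multinomial coefficient is omitted: it is identical for every class."""
    a0 = sum(post.values())
    return sum(x * math.log(post[w] / a0) for w, x in doc_counts.items())

def log_dcm(doc_counts, post):
    """Dirichlet-multinomial (Polya) log-likelihood: the full posterior predictive,
    integrating the class's word distribution out against Dirichlet(post).
    The multinomial coefficient is omitted here as well."""
    a0, n = sum(post.values()), sum(doc_counts.values())
    lp = math.lgamma(a0) - math.lgamma(a0 + n)
    for w, x in doc_counts.items():
        lp += math.lgamma(post[w] + x) - math.lgamma(post[w])
    return lp

def classify(doc, counts, class_docs, vocab, alpha=1.0, bayesian=True):
    """Return the highest-scoring class and all log-scores.
    alpha: assumed symmetric Dirichlet prior on each class's word distribution.
    A symmetric Dirichlet(1) prior on the class proportions is also assumed."""
    doc_counts = Counter(w for w in doc if w in vocab)  # ignore out-of-vocabulary words
    n_docs, k = sum(class_docs.values()), len(class_docs)
    scores = {}
    for c in class_docs:
        post = {w: alpha + counts[c][w] for w in vocab}  # Dirichlet posterior parameters
        log_prior = math.log((class_docs[c] + 1) / (n_docs + k))  # predictive class prob.
        lik = log_dcm(doc_counts, post) if bayesian else log_plugin(doc_counts, post)
        scores[c] = log_prior + lik
    return max(scores, key=scores.get), scores

# Toy corpus, invented purely for illustration.
docs = [["ball", "goal", "goal"], ["goal", "team"],
        ["vote", "law"], ["law", "vote", "party"]]
labels = ["sport", "sport", "politics", "politics"]
vocab = {w for d in docs for w in d}
counts, class_docs = train_counts(docs, labels)

for bayes in (False, True):
    label, _ = classify(["goal", "goal", "ball"], counts, class_docs, vocab, bayesian=bayes)
    print("bayesian" if bayes else "plug-in", label)
```

For a one-word document the two likelihoods coincide, because the posterior predictive probability of a single token is exactly the posterior mean. They only diverge when a word repeats: the Dirichlet-multinomial penalizes repeated occurrences of the same word less than the plug-in product does.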

Best Answer

Here is a nice paper that addresses some of the 'systemic' shortcomings of the multinomial Naive Bayes (MNB) classifier. The idea is that you can boost MNB's performance with a few tweaks, and the authors do mention using (uniform) Dirichlet priors.
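For context on the uniform-prior case, the following standard conjugacy result may help; the notation is introduced here, not taken from the paper. Let n_{c,w} be the count of word w in the training documents of class c, n_c = Σ_w n_{c,w}, and V the vocabulary size, and place a symmetric Dirichlet prior on the class's word distribution θ_c. The posterior mean is then additive smoothing, so the uniform prior Dir(1, …, 1) gives Laplace smoothing, (n_{c,w} + 1) / (n_c + V). This is the α = 1 case of the additive smoothing exposed by many MNB implementations (for example, the `alpha` parameter of scikit-learn's `MultinomialNB`, which defaults to 1.0):

```latex
\theta_c \sim \operatorname{Dir}(\alpha, \dots, \alpha)
\;\Longrightarrow\;
\theta_c \mid \text{data} \sim \operatorname{Dir}(\alpha + n_{c,1}, \dots, \alpha + n_{c,V})

\mathbb{E}[\theta_{c,w} \mid \text{data}] = \frac{n_{c,w} + \alpha}{n_c + V\alpha},
\qquad
\hat{\theta}^{\mathrm{MAP}}_{c,w} = \frac{n_{c,w} + \alpha - 1}{n_c + V(\alpha - 1)} \quad (\alpha > 1)
```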

Overall, if you're interested in MNB and haven't read this paper yet, I strongly recommend doing so.

I also found an accompanying MSc thesis by the same author(s), but I haven't read it myself yet. You can check it out.