Solved – Why does nobody use the Bayesian multinomial Naive Bayes classifier?

bayesian, dirichlet distribution, multinomial-distribution, naive bayes, prior

So in (unsupervised) text modeling, Latent Dirichlet Allocation (LDA) is a Bayesian version of Probabilistic Latent Semantic Analysis (PLSA). Essentially, LDA = PLSA + Dirichlet prior over its parameters. My understanding is that LDA is now the reference algorithm and is implemented in various packages, while PLSA should not be used anymore.

In (supervised) text categorization, we could do exactly the same thing for the multinomial Naive Bayes classifier and put a Dirichlet prior over its parameters. However, I don't think I have ever seen anyone do that, and the "point estimate" version of multinomial Naive Bayes seems to be the one implemented in most packages. Is there any reason for that?
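For concreteness, here is a minimal sketch of what the fully Bayesian version described above could look like, assuming a symmetric Dirichlet(alpha) prior shared by all classes. Integrating out each class's word distribution turns the multinomial likelihood of a new document into a Dirichlet-multinomial (multivariate Pólya) predictive. The function names, the toy data and alpha = 1 are illustrative assumptions, not taken from any package.

```python
import math
from collections import Counter

def class_word_counts(docs_by_class):
    """Total word counts per class; docs_by_class maps a label to a list of token lists."""
    return {label: Counter(tok for doc in docs for tok in doc)
            for label, docs in docs_by_class.items()}

def log_dirmult_predictive(doc, counts, vocab, alpha=1.0):
    """log P(doc | class) with the class word distribution integrated out under a
    symmetric Dirichlet(alpha) prior. The multinomial coefficient is dropped because
    it is the same for every class; the class prior P(c) is omitted (assumed uniform)."""
    x = Counter(doc)
    post = {w: alpha + counts.get(w, 0) for w in vocab}  # posterior Dirichlet parameters
    a0 = sum(post.values())
    score = math.lgamma(a0) - math.lgamma(a0 + len(doc))
    for w, k in x.items():
        score += math.lgamma(post[w] + k) - math.lgamma(post[w])
    return score

def log_plugin(doc, counts, vocab, alpha=1.0):
    """Usual point-estimate MNB score: plug in the smoothed estimate
    (alpha + n_w) / (V * alpha + N) for every token of the document."""
    post = {w: alpha + counts.get(w, 0) for w in vocab}
    a0 = sum(post.values())
    return sum(math.log(post[w] / a0) for w in doc)

# Hypothetical toy training data
docs = {
    "sports": [["ball", "goal", "team"], ["goal", "goal", "win"]],
    "politics": [["vote", "party", "win"], ["party", "law"]],
}
counts = class_word_counts(docs)
vocab = set().union(*counts.values())
new_doc = ["goal", "win", "win"]
for label in counts:
    print(label, log_dirmult_predictive(new_doc, counts[label], vocab),
          log_plugin(new_doc, counts[label], vocab))
```

For a one-token document the two scores are identical. They diverge only when a word repeats, because the integrated predictive raises the probability of further occurrences of a word already seen in the same document.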

Best Answer

Here is a nice paper that addresses some of the 'systemic' shortcomings of the Multinomial Naive Bayes (MNB) classifier. The idea is that you can boost the performance of MNB through a few tweaks, and the authors do mention using (uniform) Dirichlet priors.
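A related point, which may partly explain why the point-estimate version is what packages ship: the additive smoothing used in MNB (the `alpha` parameter of scikit-learn's `MultinomialNB`, for example) already corresponds to a Dirichlet prior. With a symmetric Dirichlet(alpha) prior on a class's word distribution and word counts n_w, the posterior is Dirichlet(alpha + n_w), whose mean is (n_w + alpha) / (N + V * alpha), exactly the Laplace (alpha = 1) or Lidstone smoothed estimate. The MAP estimate under Dirichlet(alpha + 1) gives the same numbers. The sketch below, with hypothetical function names and toy tokens, simply checks this numerically.

```python
from collections import Counter

def lidstone_estimate(tokens, vocab, alpha=1.0):
    """Additive smoothing as used by point-estimate MNB (Laplace when alpha = 1)."""
    n = Counter(tokens)
    denom = len(tokens) + alpha * len(vocab)
    return {w: (n[w] + alpha) / denom for w in vocab}

def dirichlet_posterior_mean(tokens, vocab, alpha=1.0):
    """Mean of the Dirichlet(alpha + n_w) posterior under a symmetric Dirichlet(alpha) prior."""
    n = Counter(tokens)
    post = {w: alpha + n[w] for w in vocab}
    a0 = sum(post.values())
    return {w: a / a0 for w, a in post.items()}

def dirichlet_map(tokens, vocab, beta):
    """Mode of the posterior under a symmetric Dirichlet(beta) prior (requires beta >= 1)."""
    n = Counter(tokens)
    denom = len(tokens) + len(vocab) * (beta - 1)
    return {w: (n[w] + beta - 1) / denom for w in vocab}

tokens = ["goal", "goal", "win", "team"]            # hypothetical training tokens for one class
vocab = {"goal", "win", "team", "vote", "party"}
smoothed = lidstone_estimate(tokens, vocab, alpha=1.0)
posterior_mean = dirichlet_posterior_mean(tokens, vocab, alpha=1.0)
map_estimate = dirichlet_map(tokens, vocab, beta=2.0)
```

So a "point estimate" MNB with smoothing is already Bayesian in the sense of using a Dirichlet prior; what it skips is integrating the parameters out.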

Overall, if you're interested in MNB and haven't read this paper yet, I strongly recommend doing so.

I also found an accompanying MSc thesis by the same author(s), but I haven't read it myself yet. It may be worth checking out.

Related Solutions

LDA – Natural Interpretation for LDA Hyperparameters

David Blei has a great talk introducing LDA to students of a summer class: http://videolectures.net/mlss09uk_blei_tm/

In the first video he covers the basic idea of topic modelling extensively, including how the Dirichlet distribution comes into play. He explains the plate notation as if all hidden variables were observed, in order to show the dependencies. Basically, topics are distributions over words, and documents are distributions over topics.
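The picture of topics as distributions over words and documents as distributions over topics corresponds to LDA's generative process. Below is a minimal pure-Python sketch of that process, assuming symmetric Dirichlet priors; the hyperparameter names `alpha` (document-topic) and `eta` (topic-word), their values and the toy vocabulary are illustrative assumptions, not taken from the talk.

```python
import random

def sample_dirichlet(alphas, rng):
    """Dirichlet draw via independent Gamma(alpha_i, 1) variables, normalised to sum to 1."""
    g = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(g)
    return [x / total for x in g]

def generate_lda_corpus(n_docs, doc_len, n_topics, vocab, alpha=0.1, eta=0.1, seed=0):
    """Sample documents from LDA's generative process with symmetric priors."""
    rng = random.Random(seed)
    # each topic is a distribution over the vocabulary
    topics = [sample_dirichlet([eta] * len(vocab), rng) for _ in range(n_topics)]
    corpus = []
    for _ in range(n_docs):
        # each document is a distribution over topics
        theta = sample_dirichlet([alpha] * n_topics, rng)
        doc = []
        for _ in range(doc_len):
            z = rng.choices(range(n_topics), weights=theta)[0]   # topic for this word
            doc.append(rng.choices(vocab, weights=topics[z])[0])  # word drawn from that topic
        corpus.append(doc)
    return topics, corpus

vocab = ["ball", "goal", "team", "vote", "party", "law"]
topics, corpus = generate_lda_corpus(n_docs=3, doc_len=8, n_topics=2, vocab=vocab)
```

Inference in LDA runs this process backwards: given only `corpus`, it tries to recover `topics` and each document's `theta`.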

In the second video he shows the effect of alpha with some sample graphs: the smaller alpha is, the sparser the distribution. He also introduces some inference approaches.
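The effect of alpha can also be checked numerically: draw from a symmetric Dirichlet over k components and look at the average size of the largest component. Values near 1 mean most of the mass sits on one or two components (sparse); values near 1/k mean the mass is spread evenly. The choice k = 10 and the "mean largest weight" summary are illustrative assumptions, not from the talk.

```python
import random

def sample_symmetric_dirichlet(alpha, k, rng):
    """Symmetric Dirichlet(alpha) draw over k components via normalised Gamma draws."""
    g = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(g)
    return [x / total for x in g]

def mean_largest_weight(alpha, k=10, n_draws=2000, seed=0):
    """Average size of the largest component: near 1 = sparse, near 1/k = spread out."""
    rng = random.Random(seed)
    return sum(max(sample_symmetric_dirichlet(alpha, k, rng)) for _ in range(n_draws)) / n_draws

for a in (0.1, 1.0, 10.0):
    print(f"alpha={a}: mean largest weight = {mean_largest_weight(a):.3f}")
```

With alpha = 1 the draws are uniform on the simplex; alpha below 1 pushes the mass onto a few components, and alpha above 1 pulls every component towards 1/k.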

Solved – How to simulate a multivariate Logistic-Normal distribution in Python

Unfortunately, scipy.stats doesn't provide the logistic normal distribution. However, you could draw random samples from a multivariate normal distribution (e.g. using numpy) and transform them with a logistic transformation to simulate samples drawn from the logistic-normal distribution.

Let's assume your probability vectors are $D=3$ dimensional.

import numpy as np

# draw one sample from a multivariate normal distribution of dimension D - 1 = 2
mean = (1, 2)
cov = [[1, 0], [0, 1]]
y = np.random.multivariate_normal(mean, cov)

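Now you can transform your normally distributed sample $\mathbf{y} \in \mathbb{R}^{D-1}$ into a logistic-normally distributed sample $\mathbf{x} \in \mathcal{S}^{D}$ (the simplex of $D$-part probability vectors). The first equation below is the additive log-ratio transform, which maps $\mathbf{x}$ to $\mathbf{y}$; the second is its inverse, the additive logistic transform, which maps $\mathbf{y}$ to $\mathbf{x}$: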

$$ \mathbf{y} = \left[ \log \left( \frac{ x_1 }{ x_D } \right) , \dots , \log \left( \frac{ x_{D-1} }{ x_D } \right) \right] $$

$$ \mathbf{x} = \left[ \frac{ e^{ y_1 } }{ 1 + \sum_{i=1}^{D-1} e^{ y_i } } , \dots , \frac{ e^{ y_{D-1} } }{ 1 + \sum_{i=1}^{D-1} e^{ y_i } } , \frac{ 1 }{ 1 + \sum_{i=1}^{D-1} e^{ y_i } } \right] $$
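Putting the pieces together, here is a minimal NumPy sketch of both transformations, vectorised over a batch of draws. It assumes `numpy.random.default_rng` instead of the legacy `np.random.multivariate_normal`, and computes the inverse transform as a softmax of `[y, 0]`, which is algebraically the same as the second equation but avoids overflow for large `y`. The names `alr` and `alr_inverse` are illustrative.

```python
import numpy as np

def alr_inverse(y):
    """Additive logistic transform: y in R^(D-1) -> x on the D-part simplex."""
    y = np.asarray(y, dtype=float)
    # append a zero for the reference component x_D, then take a softmax;
    # this equals exp(y_i) / (1 + sum_j exp(y_j)) and 1 / (1 + sum_j exp(y_j))
    z = np.concatenate([y, np.zeros(y.shape[:-1] + (1,))], axis=-1)
    z = z - z.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def alr(x):
    """Additive log-ratio transform: x on the simplex -> y_i = log(x_i / x_D), i < D."""
    x = np.asarray(x, dtype=float)
    return np.log(x[..., :-1] / x[..., -1:])

rng = np.random.default_rng(0)
mean = np.array([1.0, 2.0])
cov = np.array([[1.0, 0.0], [0.0, 1.0]])
y = rng.multivariate_normal(mean, cov, size=5)  # 5 normal draws in R^2
x = alr_inverse(y)                              # 5 logistic-normal draws in S^3
```

Each row of `x` is a probability vector, and applying `alr` to it recovers the corresponding row of `y`.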
