[Math] Maximum Likelihood Estimator for Multivariate Bernoulli

calculus, machine learning, probability, probability distributions, statistics

I am working on deriving Naive Bayes for document classification.

Each document is represented by a binary vector $x^i$, where $i=1,\ldots,N$ for $N$ documents. Each element of this vector corresponds to a word: the element is set to 1 if that word is present at least once in the document, and left 0 otherwise. Let's say there are 50,000 words, hence each binary vector has 50,000 elements.
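For concreteness, here is a minimal sketch of how such binary vectors might be built; the three-word vocabulary and the documents are hypothetical toy data, not the real 50,000-word setup:

```python
import numpy as np

# Hypothetical toy data: a 3-word vocabulary instead of 50,000 words.
vocab = ["cat", "dog", "fish"]
docs = [["cat", "cat", "dog"], ["fish"], ["dog", "fish", "dog"]]

word_index = {w: d for d, w in enumerate(vocab)}

# X[i, d] = 1 iff word d appears at least once in document i, else 0.
X = np.zeros((len(docs), len(vocab)), dtype=int)
for i, doc in enumerate(docs):
    for w in doc:
        X[i, word_index[w]] = 1

print(X)
# [[1 1 0]
#  [0 0 1]
#  [0 1 1]]
```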

We model the document words with a multivariate Bernoulli distribution.

The joint distribution for Naive Bayes is

$p(x_1,\ldots,x_{50000}) = \prod_{d=1}^{50000} p(x_d)=\prod_{d=1}^{50000} \alpha_d^{x_d}(1-\alpha_d)^{1-x_d}$
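In practice one would evaluate this product in log space, since a product of 50,000 factors underflows quickly. A minimal sketch (the function name `log_joint` is mine, and it assumes $0 < \alpha_d < 1$):

```python
import numpy as np

def log_joint(x, alpha):
    # log of prod_d alpha_d^{x_d} (1 - alpha_d)^{1 - x_d},
    # summed in log space to avoid underflow; assumes 0 < alpha_d < 1.
    x = np.asarray(x)
    alpha = np.asarray(alpha)
    return np.sum(x * np.log(alpha) + (1 - x) * np.log(1 - alpha))

# e.g. log_joint([1, 1, 0], [0.5, 0.9, 0.1]) == np.log(0.5 * 0.9 * 0.9)
```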

The likelihood is

$L(\theta) = \prod_{i=1}^N \prod_{d=1}^{50000} p(x_d^i) = \prod_{i=1}^N \prod_{d=1}^{50000} \alpha_d^{x_d^i}(1-\alpha_d)^{1-x_d^i}$

The text I am reading suggests the maximum likelihood solution for $\alpha_d$ is $\alpha_d = \frac{N_d}{N}$, where $N_d$ is the total number of 1's for dimension (word) $d$ across all documents, and $N$ is the total number of documents. I am guessing this is obtained by taking the derivative of the likelihood function, setting the result to zero, and then solving for $\alpha_d$. One trick, I guess, is taking the log of both sides, but even then the algebra gets hairy pretty fast. Maybe I am missing something else. I would appreciate it if someone could help with this derivation.
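As a numerical sanity check on the claimed estimator, here is a minimal sketch with synthetic data (the sizes, seed, and parameter values are arbitrary choices of mine); the count-based estimate $N_d/N$ does appear to maximize the log-likelihood:

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 1000, 5                                # toy sizes, not 50,000 words
true_alpha = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
X = (rng.random((N, D)) < true_alpha).astype(int)  # N synthetic binary documents

alpha_mle = X.sum(axis=0) / N                 # N_d / N for each dimension d
print(alpha_mle)                              # close to true_alpha

def loglik(alpha):
    return np.sum(X * np.log(alpha) + (1 - X) * np.log(1 - alpha))

# Nudging the estimate in either direction should only lower the log-likelihood.
eps = 0.01
assert loglik(alpha_mle) >= loglik(alpha_mle + eps)
assert loglik(alpha_mle) >= loglik(alpha_mle - eps)
```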

Best Answer

With the log-likelihood

$$LL(\alpha_1,\ldots,\alpha_{50000}) = \sum_{i=1}^N \sum_{d=1}^{50000} \left[ {x_d^i}\log(\alpha_d)+(1-x_d^i)\log(1-\alpha_d) \right] \; ,$$

it's pretty easy: setting the partial derivative $\partial LL/\partial \alpha_d$ to zero for each $d$ gives the following equations to solve:

$$\sum_{i=1}^N\left(\frac{x_d^i}{\alpha_d}-\frac{1-x_d^i}{1-\alpha_d}\right) = 0$$

or after summing over $i$

$$\frac{N_d}{\alpha_d}-\frac{N-N_d}{1-\alpha_d} = 0 \; .$$

Solving for $\alpha_d$:

$$N_d(1-\alpha_d) = (N-N_d)\,\alpha_d \quad\Longrightarrow\quad N_d = N\alpha_d \quad\Longrightarrow\quad \alpha_d = \frac{N_d}{N} \; .$$
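For completeness, this stationary point is indeed a maximum: the second derivative is strictly negative on $(0,1)$,

$$\frac{\partial^2 LL}{\partial \alpha_d^2} = -\frac{N_d}{\alpha_d^2}-\frac{N-N_d}{(1-\alpha_d)^2} < 0 \; ,$$

so $LL$ is concave in each $\alpha_d$ and $\alpha_d = N_d/N$ is the global maximizer.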
