Solved – EM for Mixtures of Bernoulli (M-step)

binomial-distribution, expectation-maximization, mixture-distribution, optimization

When applying the M-step for a mixture of Bernoulli distributions, one of the parameters in our maximization is the Bernoulli parameter $\mu_{k}$, where $k$ is the index of the "mixture component", and
$$
p(x|\mu_k) = \prod_{i=1}^D \mu_{ki}^{x_{i}}(1-\mu_{ki})^{(1-x_{i})}.
$$ In our maximization with respect to this parameter, we get the following expression
$$
\begin{align}
\frac{\partial}{\partial \mu_{ki}}\mathbb{E}_{Z}[\ln p(X, Z | \mu, \pi)] &= \sum_{n=1}^N \langle z_{nk} \rangle \left( \frac{x_{ni}}{\mu_{ki}} - \frac{1 - x_{ni}}{1 - \mu_{ki}} \right) \\
&= \frac{\sum_n \langle z_{nk} \rangle x_{ni} - \sum_n \langle z_{nk} \rangle \mu_{ki} }{\mu_{ki}(1-\mu_{ki})}
\end{align}
$$
where
$$
\langle z_{nk} \rangle = p(z_{nk} = 1 \mid x_n, \mu, \pi)
$$
Setting this to zero and solving for $\mu_{ki}$, we get the standard solution
$$
\mu_{ki} = \frac{\sum_n \langle z_{nk} \rangle x_{ni}}{\sum_n \langle z_{nk} \rangle}
$$
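To make the update concrete, here is a minimal EM sketch for a Bernoulli mixture on hypothetical toy data (all names and sizes are illustrative, not from the question). The E-step computes the responsibilities $\langle z_{nk} \rangle$ and the M-step applies exactly the closed-form update above, with no Lagrange multiplier on $\mu$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: N binary vectors of dimension D, K mixture components
N, D, K = 200, 5, 2
X = rng.integers(0, 2, size=(N, D)).astype(float)

# Initialise mixing weights pi and Bernoulli parameters mu
pi = np.full(K, 1.0 / K)
mu = rng.uniform(0.25, 0.75, size=(K, D))

for _ in range(50):
    # E-step: responsibilities <z_nk> = p(z_nk = 1 | x_n, mu, pi)
    log_p = X @ np.log(mu).T + (1 - X) @ np.log(1 - mu).T   # (N, K) log-likelihoods
    log_r = np.log(pi) + log_p
    log_r -= log_r.max(axis=1, keepdims=True)               # numerical stability
    r = np.exp(log_r)
    r /= r.sum(axis=1, keepdims=True)

    # M-step: mu_ki = sum_n <z_nk> x_ni / sum_n <z_nk>  (the closed-form solution)
    Nk = r.sum(axis=0)                                      # effective component counts
    mu = (r.T @ X) / Nk[:, None]
    mu = np.clip(mu, 1e-10, 1 - 1e-10)                      # keep logs finite
    pi = Nk / N

# Each row mu_k lies in [0,1]^D; its components need not sum to 1
print(mu.sum(axis=1))
```

Note that each $\mu_{ki}$ lands in $[0,1]$ on its own (it is a responsibility-weighted average of binary values), which foreshadows why no sum-to-one constraint appears.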

With that in mind, my question is as follows. Isn't there a constraint on $\mu$ such that $\sum_i \mu_{ki} = 1$? If so, then why is this not included in the maximization; i.e. why don't we formulate the Lagrangian which includes the term
$
\lambda \left(\sum_i \mu_{ki} - 1\right)
$?

Best Answer

You would need that constraint if you were working with a multinomial distribution. Multinomial distributions are used when you have $K$ possible outcomes. They are coded in a 1-of-$K$ fashion, that is, as a vector in which exactly one component is non-zero (equal to 1). In that case, $$ p(x|\mu) = \prod_{k=1}^{K}\mu_{k}^{x_{k}} $$ which implies that $$ \sum_{x}p(x|\mu) = \sum_{k}\mu_{k} = 1. $$

Here, by contrast, you have a vector $x$ whose components each follow their own Bernoulli distribution, that is, they are independent of each other. Each component codes whether a given feature is present or not, so the $\mu_{ki}$ are free to take any values in $[0,1]$ independently, and no sum-to-one constraint (and hence no Lagrange multiplier) is needed.
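A quick numerical check illustrates the point: for the factorized Bernoulli model, summing $p(x|\mu)$ over all $2^D$ binary vectors gives 1 for *any* $\mu \in [0,1]^D$, because $\prod_i (\mu_i + (1-\mu_i)) = 1$. The parameter values below are arbitrary illustrations:

```python
import itertools
import numpy as np

# Arbitrary Bernoulli parameters; note they do NOT sum to 1
mu = np.array([0.2, 0.9, 0.5])
D = len(mu)

# Sum p(x|mu) = prod_i mu_i^{x_i} (1-mu_i)^{1-x_i} over all 2^D binary vectors x
total = sum(
    np.prod(mu ** np.array(x) * (1 - mu) ** (1 - np.array(x)))
    for x in itertools.product([0, 1], repeat=D)
)
print(total)  # -> 1.0: normalisation holds with no constraint on mu
```

So normalisation is automatic per component of $x$, unlike the multinomial case where it has to be enforced across the $\mu_k$.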