Solved – Expectation-Maximization Algorithm for Binomial

binomial-distribution, expectation-maximization, latent-variable, multinomial-distribution

I have a multinomial distribution with four outcomes, with probability mass function:

$$p(x_1,x_2,x_3,x_4)=\frac{n!}{x_1!x_2!x_3!x_4!}p_1^{x_1}p_2^{x_2}p_3^{x_3}p_4^{x_4}, \sum_{i=1}^4x_i=n, \sum_{i=1}^4p_i=1$$

The probabilities are related to a single parameter $0\le\theta\le 1$ by
\begin{align*}
p_1&=\frac{1}{2}+\frac{1}{4}\theta\\
p_2&=\frac{1}{4}-\frac{1}{4}\theta\\
p_3&=\frac{1}{4}-\frac{1}{4}\theta\\
p_4&=\frac{1}{4}\theta
\end{align*}

If we have an observation $x=(x_1,x_2,x_3,x_4)$, the log-likelihood is:

$$l(\theta)=x_1\log(2+\theta)+(x_2+x_3)\log(1-\theta)+x_4\log(\theta)+c$$

Also, I will suppose $x=(125,18,20,34).$
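Since I eventually want to do this in R, here is how I would code the cell probabilities and this log-likelihood, just to make sure I have the model down (`probs` and `loglik` are names I made up):

probs <- function(theta) c(1/2 + theta/4, (1 - theta)/4, (1 - theta)/4, theta/4)   # cell probabilities; should sum to 1
loglik <- function(theta, x) x[1]*log(2 + theta) + (x[2] + x[3])*log(1 - theta) + x[4]*log(theta)   # up to the constant c
x <- c(125, 18, 20, 34)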

1) I start by finding the MLE of $\theta$ by simply maximizing its log-likelihood. I took the derivative of the log-likelihood with respect to $\theta$ and set it equal to zero:
\begin{align*}
\frac{x_1}{2+\theta}-\frac{x_2+x_3}{1-\theta}+\frac{x_4}{\theta}&=0
\\
\frac{125}{2+\theta}-\frac{38}{1-\theta}+\frac{34}{\theta}&=0
\\
197\theta^2-15\theta-68 &=0
\end{align*}
Using the quadratic formula I get: $\theta \in \{0.6268, -0.5507\}$.
$\theta$ cannot be negative here, and the log-likelihood tends to $-\infty$ as $\theta\to 0^+$ or $\theta\to 1^-$, so the interior critical point is a maximum and the MLE is $\hat\theta\approx 0.6268$.

I do not know whether I was able to successfully derive the MLE by
maximizing the log-likelihood. If it is correct, I also do not know whether
it is appropriate to leave it in this "plus or minus" form.
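As a quick sanity check (assuming my log-likelihood above is right), I solved the quadratic and maximized the log-likelihood numerically in R; both should point to the same positive root:

polyroot(c(-68, -15, 197))   # roots of 197*theta^2 - 15*theta - 68 = 0 (coefficients in increasing order)
ll <- function(t) 125*log(2 + t) + 38*log(1 - t) + 34*log(t)   # observed-data log-likelihood, up to c
optimize(ll, c(1e-6, 1 - 1e-6), maximum = TRUE)$maximum        # about 0.6268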

2) Now I would like to compare what I got in (1) with the EM algorithm. To do so, I consider a multinomial with five classes, formed from the original multinomial by splitting the first class into two with probabilities $\frac{1}{2}$ and $\frac{\theta}{4}$. The original variable $x_1$ is split as $x_1=x_{11}+x_{12}$. We would then have an MLE of $\theta$ by treating $x_{12}+x_4$ as a realization of a binomial with $n=x_{12}+x_4+x_2+x_3$ and $p=\theta$. However, we do not observe $x_{12}$, and the complete-data log-likelihood is:

$$l_c(\theta)=(x_{12}+x_4)\log(\theta)+(x_2+x_3)\log(1-\theta)$$

I would like to develop an EM algorithm for estimating $\theta$. I am told that I should be able to combine the E-step and M-step together, i.e., $\hat\theta^{(t+1)}$ can be expressed in terms of $\hat\theta^{(t)}$.

I am very stuck on how to conceptualize this problem. I have read about
the EM algorithm, including the famous Nature article and the
"Numerical example to understand Expectation-Maximization" question.

All I can say right now is that the binomial must follow:

${n \choose k}p^k(1-p)^{(n-k)}$

and $n=x_{12}+72$

But I am very lost as to what to do for the expectation and maximization steps! In fact, I want to implement this in R, and all I can get out so far is:

x <- c(125, 18, 20, 34)   # observed data (x1, x2, x3, x4)
t <- 0.5                  # starting guess for theta
l <- (x12 + x[4])*log(t) + (x[2] + x[3])*log(1 - t)   # complete-data log-likelihood, but x12 is unknown

And I am already stuck, because I do not have x12, and I am only guessing
that I should start t (theta) at a default value of 0.5. Any advice on
how to start the algorithm and how to perform the E and M steps would
really help me!

Best Answer

Extracted from our book, Monte Carlo Statistical Methods (except for the sentence in italics):

A classic (perhaps overused) example of the EM algorithm is the genetics problem (see Rao (1973), Dempster, Laird and Rubin (1977)), where observations $(x_1,x_2,x_3,x_4)$ are gathered from the multinomial distribution
$$ \mathfrak{M}\left( n;{1\over 2} + {\theta \over 4}, {1\over 4}(1-\theta), {1\over 4}(1-\theta), {\theta \over 4}\right). $$

Estimation is easier if the $x_1$ cell is split into two cells, so we create the augmented model
$$ (z_1,z_2,x_2,x_3,x_4) \sim \mathfrak{M}\left( n;{1\over 2}, {\theta \over 4}, {1\over 4}(1-\theta), {1\over 4}(1-\theta), {\theta \over 4}\right), $$
with $x_1=z_1+z_2$. [This means that $(Z_1,Z_2)$ is a Multinomial $\mathfrak{M}\left( x_1; {1\over 2}, {\theta \over 4}\right)$, therefore that
$$\mathbb{E}[Z_2|X_1=x_1]=\dfrac{\theta/4}{1/2+\theta/4}x_1=\dfrac{\theta}{2+\theta}x_1\,.\Big]$$

The complete-data likelihood function is then simply $\theta^{z_2+x_4} (1-\theta)^{x_2+x_3}$, as opposed to the observed-data likelihood function $(2+\theta)^{x_1}\theta^{x_4} (1-\theta)^{x_2+x_3}$. The expected complete log-likelihood function is
\begin{eqnarray*}
&&\mathbb{E}_{\theta_0} [ (Z_2+x_4) \log\theta + (x_2+x_3) \log (1-\theta) ] \\
&&\qquad = \left( {\theta_0 \over 2+\theta_0}x_1 + x_4 \right) \log\theta + (x_2+x_3) \log (1-\theta) \;,
\end{eqnarray*}
which can easily be maximized in $\theta$, leading to
$$ \hat\theta_1 = \frac{\displaystyle{{\theta_0\,x_1\over 2+\theta_0}} + x_4}{\displaystyle{{\theta_0\,x_1\over 2+\theta_0}} +x_2+x_3+x_4} \;. $$
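To connect this with the R implementation asked about above, here is a minimal sketch of the combined E-and-M update (the names `em_update` and `theta` are mine, not from the book, and I start from the $\theta^{(0)}=0.5$ guessed in the question):

x <- c(125, 18, 20, 34)                      # observed data
em_update <- function(theta, x) {
  z2 <- theta * x[1] / (2 + theta)           # E-step: E[Z2 | x1, theta]
  (z2 + x[4]) / (z2 + x[2] + x[3] + x[4])    # M-step: closed-form maximizer
}
theta <- 0.5                                 # starting value theta^(0)
for (i in 1:20) theta <- em_update(theta, x)
theta                                        # about 0.6268, matching the direct MLE from (1)

Each pass of the loop is one full EM iteration, since the E-step reduces to computing $\mathbb{E}_{\theta^{(t)}}[Z_2\mid x_1]$ and the M-step has the closed form given above.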
