Solved – Maximum likelihood for coin tosses of different types

expectation-maximization, likelihood, maximum likelihood, probability

I was self-studying the EM (Expectation-Maximization) algorithm and came across the example given in this paper. In the paper there are two types of coins, A and B, with unknown parameters $θ_A$ and $θ_B$: $θ_A$ is the probability that a toss lands heads when a coin of type A is chosen (and similarly for $θ_B$).

There are five experiments; in each one a coin type is chosen at random and the chosen coin is tossed 10 times.

My problem is that I can't figure out the exact calculation that yields the maximum likelihood estimates of both biases.
The MLE for $θ_A$ is $\frac{\text{number of heads using coin A}}{\text{total number of flips using coin A}}$.

Also, in every other MLE example I have seen, the unknown parameters appear in the conditional probability, and we maximize the likelihood function with respect to them. For instance, consider this example:

If the $X_i$ are independent Bernoulli random variables with unknown parameter $p$, then the probability mass function of each $X_i$ is:
$$
f(x_i \mid p)=p^{x_i}(1-p)^{1-x_i}
$$
for $x_i = 0$ or $1$ and $0 < p < 1$. Therefore, the likelihood function $L(p)$ is, by definition:
$$
L(p)=\prod_{i=1}^n f(x_i \mid p)=p^{\sum x_i}(1-p)^{n-\sum x_i}
$$
So, given the data $x_1,\ldots,x_n$, we can maximize this function with respect to $p$.
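
For completeness (a standard derivation, not part of the original question), maximizing the log-likelihood gives the familiar sample proportion:
$$
\log L(p)=\Big(\sum_i x_i\Big)\log p+\Big(n-\sum_i x_i\Big)\log(1-p),
\qquad
\frac{d}{dp}\log L(p)=\frac{\sum_i x_i}{p}-\frac{n-\sum_i x_i}{1-p}=0
\;\Longrightarrow\;
\hat p=\frac{1}{n}\sum_{i=1}^n x_i.
$$
This is exactly the "proportion of heads" formula quoted above, which is why the complete-data MLE in the coin problem is simply a head count divided by a flip count.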

But what is the corresponding parameter, and what is the likelihood function, in the coin toss example?

Best Answer

If, for each of the five experiments, you know which coin is used, A or B, there is no difficulty: indeed, $\theta_A$ is estimated by the proportion of heads among the throws made with coin A, and similarly for $\theta_B$. The problem gets more difficult if the coin is not recorded for each of the five experiments. To quote from the linked reference in Nature Biotechnology (2008):

"Now consider a more challenging variant of the parameter estimation problem in which we are given the recorded head counts x but not the identities z of the coins used for each set of tosses. We refer to z as hidden variables or latent factors."

In this case, the number of heads in each experiment of ten tosses is either a ${\cal B}(10,\theta_A)$ binomial with probability $1/2$ or a ${\cal B}(10,\theta_B)$ binomial with probability $1/2$. This means that, if $x_1,\ldots,x_5$ are the five observed head counts, the associated likelihood is $$\prod_{i=1}^5\left\{ \frac{1}{2}{10 \choose x_i}\theta_A^{x_i}(1-\theta_A)^{10-x_i}+\frac{1}{2}{10 \choose x_i}\theta_B^{x_i}(1-\theta_B)^{10-x_i}\right\}$$which is proportional to $$\prod_{i=1}^5\left\{\theta_A^{x_i}(1-\theta_A)^{10-x_i}+\theta_B^{x_i}(1-\theta_B)^{10-x_i}\right\}$$and a nuisance to maximise directly, since the sum inside each factor keeps the product from separating into a term in $\theta_A$ and a term in $\theta_B$.
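
As a quick illustration (my addition, not part of the original answer), here is a minimal Python sketch that evaluates this marginal log-likelihood; the head counts in `x` are placeholders chosen for illustration and should be replaced by the actual data.

```python
from math import comb, log

def marginal_loglik(theta_A, theta_B, x, n=10):
    """Log-likelihood of the two-coin mixture with equal mixing weights of 1/2."""
    ll = 0.0
    for xi in x:
        p_A = comb(n, xi) * theta_A**xi * (1 - theta_A)**(n - xi)  # coin A term
        p_B = comb(n, xi) * theta_B**xi * (1 - theta_B)**(n - xi)  # coin B term
        ll += log(0.5 * p_A + 0.5 * p_B)                           # mixture of the two
    return ll

x = [5, 9, 8, 4, 7]   # illustrative head counts, one per 10-toss experiment
print(marginal_loglik(0.6, 0.5, x))
```

The sum inside the logarithm is what makes this function awkward to maximise in closed form, and it is precisely the term the EM algorithm handles below.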

A resolution by the EM algorithm consists in starting from a guess for $(\theta_A,\theta_B)$, e.g., $(\theta_A,\theta_B)=(0.6,0.5)$, and allocating each of the five experiments to one of the two coins with probabilities (E-step)$$p_A^i=\theta_A^{x_i}(1-\theta_A)^{10-x_i}\Big/\left\{\theta_A^{x_i}(1-\theta_A)^{10-x_i}+\theta_B^{x_i}(1-\theta_B)^{10-x_i}\right\}$$and$$p_B^i=\theta_B^{x_i}(1-\theta_B)^{10-x_i}\Big/\left\{\theta_A^{x_i}(1-\theta_A)^{10-x_i}+\theta_B^{x_i}(1-\theta_B)^{10-x_i}\right\}$$These allocation probabilities lead to the (M-step) updates $$\theta_A^\text{new}=\sum_{i=1}^5p_A^i x_i \Big/ \sum_{i=1}^5p_A^i \{x_i+10-x_i\}= \sum_{i=1}^5p_A^i x_i \Big/ 10 \sum_{i=1}^5p_A^i$$and$$\theta_B^\text{new}=\sum_{i=1}^5p_B^i x_i \Big/ 10 \sum_{i=1}^5p_B^i$$for instance $(\theta_A^\text{new},\theta_B^\text{new})=(0.71,0.58)$ after the first pass, and one then iterates these two steps until convergence.
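
A minimal Python sketch of these two steps (my own illustration, using the same placeholder head counts as above; the updates mirror the formulas in the answer):

```python
def em_two_coins(x, n=10, theta_A=0.6, theta_B=0.5, iters=25):
    """EM for the two-coin mixture with known, equal mixing weights."""
    for _ in range(iters):
        # E-step: posterior probability that each experiment used coin A (resp. B);
        # the binomial coefficients cancel in the ratio, so they are omitted.
        p_A, p_B = [], []
        for xi in x:
            w_A = theta_A**xi * (1 - theta_A)**(n - xi)
            w_B = theta_B**xi * (1 - theta_B)**(n - xi)
            p_A.append(w_A / (w_A + w_B))
            p_B.append(w_B / (w_A + w_B))
        # M-step: weighted proportion of heads attributed to each coin
        theta_A = sum(p * xi for p, xi in zip(p_A, x)) / (n * sum(p_A))
        theta_B = sum(p * xi for p, xi in zip(p_B, x)) / (n * sum(p_B))
    return theta_A, theta_B

x = [5, 9, 8, 4, 7]   # illustrative head counts, one per 10-toss experiment
print(em_two_coins(x))
```

Each pass re-allocates the experiments between the two coins and then re-estimates the biases; the marginal log-likelihood above can only increase from one iteration to the next.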

An interesting remark in the tutorial is that EM was

"Introduced as early as 1950 by Ceppellini et al. in the context of gene frequency estimation"

much earlier than the Dempster et al. (1977) JRSS B paper.
