Bayesian Methods – Relationship Between MAP, EM, and MLE Explained

bayesian, expectation-maximization, machine learning, maximum likelihood

I am a beginner in machine learning. I am comfortable with programming, but the theory confuses me a lot of the time.

What is the relation between Maximum Likelihood Estimation (MLE), Maximum A Posteriori (MAP) estimation, and the Expectation-Maximization (EM) algorithm?

I see them used as the methods that actually do the optimization.

Best Answer

Imagine that you have some data $X$ and a probabilistic model parametrized by $\theta$, and you are interested in learning about $\theta$ given your data. The relation between the data, the parameter, and the model is described by the likelihood function

$$ \mathcal{L}(\theta \mid X) = p(X \mid \theta) $$

To find the best-fitting $\theta$, you have to look for the value that is most plausible given $X$. Here things start to get complicated, because you can have different views on what $\theta$ is. You may consider it a fixed parameter or a random variable. If you consider it fixed, then to find its value you look for the value of $\theta$ that maximizes the likelihood function (the maximum likelihood [ML] method). On the other hand, if you consider it a random variable, then it also has a distribution, so you need to make one more assumption, about the prior distribution of $\theta$, i.e. $p(\theta)$, and you will be using Bayes' theorem for estimation

$$ p(\theta \mid X) \propto p(X \mid \theta) \, p(\theta) $$

If you are not interested in estimating the full posterior distribution of $\theta$ but only in a point estimate that maximizes the posterior probability, then you will be using the maximum a posteriori (MAP) method to estimate it.
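To make the difference between the two point estimates concrete, here is a minimal sketch for a coin-flip (Bernoulli) model. The Beta prior, the helper names `bernoulli_mle` and `bernoulli_map`, and the data (7 heads in 10 flips) are assumptions chosen for illustration; they are not part of the question. With a Beta($a$, $b$) prior the posterior is Beta($a + \text{heads}$, $b + n - \text{heads}$), and the MAP estimate is its mode.

```python
def bernoulli_mle(heads, n):
    """Value of theta that maximizes the likelihood p(X | theta)."""
    return heads / n


def bernoulli_map(heads, n, a, b):
    """Mode of the Beta(a + heads, b + n - heads) posterior.

    Assumes a Beta(a, b) prior on theta; the closed form below is valid
    when both posterior parameters are greater than 1.
    """
    return (heads + a - 1) / (n + a + b - 2)


# Hypothetical data: 7 heads in 10 flips.
heads, n = 7, 10

mle = bernoulli_mle(heads, n)                    # 0.7
map_flat = bernoulli_map(heads, n, a=1, b=1)     # flat prior: same as the MLE
map_informative = bernoulli_map(heads, n, a=5, b=5)  # 11/18, pulled toward 0.5

print(mle, map_flat, map_informative)
```

With a flat prior the MAP estimate coincides with the MLE, because the posterior is then proportional to the likelihood. An informative prior centered on 0.5 pulls the estimate toward 0.5, and that pull shrinks as more data arrive.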

As for expectation-maximization (EM), it is an algorithm that can be used within the maximum likelihood approach to estimate certain kinds of models (e.g. those involving latent variables, or missing-data scenarios).
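As an illustration of EM in a latent-variable setting, the sketch below fits a mixture of two one-dimensional Gaussians by maximum likelihood. The latent variable is the unobserved component each point came from. Everything here is an assumption made for the example: the function names, the crude initialization, the fixed number of iterations, and the simulated data.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std. dev. sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def em_two_gaussians(xs, n_iter=100):
    """Fit a two-component 1-D Gaussian mixture with EM (illustrative sketch)."""
    # Crude initialization (an assumption): means at the data extremes.
    mu = [min(xs), max(xs)]
    sigma = [1.0, 1.0]
    weight = [0.5, 0.5]

    for _ in range(n_iter):
        # E-step: posterior probability that each point belongs to each
        # component, given the current parameter values.
        resp = []
        for x in xs:
            p = [weight[k] * normal_pdf(x, mu[k], sigma[k]) for k in range(2)]
            total = p[0] + p[1]
            resp.append([p[0] / total, p[1] / total])

        # M-step: weighted maximum likelihood updates of the parameters.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weight[k] = nk / len(xs)
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk
            sigma[k] = max(math.sqrt(var), 1e-3)  # guard against collapse

    return weight, mu, sigma


# Simulated data: two well-separated clusters (hypothetical values).
random.seed(0)
xs = [random.gauss(-2.0, 1.0) for _ in range(300)]
xs += [random.gauss(3.0, 1.0) for _ in range(300)]

weight, mu, sigma = em_two_gaussians(xs)
print(weight, mu, sigma)
```

Each iteration alternates between computing expected component memberships (E-step) and maximizing the resulting weighted likelihood (M-step). In practice you would also monitor the log-likelihood for convergence and try several initializations, since EM only finds a local maximum.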

