Solved – numerical solution to a mixture model of two normal distributions

gaussian mixture distribution, normal distribution

I'm building a mixture model with the two normal distributions
$\mathcal{N}(\mu_1,\sigma_{1}^{2})$ and $\mathcal{N}(\mu_2,\sigma_{2}^{2})$.
So, the density function is
$$
f(x) = p_1 N(x; \mu_1, \sigma_1^2) + p_2 N(x; \mu_2, \sigma_2^2),
$$
where $p_1+p_2=1$, and
$$
N(x;\mu,\sigma^2) = \frac{1}{\sqrt{2\pi \sigma^2}}\exp\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\}.
$$

Suppose I have all the sample data. Is there a numerical method or formula for estimating $p_1$, $\mu_1$, $\sigma_1$ and $p_2$, $\mu_2$, $\sigma_2$ from it?

Best Answer

The approach depends on whether the data include an indicator variable that specifies which normal distribution each observation was drawn from.

If the data include this indicator variable, you can simply split the data into two sub-samples according to the distribution each observation comes from, and fit the two normal distributions separately by maximum likelihood. The parameters $p_1$ and $p_2$ can then be estimated by the proportions of observations that come from the first and the second normal distribution, respectively.
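As a rough illustration of the labelled case, here is a minimal sketch in Python with NumPy. The function name `fit_labelled_mixture`, the 0/1 label array `z`, and the simulated data are assumptions made for this example only; they are not part of the question.

```python
import numpy as np

def fit_labelled_mixture(x, z):
    """Fit each component separately when labels are known.

    x: 1-D array of observations.
    z: 1-D array of 0/1 labels (hypothetical encoding of the indicator).
    """
    x = np.asarray(x, dtype=float)
    z = np.asarray(z)
    params = {}
    for k in (0, 1):
        xk = x[z == k]                 # sub-sample drawn from component k
        params[k] = {
            "p": xk.size / x.size,     # proportion of observations in component k
            "mu": xk.mean(),           # ML estimate of the mean
            "sigma": xk.std(),         # ML estimate of the std (ddof=0)
        }
    return params

# Simulated data, for illustration only: 30% from N(4, 0.5^2), 70% from N(0, 1).
rng = np.random.default_rng(0)
labels = (rng.random(5000) < 0.3).astype(int)
data = np.where(labels == 1,
                rng.normal(4.0, 0.5, 5000),
                rng.normal(0.0, 1.0, 5000))
est = fit_labelled_mixture(data, labels)
```

Note that `np.std` with its default `ddof=0` gives the maximum likelihood estimate of $\sigma$; pass `ddof=1` if you prefer the bias-corrected variance.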

If the data do not include this indicator variable, which is the most common case in practice, you can use the Expectation-Maximization (EM) algorithm. The classical example with a mixture of two normal distributions is explained here.
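For the unlabelled case, a minimal EM sketch in Python with NumPy might look like the following. The names `normal_pdf` and `em_two_normals`, the median-split initialisation, the stopping rule, and the simulated data are all assumptions chosen for this illustration. EM converges only to a local maximum of the likelihood, so the result can depend on the starting values, and this sketch does not guard against a component collapsing onto a single point.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def em_two_normals(x, n_iter=200, tol=1e-8):
    x = np.asarray(x, dtype=float)
    # Crude initialisation (an assumption): split the sample at its median.
    med = np.median(x)
    lo, hi = x[x <= med], x[x > med]
    p = np.array([0.5, 0.5])
    mu = np.array([lo.mean(), hi.mean()])
    sigma = np.array([lo.std(), hi.std()])
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: responsibility of each component for each observation.
        dens = p * normal_pdf(x[:, None], mu, sigma)   # shape (n, 2)
        total = dens.sum(axis=1)                       # mixture density f(x_i)
        resp = dens / total[:, None]
        # M-step: responsibility-weighted proportions, means and std devs.
        nk = resp.sum(axis=0)
        p = nk / x.size
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
        # Log-likelihood under the parameters used in this E-step.
        ll = np.log(total).sum()
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return p, mu, sigma

# Simulated data, for illustration only: 70% from N(0, 1), 30% from N(5, 0.7^2).
rng = np.random.default_rng(1)
from_second = rng.random(3000) < 0.3
sample = np.where(from_second,
                  rng.normal(5.0, 0.7, 3000),
                  rng.normal(0.0, 1.0, 3000))
p_hat, mu_hat, sigma_hat = em_two_normals(sample)
```

In practice it is worth running EM from several starting points and keeping the fit with the highest log-likelihood.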