Gaussian Mixture Model and the mean

gaussian, mathematical modeling, optimization, probability, statistics

I have a question regarding mixture models, more specifically in the Gaussian setting: $$f(x) = \alpha f_1(x) + (1-\alpha) f_2(x)$$

I'm thinking about the EM algorithm, which is a standard way to work out the parameters of the model along with the weights associated with each component. It is used to estimate each component's mean and variance, as well as $\alpha$, the probability assigned to the first component (with $1-\alpha$ assigned to the second).

Here, I was told to use EM to estimate the overall mean of the mixture distribution.

What confuses me is the mean of this mixture, which is $\alpha \mu_1 + (1-\alpha) \mu_2$ (for simplicity, assume it has just two component distributions). Wouldn't $\overline{X}$, the simple mean of samples from this mixture model, be a reasonable estimator of that mean? If that's the case, then why bother using EM here at all?

To sum up: suppose we only want to know the mean of a mixture distribution. Is $\overline{X}$ a reasonable estimator of the overall mean?

Best Answer

Yes, it would work. $\bar X$ is unbiased for the mixture mean, since $E(\bar X)=E(X)=\mu$, where $\mu = \alpha \mu_1 + (1-\alpha) \mu_2$.
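
As a quick sanity check, here is a minimal simulation sketch using only the Python standard library. The parameter values and the helper name `sample_mixture` are made up for illustration and are not part of the question; the point is only that the plain sample mean lands close to $\alpha \mu_1 + (1-\alpha) \mu_2$ without fitting any mixture model.

```python
import random
import statistics

def sample_mixture(n, alpha, mu1, sigma1, mu2, sigma2, seed=0):
    """Draw n points from alpha*N(mu1, sigma1^2) + (1-alpha)*N(mu2, sigma2^2)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        # Pick component 1 with probability alpha, otherwise component 2.
        if rng.random() < alpha:
            draws.append(rng.gauss(mu1, sigma1))
        else:
            draws.append(rng.gauss(mu2, sigma2))
    return draws

# Hypothetical parameter values, chosen only for illustration.
alpha, mu1, sigma1, mu2, sigma2 = 0.3, -2.0, 1.0, 4.0, 1.5

# Theoretical mixture mean: alpha*mu1 + (1-alpha)*mu2.
true_mean = alpha * mu1 + (1 - alpha) * mu2

xs = sample_mixture(100_000, alpha, mu1, sigma1, mu2, sigma2)
sample_mean = statistics.fmean(xs)

print("mixture mean:", true_mean)
print("sample mean: ", sample_mean)
```

With a large sample the two printed values should agree to within sampling error, which shrinks like $1/\sqrt{n}$.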

The advantage of EM is that it estimates the unknown component parameters and assigns a predicted cluster label to each data point. EM is designed for missing-data problems; here the missing data are the labels saying which mixture component each data point belongs to, and EM is used to recover those.
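
To make the contrast concrete, below is a rough sketch of EM for a two-component one-dimensional Gaussian mixture, again in plain Python. The function names, the initialisation, the fixed iteration count, and the simulated data are all assumptions made for this example, not a reference implementation. It returns the component estimates and the per-point responsibilities (soft cluster labels). It also illustrates a fact that follows directly from the M-step formulas: the plug-in mean $\hat\alpha \hat\mu_1 + (1-\hat\alpha)\hat\mu_2$ equals $\bar X$ after every M-step, so EM gains nothing over $\bar X$ if the overall mean is all you want.

```python
import math
import random
import statistics

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def em_two_gaussians(xs, n_iter=100):
    """Illustrative EM for alpha*N(mu1, s1^2) + (1-alpha)*N(mu2, s2^2)."""
    n = len(xs)
    # Crude initialisation (an assumption of this sketch): equal weights,
    # means at the data extremes, both spreads equal to the overall spread.
    alpha = 0.5
    mu1, mu2 = min(xs), max(xs)
    s1 = s2 = statistics.pstdev(xs)
    resp = [0.5] * n
    for _ in range(n_iter):
        # E-step: posterior probability that each point came from component 1.
        resp = []
        for x in xs:
            p1 = alpha * normal_pdf(x, mu1, s1)
            p2 = (1 - alpha) * normal_pdf(x, mu2, s2)
            resp.append(p1 / (p1 + p2))
        # M-step: responsibility-weighted updates of weight, means, spreads.
        n1 = sum(resp)
        n2 = n - n1
        alpha = n1 / n
        mu1 = sum(r * x for r, x in zip(resp, xs)) / n1
        mu2 = sum((1 - r) * x for r, x in zip(resp, xs)) / n2
        s1 = math.sqrt(sum(r * (x - mu1) ** 2 for r, x in zip(resp, xs)) / n1)
        s2 = math.sqrt(sum((1 - r) * (x - mu2) ** 2 for r, x in zip(resp, xs)) / n2)
    # resp holds the responsibilities from the final E-step.
    return alpha, mu1, s1, mu2, s2, resp

# Simulated data with hypothetical parameters (illustration only).
rng = random.Random(1)
data = [rng.gauss(-2.0, 1.0) if rng.random() < 0.3 else rng.gauss(4.0, 1.5)
        for _ in range(2000)]

alpha_hat, mu1_hat, s1_hat, mu2_hat, s2_hat, resp = em_two_gaussians(data)

# Hard cluster labels from the soft responsibilities.
labels = [1 if r >= 0.5 else 2 for r in resp]

plug_in_mean = alpha_hat * mu1_hat + (1 - alpha_hat) * mu2_hat
x_bar = statistics.fmean(data)

print("EM estimates:", alpha_hat, mu1_hat, s1_hat, mu2_hat, s2_hat)
print("plug-in mean:", plug_in_mean)
print("sample mean: ", x_bar)
```

The extra work EM does shows up in `alpha_hat`, the per-component estimates, and `labels`, none of which $\bar X$ can provide.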