Expectation-Maximization – Calculating Expected Log Likelihood for Mixture Components with Differing Support

expectation-maximization, fitting, mixture-distribution

I was hoping to use the EM algorithm to fit a mixture model in which the mixture components can have differing support. I've run into a problem during the M step because the expected log-likelihood can be infinite:

$$
\sum_{i=1}^N\mathrm{E}_{\theta_2}[\log p(x_i; \theta_1, \theta_2)] = \sum_{i = 1}^N\sum_{k = 1}^K \theta_{2,k} \log p_k(x_i; \theta_1)
$$

Because the support of the components varies, the log-likelihood of the $k$-th component can be $\log p_k(x_i; \theta_1) = -\infty$ whenever $x_i$ falls outside that component's support.

Is there any hope of using the EM algorithm for this model?

Best Answer

I think there is some confusion about (a) the log-likelihood considered in the conditional expectation [which should be the complete likelihood] and (b) the current (fixed) parameter $\theta^{(t)}$ used in the E-step versus the (free) parameter $\theta$ to be derived by optimisation in the M-step as the new $\theta^{(t+1)}$:

If one looks at the E-step for a mixture model, the complete(d) likelihood is

$$L(\theta;\mathbf{x},\mathbf{z}) = p(\mathbf{x},\mathbf{z} \mid \theta) = \prod_{i=1}^n \prod_{j=1}^k \left[f(x_i;\theta_j)\right]^{\mathbb{I}(z_i=j)}$$

where $\mathbf{z}=(z_1,\ldots,z_n)$ is the vector of latent variables [aka component indicators]. Therefore,

$$\log L(\theta;\mathbf{x},\mathbf{z}) = \sum_{i=1}^n \sum_{j=1}^k \mathbb{I}(z_i=j)\log f(x_i;\theta_j)$$

and (E-step)

$$\mathbb E_{\theta^{(t)}} \left[ \log L(\theta;\mathbf{x},\mathbf{Z}) \mid \mathbf{x} \right] = \sum_{i=1}^n \sum_{j=1}^k \operatorname{P}(Z_i=j \mid X_i=x_i ;\theta^{(t)}) \log f(x_i;\theta_j)\tag{1}$$

(Note that two different $\theta$ vectors appear in the above expression: the fixed $\theta^{(t)}$ in the posterior probabilities and the free $\theta$ inside the logarithm. This distinction is crucial for understanding the EM algorithm.)

Hence, if the current value of $\theta^{(t)}$ is such that

$$f(x_i;\theta_j^{(t)})=0$$

then, because the posterior probability is proportional to this density,

$$\operatorname{P}(Z_i=j \mid X_i=x_i ;\theta^{(t)})=0$$

and the term $\log f(x_i;\theta_j)$ does not appear in (1). Conversely, if

$$\operatorname{P}(Z_i=j \mid X_i=x_i ;\theta^{(t)})>0$$

then $\log f(x_i;\theta_j)$ appears in (1), and the maximisation of (1) cannot result in a value of $\theta$ such that $\log f(x_i;\theta_j) = -\infty$: such a $\theta$ would make (1) equal to $-\infty$, whereas (1) is finite at $\theta=\theta^{(t)}$.
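To make the argument concrete, here is a minimal NumPy sketch of EM for a two-component mixture whose components have different supports. Everything specific in it is an assumption chosen for illustration and not part of the question: an exponential component on $[0,\infty)$, a normal component on $\mathbb{R}$, explicit mixing weights `w` (which the formulas above leave out), and the hypothetical function names. The point it illustrates is that negative observations receive responsibility exactly zero for the exponential component, so their $\log 0$ terms never enter (1).

```python
import numpy as np

def exp_pdf(x, rate):
    # Exponential density; exactly zero outside its support [0, inf).
    return np.where(x >= 0, rate * np.exp(-rate * np.clip(x, 0, None)), 0.0)

def norm_pdf(x, mu, sigma):
    # Normal density; positive on the whole real line.
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def weighted_densities(x, w, rate, mu, sigma):
    # Column j holds w_j * f_j(x_i); assumed two-component layout.
    return np.column_stack([w[0] * exp_pdf(x, rate),
                            w[1] * norm_pdf(x, mu, sigma)])

def e_step(x, w, rate, mu, sigma):
    # Responsibilities P(Z_i = j | x_i; theta^(t)) and observed log-likelihood.
    dens = weighted_densities(x, w, rate, mu, sigma)
    total = dens.sum(axis=1, keepdims=True)
    return dens / total, np.log(total).sum()

def expected_complete_loglik(x, resp, w, rate, mu, sigma):
    # Expression (1), plus log-weights: terms with zero responsibility
    # are dropped, so log(0) is never evaluated for them.
    dens = weighted_densities(x, w, rate, mu, sigma)
    mask = resp > 0
    return np.sum(resp[mask] * np.log(dens[mask]))

def m_step(x, resp):
    # Closed-form weighted maximum-likelihood updates for this toy model.
    r_exp, r_norm = resp[:, 0], resp[:, 1]
    w = resp.mean(axis=0)
    rate = r_exp.sum() / np.sum(r_exp * x)  # negative x carry zero weight here
    mu = np.sum(r_norm * x) / r_norm.sum()
    sigma = np.sqrt(np.sum(r_norm * (x - mu) ** 2) / r_norm.sum())
    return w, rate, mu, sigma

# Simulated data (illustrative values only).
rng = np.random.default_rng(0)
x = np.concatenate([rng.exponential(scale=2.0, size=300),
                    rng.normal(-1.0, 1.0, size=300)])

w, rate, mu, sigma = np.array([0.5, 0.5]), 1.0, 0.0, 1.0
logliks = []
for _ in range(50):
    resp, ll = e_step(x, w, rate, mu, sigma)
    logliks.append(ll)
    w, rate, mu, sigma = m_step(x, resp)

# Value of (1) at the updated parameters, using the last responsibilities.
q = expected_complete_loglik(x, resp, w, rate, mu, sigma)
```

In this toy model the exponential support does not depend on its parameter, so the masked sum stays finite for every admissible `rate`. When the support itself depends on $\theta_j$ (a uniform on $[0,\theta_j]$, say), the same masking applies, but the M-step must additionally be restricted to values of $\theta_j$ that keep $f(x_i;\theta_j)>0$ for every $x_i$ with positive responsibility, which is what maximising (1) enforces.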