Solved – Understanding expectation maximization for simple 2 linear mixture case

machine-learning, mixture-distribution, r

I would appreciate some help getting some EM stuff straight. So, say I generate data in R as follows:

## Component 1: Y = 10 + 3*X + noise with sd 2
N       <- 100
epsilon <- rnorm(N)
X       <- 10 * runif(N)
beta.0  <- 10
beta.1  <- 3
sigma   <- 2
Y       <- beta.0 + beta.1 * X + sigma * epsilon

## Component 2: Y2 = 3 - X2 + noise with sd 0.25
epsilon2 <- rnorm(N)
X2       <- 10 * runif(N)
Y2       <- 3 - X2 + 0.25 * epsilon2

## Pool the two samples into one mixed data set
Y.mix <- c(Y, Y2)
X.mix <- c(X, X2)

Now, in expectation maximization, in the first step I have some prior probability, say 0.5, of each data point coming from one distribution or the other. Using EM, I know I can estimate the mean and variance of the two mixture components, and from looking at a density plot the means seem to be at about -2 and 30 for the data I simulated. But at what stage in EM do I back out the betas? I want to recover the slope, intercept, and standard deviation parameters for the two regression-type equations.
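(For reference, the density plot mentioned above can be drawn with something like this; a minimal sketch using Y.mix from the code above.)

## Kernel density of the pooled responses; two bumps suggest the two components
plot(density(Y.mix), main = "Density of Y.mix")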

Thanks for any clarification.

Best Answer

It's a mixture model setup you've got. So to start, put the mixture-identifying variable in - you don't have it yet. It's an indicator variable saying whether a case comes from one regression (say Z = 0) or the other (say Z = 1). It will probably enter the full model as an interaction with a slope and/or intercept, so that these can change depending on which regression generates the point (although other, more complex arrangements are possible). Formulate that model carefully to make sure the mixture dependencies are what you want - there are a lot of possibilities.
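To make that concrete, here is a rough sketch of the complete-data log-likelihood you would have if Z were observed; the names (theta, b0, b1, sigma, p) are illustrative, not from the question:

## Complete-data log-likelihood if Z were observed (illustrative sketch)
complete.loglik <- function(Y, X, Z, theta) {
  mu0 <- theta$b0[1] + theta$b1[1] * X   # regression line for component Z = 0
  mu1 <- theta$b0[2] + theta$b1[2] * X   # regression line for component Z = 1
  sum((1 - Z) * (dnorm(Y, mu0, theta$sigma[1], log = TRUE) + log(1 - theta$p)) +
           Z  * (dnorm(Y, mu1, theta$sigma[2], log = TRUE) + log(theta$p)))
}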

Now, if Z were observed you'd know how to fit the complete model and get the betas from it, because there would be nothing unobserved on the right-hand side. But assuming you see only the data and the covariates, you don't observe it. However, you have assumed a complete model for how the data are generated for each value of Z. So (E-step) use that model, with its parameters as they currently stand and some prior assumption about the distribution of Z (or you could estimate that too), to get a posterior distribution over the possible values of Z for each data point. Recall that the posterior probability of Z = 1 is just the expectation of Z. Now (M-step) use that expected Z as if it were a real observation of Z to refit the whole model. The complete-data likelihood will, in normal circumstances, not go down.
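Concretely, for the two-regression mixture simulated in the question, a minimal EM sketch in R could look like the following; the function and variable names are my own, the starting values are deliberately crude, and a fixed number of iterations stands in for a proper convergence check:

## EM for a mixture of two simple linear regressions (illustrative sketch)
em.fit <- function(Y, X, n.iter = 100) {
  ## crude starting values: split on the median of Y
  grp   <- Y > median(Y)
  b0    <- c(mean(Y[!grp]), mean(Y[grp]))   # intercepts
  b1    <- c(0, 0)                          # slopes
  sigma <- c(sd(Y[!grp]), sd(Y[grp]))       # residual sds
  p     <- 0.5                              # prior probability of component 2
  for (it in 1:n.iter) {
    ## E-step: posterior probability that each point comes from component 2
    d1 <- dnorm(Y, b0[1] + b1[1] * X, sigma[1])
    d2 <- dnorm(Y, b0[2] + b1[2] * X, sigma[2])
    w  <- p * d2 / ((1 - p) * d1 + p * d2)
    ## M-step: treat w as if it were the observed Z and refit by weighted least squares
    fit1 <- lm(Y ~ X, weights = 1 - w)
    fit2 <- lm(Y ~ X, weights = w)
    b0 <- c(coef(fit1)[1], coef(fit2)[1])
    b1 <- c(coef(fit1)[2], coef(fit2)[2])
    sigma <- c(sqrt(sum((1 - w) * resid(fit1)^2) / sum(1 - w)),
               sqrt(sum(w * resid(fit2)^2) / sum(w)))
    p <- mean(w)
  }
  list(b0 = b0, b1 = b1, sigma = sigma, p = p, posterior = w)
}

fit <- em.fit(Y.mix, X.mix)
fit$b0; fit$b1; fit$sigma

So the betas are backed out in the M-step: each component's weighted regression is refit using the E-step posteriors as weights, and the slopes and intercepts are read off those fits.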

Alternate these two steps until the likelihood of the data under the model stops rising, retrieve the final set of betas, hope you're not stuck in a local maximum, and declare that you've estimated them.
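If you want an explicit stopping rule rather than a fixed number of iterations, you can track the observed-data log-likelihood after each M-step and stop once the increase falls below a small tolerance; a sketch, using the same illustrative names as above:

## Observed-data (mixture) log-likelihood for monitoring convergence
mix.loglik <- function(Y, X, b0, b1, sigma, p) {
  sum(log((1 - p) * dnorm(Y, b0[1] + b1[1] * X, sigma[1]) +
               p  * dnorm(Y, b0[2] + b1[2] * X, sigma[2])))
}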