Solved – Question on how to use EM to estimate parameters of this model

Tags: bayesian, expectation-maximization

I am trying to understand EM and to use it to infer the parameters of the model below, but I am having trouble understanding how to begin:

So, I have a weighted linear regression model where I have observations $X = (x_1, x_2, \ldots, x_n)$ and the corresponding responses $Y = (y_1, y_2, \ldots, y_n)$. The distributional assumptions of the model are as follows:

$$
y_i \sim \mathcal{N}(\beta^Tx_i, \frac{\sigma^2}{w_i})
$$
$$
\beta \sim \mathcal{N}(0, \Sigma_\beta)
$$
$$
w_i \sim \mathcal{G}(a, b)
$$

Here $\beta$ are the regression parameters, and the model allows for unequal variances by giving each response an individual weight on its variance. My goal is to find the most likely linear relationship, given by the parameters $\beta$. A simulation sketch of the model is given below.
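
To make the setup concrete, here is a minimal simulation sketch in Python/NumPy; the dimensions and the values of $a$, $b$, $\sigma^2$, $\Sigma_\beta$ are arbitrary placeholders, not part of the question:

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder dimensions and hyperparameters, for illustration only
n, p = 200, 3
a, b = 2.0, 2.0      # Gamma(a, b) prior on the weights (shape/rate parametrisation)
sigma2 = 1.0         # common variance scale

beta_true = rng.normal(size=p)       # one draw of beta (taking Sigma_beta = I here)
X = rng.normal(size=(n, p))
w = rng.gamma(shape=a, scale=1.0 / b, size=n)                  # w_i ~ G(a, b)
y = X @ beta_true + rng.normal(size=n) * np.sqrt(sigma2 / w)   # y_i ~ N(beta^T x_i, sigma^2 / w_i)
```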

So, I can now write the log-posterior as follows:

$$
\log P(Y, \beta, w|X) = \sum_{i=1}^n \big(\log P(y_i|x_i, \beta, w_i) + \log P(w_i)\big) + \log P(\beta)
$$
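
(For my own checking, this log-posterior can be evaluated term by term; a minimal sketch in Python/SciPy, where `log_posterior` is just my name for it and the parameter values passed in are placeholders:)

```python
import numpy as np
from scipy.stats import gamma, multivariate_normal, norm

def log_posterior(y, X, beta, w, sigma2, a, b, Sigma_beta):
    """log P(Y, beta, w | X) as written above."""
    loglik = norm.logpdf(y, loc=X @ beta, scale=np.sqrt(sigma2 / w)).sum()
    logprior_w = gamma.logpdf(w, a, scale=1.0 / b).sum()   # G(a, b), rate b
    logprior_beta = multivariate_normal.logpdf(
        beta, mean=np.zeros(len(beta)), cov=Sigma_beta)
    return loglik + logprior_w + logprior_beta
```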

Now, I have been trying to understand EM, and I am not sure my understanding is complete yet. As I understand it, to start estimating the parameters I take the expectation of the log-posterior $\log P(Y, \beta, w|X)$ with respect to the latent/hidden parameters, which in my case are $\beta$ and $w$. So the required expected value will be:

$$
\iint P(\beta, w | X) \, \log P(Y, \beta, w | X) \,\text{d}w \,\text{d}\beta
$$

However, I have no idea how to proceed from here to compute this expectation, and I would greatly appreciate any suggestions on what the next step should be. I am not looking for someone to derive everything for me, just a nudge in the right direction about what I should be looking to solve next.

Best Answer

Let me recall the basics of the EM algorithm first. When looking for the maximum likelihood estimate of a likelihood of the form $$\int f(x,z|\beta)\,\text{d}z,$$ the algorithm proceeds by iteratively maximising (M) expected (E) complete log-likelihoods, which results in maximising (in $\beta$) at iteration $t$ the function $$Q(\beta|\beta_t)=\int \log f(x,z|\beta)\, f(z|x,\beta_t)\,\text{d}z.$$ The algorithm must therefore start by identifying the latent variable $z$ and its conditional distribution.

In your case it seems that the latent variable is $\varpi$, made of the $w_i$'s, while the parameter of interest is $\beta$. If you treat both $\beta$ and $\varpi$ as latent variables, there is no parameter left to optimise. However, this also means that the prior on $\beta$ is not used.

If we look more precisely at the case of $w_i$, its conditional distribution is given by $$f(w_i|x_i,y_i,\beta)\propto\sqrt{w_i}\exp\left\{-w_i(y_i-\beta^Tx_i)^2/2\sigma^2\right\}\times w_i^{a-1}\exp\{-bw_i\},$$ which qualifies as a $$\mathcal{G}\left(a+1/2,\;b+(y_i-\beta^Tx_i)^2/2\sigma^2\right)$$ distribution.
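
Since the mean of a Gamma distribution with shape $\alpha$ and rate $\lambda$ is $\alpha/\lambda$, the E-step only requires the conditional expectations $\mathbb{E}[w_i|x_i,y_i,\beta_t]$. A minimal sketch of this computation in Python/NumPy (the function name is mine):

```python
import numpy as np

def expected_weights(y, X, beta_t, sigma2, a, b):
    """E-step: E[w_i | x_i, y_i, beta_t] for the Gamma conditional above,
    i.e. shape (a + 1/2) divided by rate (b + residual^2 / (2 sigma^2))."""
    resid2 = (y - X @ beta_t) ** 2
    return (a + 0.5) / (b + resid2 / (2.0 * sigma2))
```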

The completed log-likelihood being $$\sum_i \frac{1}{2}\left\{\log(w_i)- w_i(y_i-\beta^Tx_i)^2/\sigma^2\right\},$$ the part that depends on $\beta$ simplifies as $$-\sum_i w_i(y_i-\beta^Tx_i)^2/2\sigma^2,$$ and the function $-Q(\beta|\beta_t)$ is proportional to \begin{align*}\mathbb{E}\left[\sum_i w_i(y_i-\beta^Tx_i)^2\,\Big|\,X,Y,\beta_t\right]&=\sum_i\mathbb{E}[w_i|X,Y,\beta_t]\,(y_i-\beta^Tx_i)^2\\&=\sum_i\frac{a+1/2}{b+(y_i-\beta_t^Tx_i)^2/2\sigma^2}\,(y_i-\beta^Tx_i)^2.\end{align*} Maximising this function in $\beta$ amounts to a weighted linear regression, with weights $$\frac{a+1/2}{b+(y_i-\beta_t^Tx_i)^2/2\sigma^2}.$$
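
Putting the two steps together, here is a minimal sketch of the resulting EM iteration in Python/NumPy, assuming $\sigma^2$, $a$, $b$ known and, as discussed above, ignoring the prior on $\beta$; the starting value and iteration count are arbitrary choices:

```python
import numpy as np

def em_weighted_regression(y, X, sigma2, a, b, n_iter=50):
    beta = np.linalg.lstsq(X, y, rcond=None)[0]   # OLS starting value
    for _ in range(n_iter):
        # E-step: conditional expectations of the weights given the current beta_t
        omega = (a + 0.5) / (b + (y - X @ beta) ** 2 / (2.0 * sigma2))
        # M-step: weighted least squares, minimising sum_i omega_i (y_i - beta^T x_i)^2
        WX = X * omega[:, None]
        beta = np.linalg.solve(X.T @ WX, WX.T @ y)
    return beta
```

Each iteration is thus an ordinary weighted least-squares fit with the weights refreshed from the current $\beta_t$, so the update is available in closed form.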
