Solved – The form of the Log-Likelihood Function in Mixed Linear Models

likelihood, linear algebra, matrix, maximum likelihood, mixed model

Let us assume the following mixed effects model:

$y = X\beta+Zu+e$

where $y$ is a vector of $n$ observable random variables, $\beta$ is a vector of $p$ fixed effects, $X$ and $Z$ are known matrices, and $u$ and $e$ are vectors of $q$ and $n$ random effects such that $E(u) = 0$ and $E(e) = 0$ and

$ Var \begin{bmatrix}
u \\
e \\
\end{bmatrix} =
\begin{bmatrix}
G & 0 \\
0 & R \\
\end{bmatrix}\sigma^2$

where $G$ and $R$ are known positive definite matrices and $\sigma^2$ is a positive constant. I wonder what the form of the log-likelihood function is in the different "cases". Is it as follows?

Case 1: I want to estimate the fixed effects $\beta$

In that case, I take the derivative of the following log-likelihood with respect to $\beta$ (ignoring an additive constant):

(1) $LL(\beta, \sigma^2) = -\frac 12\log|V|-\frac 12(y-X\beta)'V^{-1}(y-X\beta)$

where

$V = (ZGZ' + R)\,\sigma^2$.

Obviously, $\hat\beta = (X'V^{-1}X)^{-1}X'V^{-1}y $, which is the weighted least-squares estimator. The weight matrix can be estimated using full-information maximum likelihood or restricted maximum likelihood.
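To make Case 1 concrete, here is a minimal numerical sketch (the dimensions, design matrices, and parameter values are made up purely for illustration): it builds $V$, evaluates the marginal log-likelihood (1) up to its additive constant, and computes the weighted least-squares estimate of $\beta$.

```python
# Minimal sketch of Case 1 with an illustrative, made-up setup.
import numpy as np

rng = np.random.default_rng(0)
n, q = 50, 5
X = np.column_stack([np.ones(n), rng.normal(size=n)])      # fixed-effects design (p = 2)
Z = rng.multinomial(1, [1 / q] * q, size=n).astype(float)  # one random intercept per group
G, R = np.eye(q), np.eye(n)                                 # assumed known
sigma2, beta_true = 1.0, np.array([1.0, -0.5])

u = rng.multivariate_normal(np.zeros(q), sigma2 * G)
e = rng.multivariate_normal(np.zeros(n), sigma2 * R)
y = X @ beta_true + Z @ u + e

V = (Z @ G @ Z.T + R) * sigma2
Vinv = np.linalg.inv(V)

def loglik(beta):
    """Marginal log-likelihood (1), ignoring the -(n/2) log(2*pi) constant."""
    r = y - X @ beta
    return -0.5 * np.linalg.slogdet(V)[1] - 0.5 * r @ Vinv @ r

beta_hat = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)  # weighted least squares
print(beta_hat, loglik(beta_hat))
```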

Case 2: I want to estimate random effects

The log-likelihood of all the parameters is based on the joint density of $(y; u)$. It is therefore (ignoring an additive constant):

(2) $LL(\beta, \sigma^2, u) = -\frac 12\log|\sigma^2R|-\frac 1{2\sigma^2}(y-X\beta-Zu)'R^{-1}(y-X\beta-Zu)-\frac 12\log|\sigma^2G|-\frac 1{2\sigma^2}u'G^{-1}u$

Taking the derivative with respect to $u$ and setting it equal to zero provides us with the estimate of the random effects $\hat u = (Z'R^{-1}Z+G^{-1})^{-1}Z'R^{-1}(y-X\hat\beta)$, which is the best linear unbiased predictor (BLUP).
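A corresponding sketch for Case 2 (again with an illustrative, made-up setup): plug the weighted least-squares $\hat\beta$ from Case 1 into the BLUP formula above.

```python
# Minimal sketch of Case 2: BLUP of u given an estimate of beta.
import numpy as np

rng = np.random.default_rng(1)
n, q = 60, 4
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Z = rng.multinomial(1, [1 / q] * q, size=n).astype(float)
G, R, sigma2 = np.eye(q), np.eye(n), 1.0
y = X @ np.array([2.0, 1.0]) + Z @ rng.normal(size=q) + rng.normal(size=n)

V = (Z @ G @ Z.T + R) * sigma2
Vinv = np.linalg.inv(V)
beta_hat = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)   # GLS estimate of beta

Rinv, Ginv = np.linalg.inv(R), np.linalg.inv(G)
u_hat = np.linalg.solve(Z.T @ Rinv @ Z + Ginv,
                        Z.T @ Rinv @ (y - X @ beta_hat))     # BLUP of the random effects
print(u_hat)
```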

In short, why do we sometimes need to use the joint density to determine the log-likelihood? Do we need the joint density only when we want to estimate not only the fixed but also the random effects? And if we are interested only in the fixed effects, does the form in (1) suffice?

Best Answer

In standard maximum likelihood theory the likelihood function is based on observed data. Because the random effects are unobserved random variables, you need to integrate them out. When you do that you obtain the log-likelihood function you have written in Case 1.

The random effects are often estimated in a second step using empirical Bayes methodology. In particular, they are estimated via the posterior distribution

$$p(u \mid y; \hat \theta) = \frac{p(y \mid u; \hat \theta) \, p(u; \hat \theta)}{p(y; \hat \theta)},$$

where $p(\cdot)$ denotes the corresponding probability density functions, and $\hat \theta$ the maximum likelihood estimates obtained from the log-likelihood function of Case 1. Note that this is a whole distribution. As estimates of the random effects we use the mode of this distribution with respect to $u$. And here is where the joint likelihood of Case 2 comes into play. That is, the mode of the posterior distribution of the random effects is the mode of the joint density of Case 2 (the numerator in the equation above) because the normalizing constant in the denominator is not a function of $u$.
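This equivalence is easy to check numerically: under normality the posterior $p(u \mid y)$ is Gaussian, so its mode equals its mean $GZ'(ZGZ'+R)^{-1}(y - X\hat\beta)$, and this coincides with the maximizer of the joint density of Case 2 (the $\sigma^2$ cancels in both expressions). A small sketch with made-up data:

```python
# Check numerically that the posterior mode of u equals the BLUP from Case 2.
import numpy as np

rng = np.random.default_rng(2)
n, q = 40, 3
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Z = rng.multinomial(1, [1 / q] * q, size=n).astype(float)
G, R = np.eye(q), np.eye(n)
y = X @ np.array([0.5, 1.5]) + Z @ rng.normal(size=q) + rng.normal(size=n)

V0 = Z @ G @ Z.T + R                          # sigma^2 cancels from both formulas
Vinv = np.linalg.inv(V0)
beta_hat = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)
resid = y - X @ beta_hat

posterior_mode = G @ Z.T @ Vinv @ resid       # mode (= mean) of the Gaussian posterior p(u | y)
blup = np.linalg.solve(Z.T @ np.linalg.inv(R) @ Z + np.linalg.inv(G),
                       Z.T @ np.linalg.inv(R) @ resid)       # maximizer of the joint density (2)
print(np.allclose(posterior_mode, blup))      # True
```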
