Solved – Restricted Maximum Likelihood (REML) Estimate of Variance Component

maximum likelihood, mixed model, multivariate analysis, optimization, panel data

Let $$\mathbf y_i = \mathbf X_i\mathbf\beta + \mathbf Z_i\mathbf b_i+ \mathbf\epsilon_i,$$

where

$\mathbf y_i\sim N(\mathbf X_i\mathbf\beta, \Sigma_i=\sigma^2\mathbf I_{n_i}+\mathbf Z_i \mathbf G\mathbf Z_i'),$

$\mathbf b_i\sim N(\mathbf 0, \mathbf G),$

$\mathbf\epsilon_i\sim N(\mathbf 0, \sigma^2\mathbf I_{n_i}),$

$\mathbf y_i$ is an $n_i\times 1$ vector of responses for the $i^{th}$ individual at $1,2,\ldots, n_i$ time points, $\mathbf X_i$ is an $n_i\times p$ matrix, $\mathbf \beta$ is a $p\times 1$ vector of fixed-effect parameters, $\mathbf Z_i$ is an $n_i\times q$ matrix, $\mathbf b_i$ is a $q\times 1$ vector of random effects, $\mathbf \epsilon_i$ is an $n_i\times 1$ vector of within-subject errors, $\mathbf G$ is the $q\times q$ between-subject covariance matrix, and $\sigma^2$ is a scalar.

Note that, $\mathbf X_i$, $\mathbf Z_i$, and $\mathbf G$ do NOT involve $\sigma^2$.
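(For concreteness, here is a minimal simulation sketch of this model in Python/NumPy; the dimensions, design matrices, and the particular values of $\mathbf\beta$, $\mathbf G$, and $\sigma^2$ below are arbitrary illustrative choices, not part of the question.)

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative (hypothetical) choices of dimensions and parameters
N, p, q, sigma2 = 5, 2, 2, 1.5
beta = np.array([1.0, -0.5])
G = np.array([[1.0, 0.3], [0.3, 0.5]])           # between-subject covariance

data = []                                        # list of (y_i, X_i, Z_i)
for i in range(N):
    n_i = int(rng.integers(4, 8))                          # n_i time points
    X_i = np.column_stack([np.ones(n_i), np.arange(n_i)])  # n_i x p design
    Z_i = X_i[:, :q]                                       # n_i x q (random intercept and slope)
    b_i = rng.multivariate_normal(np.zeros(q), G)          # b_i ~ N(0, G)
    eps_i = rng.normal(0.0, np.sqrt(sigma2), n_i)          # eps_i ~ N(0, sigma^2 I)
    y_i = X_i @ beta + Z_i @ b_i + eps_i
    data.append((y_i, X_i, Z_i))
```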

Now I have to find the restricted maximum likelihood (REML) estimate of $\sigma^2$, that is,

$$\hat\sigma^2_R = \frac{1}{N_0-p}\sum_{i=1}^{N}(\mathbf y_i-\mathbf X_i\mathbf\beta)'(\mathbf I_{n_i}+\mathbf Z_i \mathbf G\mathbf Z_i')^{-1}(\mathbf y_i-\mathbf X_i\mathbf\beta), \tag{1}$$

where $N_0 = \sum_{i=1}^{N}n_i$.

So first I wrote down the restricted log-likelihood:

$$l_R \propto -\frac{1}{2}\sum_{i=1}^{N}\log\det(\Sigma_i)-\frac{1}{2}\sum_{i=1}^{N}\log\det(\mathbf X_i'\Sigma_i^{-1}\mathbf X_i)-\frac{1}{2}\sum_{i=1}^{N}(\mathbf y_i-\mathbf X_i\mathbf\beta)'\Sigma_i^{-1}(\mathbf y_i-\mathbf X_i\mathbf\beta).$$
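As a sanity check, this restricted log-likelihood (up to the additive constant dropped by the proportionality) can be coded directly; the sketch below reuses the simulated `data`, `beta`, `G`, and `sigma2` from the illustration above.

```python
def restricted_loglik(s2, data, beta, G):
    """l_R as written above, up to an additive constant, as a function of sigma^2."""
    lR = 0.0
    for y_i, X_i, Z_i in data:
        n_i = len(y_i)
        Sigma_i = s2 * np.eye(n_i) + Z_i @ G @ Z_i.T
        Sigma_inv = np.linalg.inv(Sigma_i)
        r_i = y_i - X_i @ beta
        lR -= 0.5 * np.linalg.slogdet(Sigma_i)[1]                  # log det(Sigma_i)
        lR -= 0.5 * np.linalg.slogdet(X_i.T @ Sigma_inv @ X_i)[1]  # log det(X_i' Sigma_i^{-1} X_i)
        lR -= 0.5 * r_i @ Sigma_inv @ r_i                          # quadratic form
    return lR
```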

Then I have to differentiate the log-likelihood $l_R$ with respect to $\sigma^2$ and equate it to zero, i.e.,

$-\frac{1}{2}\frac{\partial}{\partial\sigma^2}\{\sum_{i=1}^{N}\log\det(\Sigma_i)+\sum_{i=1}^{N}\log\det(\mathbf X_i'\Sigma_i^{-1}\mathbf X_i)+\sum_{i=1}^{N}(\mathbf y_i-\mathbf X_i\mathbf\beta)'\Sigma_i^{-1}(\mathbf y_i-\mathbf X_i\mathbf\beta)\}|_{\sigma^2=\hat\sigma^2_R}=0.$
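(As a purely numerical reference: since $\sigma^2$ is a scalar, the maximizer of $l_R$ can also be found by direct one-dimensional optimization, e.g. with `scipy.optimize.minimize_scalar`. The sketch below reuses `restricted_loglik` and the simulated `data` from above, with $\mathbf\beta$ and $\mathbf G$ held fixed at their illustrative values.)

```python
from scipy.optimize import minimize_scalar

# Maximize l_R over sigma^2 by minimizing its negative on a bounded interval.
res = minimize_scalar(lambda s2: -restricted_loglik(s2, data, beta, G),
                      bounds=(1e-6, 50.0), method="bounded")
print(res.x)   # numerical maximizer of l_R with beta and G held fixed
```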

But analytically, I can't work out the following derivatives:

$\frac{\partial}{\partial\sigma^2}\log\det(\Sigma_i)=\frac{\partial}{\partial\sigma^2}\log\det(\sigma^2\mathbf I_{n_i}+\mathbf Z_i \mathbf G\mathbf Z_i')$

$\frac{\partial}{\partial\sigma^2}\log\det(\mathbf X_i'\Sigma_i^{-1}\mathbf X_i)=\frac{\partial}{\partial\sigma^2}\log\det(\mathbf X_i'(\sigma^2\mathbf I_{n_i}+\mathbf Z_i \mathbf G\mathbf Z_i')^{-1}\mathbf X_i)$ and

$\frac{\partial}{\partial\sigma^2}\Sigma_i^{-1}= \frac{\partial}{\partial\sigma^2}(\sigma^2\mathbf I_{n_i}+\mathbf Z_i \mathbf G\mathbf Z_i')^{-1}$.

There is an existing answer showing how to compute such derivatives, but I can't combine the pieces to obtain the REML estimate $\hat\sigma^2_R$ in equation $(1)$.

How can I get the REML estimate $\hat\sigma^2_R$ in equation $(1)$?

Best Answer

NB. I simplify notation somewhat and do not use bold typesetting.


The following rules for matrix differentials are useful:

\begin{align} d\log \vert A\vert &= \mathrm{tr}(A^{-1}dA) \\ dA^{-1} & = -A^{-1}(dA)A^{-1}. \end{align}

A good source for such rules and how to derive them is Magnus and Neudecker. That text also explains how to go between differentials and derivatives. You can also refer to this document by Minka. The Matrix Cookbook is another standard reference for such rules. In addition to derivatives, this answer also makes repeated use of cyclic invariance and linearity of the trace operator.
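A quick finite-difference check of these two rules, with an arbitrary positive-definite $A$ and perturbation direction $dA$ (purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n, h = 4, 1e-6
M = rng.normal(size=(n, n))
A = M @ M.T + n * np.eye(n)                      # positive definite A
dA = rng.normal(size=(n, n))
dA = (dA + dA.T) / 2                             # symmetric perturbation direction

# d log|A| = tr(A^{-1} dA)
fd = (np.linalg.slogdet(A + h * dA)[1] - np.linalg.slogdet(A)[1]) / h
print(fd, np.trace(np.linalg.inv(A) @ dA))       # should agree to ~1e-5

# dA^{-1} = -A^{-1} (dA) A^{-1}
fd = (np.linalg.inv(A + h * dA) - np.linalg.inv(A)) / h
print(np.max(np.abs(fd + np.linalg.inv(A) @ dA @ np.linalg.inv(A))))   # should be ~0
```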

Based on these rules we get the following expressions for the differentials in question (using $d\Sigma_i = I\,d\sigma^2$, since neither $Z_i$ nor $G$ involves $\sigma^2$): $$ d\Sigma_i^{-1} = -\Sigma_i^{-1}(d\Sigma_i) \Sigma_i^{-1} = -\Sigma_i^{-2}\,d\sigma^2, $$ $$ d\log \vert \Sigma_i \vert = \mathrm{tr}(\Sigma_i^{-1}\,d\Sigma_i) = \mathrm{tr}(\Sigma_i^{-1})\,d\sigma^2, $$

and

\begin{align} d\log\vert X_i'\Sigma_i^{-1}X_i\vert &= \mathrm{tr}[(X_i'\Sigma_i^{-1}X_i)^{-1}X_i'(d\Sigma_i^{-1})X_i] \\ &= -\mathrm{tr}[(X_i'\Sigma_i^{-1}X_i)^{-1}X_i'(\Sigma_i^{-1}(d\Sigma_i) \Sigma_i^{-1})X_i]\\ &= -\mathrm{tr}[\Sigma_i^{-1}X_i(X_i'\Sigma_i^{-1}X_i)^{-1}X_i'\Sigma_i^{-1}]\,d\sigma^2. \end{align}
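Each of these three expressions can likewise be checked against a numerical derivative in $\sigma^2$ for a single individual; the sketch below reuses the simulated `data`, `G`, and `sigma2` from the question section (illustrative only).

```python
def pieces(s2, X_i, Z_i, G):
    Sigma = s2 * np.eye(X_i.shape[0]) + Z_i @ G @ Z_i.T
    Sinv = np.linalg.inv(Sigma)
    return np.linalg.slogdet(Sigma)[1], np.linalg.slogdet(X_i.T @ Sinv @ X_i)[1], Sinv

y_i, X_i, Z_i = data[0]
h = 1e-6
ld0, ldx0, Sinv0 = pieces(sigma2, X_i, Z_i, G)
ld1, ldx1, Sinv1 = pieces(sigma2 + h, X_i, Z_i, G)

# d log|Sigma_i| / d sigma^2 = tr(Sigma_i^{-1})
print((ld1 - ld0) / h, np.trace(Sinv0))

# d log|X_i' Sigma_i^{-1} X_i| / d sigma^2
#   = -tr[Sigma_i^{-1} X_i (X_i' Sigma_i^{-1} X_i)^{-1} X_i' Sigma_i^{-1}]
XtSX_inv = np.linalg.inv(X_i.T @ Sinv0 @ X_i)
print((ldx1 - ldx0) / h, -np.trace(Sinv0 @ X_i @ XtSX_inv @ X_i.T @ Sinv0))

# d Sigma_i^{-1} / d sigma^2 = -Sigma_i^{-2}
print(np.max(np.abs((Sinv1 - Sinv0) / h + Sinv0 @ Sinv0)))   # should be ~0
```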

Thus, up to scaling and additive terms not involving $\sigma^2$, the gradient of the restricted log-likelihood is:

\begin{align} &\sum_i \frac{\partial}{\partial \sigma^2}\left[-\log \vert \Sigma_i \vert - \log \vert X_i'\Sigma_i^{-1}X_i \vert - (y_i - X_i\beta)'\Sigma_i^{-1}(y_i - X_i\beta) \right] \\ &\qquad= \sum_i \mathrm{tr}\left[-\Sigma_i^{-1} + \Sigma_i^{-1}X_i(X_i'\Sigma_i^{-1}X_i)^{-1}X_i'\Sigma_i^{-1} + \Sigma_i^{-2}(y_i - X_i \beta)(y_i - X_i\beta)'\right]. \end{align}
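This gradient expression can also be checked numerically: it should equal twice the derivative of the `restricted_loglik` sketch from the question (the overall factor $\tfrac12$ is undone; the minus signs are already inside the brackets above). Again, this only illustrates the algebra on the simulated data.

```python
h = 1e-6
fd = 2 * (restricted_loglik(sigma2 + h, data, beta, G)
          - restricted_loglik(sigma2, data, beta, G)) / h

analytic = 0.0
for y_i, X_i, Z_i in data:
    Sinv = np.linalg.inv(sigma2 * np.eye(len(y_i)) + Z_i @ G @ Z_i.T)
    P = Sinv @ X_i @ np.linalg.inv(X_i.T @ Sinv @ X_i) @ X_i.T @ Sinv
    r = y_i - X_i @ beta
    analytic += np.trace(-Sinv + P + Sinv @ Sinv @ np.outer(r, r))
print(fd, analytic)   # should agree to roughly 1e-4
```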

Since the $\Sigma_i$ are (I assume) positive definite, and in particular full rank, setting the gradient to zero is equivalent to: $$ \sum_i \mathrm{tr}\left[I - X_i(X_i'\Sigma_i^{-1}X_i)^{-1}X_i'\Sigma_i^{-1} - \Sigma_i^{-1}(y_i - X_i\beta)(y_i - X_i \beta)'\right] = 0. $$

Now notice that $Q_i := I - X_i(X_i'\Sigma_i^{-1}X_i)^{-1}X_i'\Sigma_i^{-1}$ is a projection matrix (it is idempotent and annihilates the column space of $X_i$), so its trace equals its rank, $n_i - p$. Factoring out $\sigma^2$ then gives the REML estimate of $\sigma^2$ as the solution to $$\sigma^2 = \frac{1}{N_0 - p}\sum_i (y_i - X_i\beta)'[I + Z_iGZ_i' / \sigma^2]^{-1}(y_i - X_i\beta),$$

assuming the optimization problem is well behaved so that this stationary point is indeed the maximizer.
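Both ingredients of this last step can be illustrated numerically on the simulated data from the question: the trace identity $\mathrm{tr}(Q_i) = n_i - p$ holds exactly for any $\sigma^2 > 0$, and the displayed equation can be iterated as a fixed point. (Whether that fixed point coincides with the maximizer of the restricted log-likelihood exactly as written in the question depends on the derivation above being right, which the next paragraph also flags; the sketch only illustrates the form of the equation.)

```python
# tr(Q_i) should equal n_i - p for every individual.
def trace_Q(s2, X_i, Z_i, G):
    n_i = X_i.shape[0]
    Sinv = np.linalg.inv(s2 * np.eye(n_i) + Z_i @ G @ Z_i.T)
    Q = np.eye(n_i) - X_i @ np.linalg.inv(X_i.T @ Sinv @ X_i) @ X_i.T @ Sinv
    return np.trace(Q)

print([(len(y), round(trace_Q(sigma2, X, Z, G), 10)) for y, X, Z in data])

# Fixed-point iteration of sigma^2 = sum_i r_i'[I + Z_i G Z_i'/sigma^2]^{-1} r_i / (N0 - p).
N0 = sum(len(y) for y, _, _ in data)
s2 = 1.0                                          # arbitrary starting value
for _ in range(200):
    total = 0.0
    for y_i, X_i, Z_i in data:
        V = np.eye(len(y_i)) + Z_i @ G @ Z_i.T / s2
        r = y_i - X_i @ beta
        total += r @ np.linalg.solve(V, r)
    s2 = total / (N0 - len(beta))                 # len(beta) = p
print(s2)
```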

This is different from your expression. It's possible I made a mistake somewhere and, in that case, hopefully you or another reader can spot it. After reading some more though, my impression is that mixed models are commonly parameterized in terms of $\tilde{G} = G/\sigma^2$. Parameterizing the first order condition derived here in terms of $\tilde{G}$ would make it agree with your first order condition.
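Concretely, writing $\Sigma_i = \sigma^2(I + Z_i\tilde G Z_i')$ with $\tilde G := G/\sigma^2$ makes $[I + Z_iGZ_i'/\sigma^2]^{-1} = [I + Z_i\tilde G Z_i']^{-1}$, so the fixed-point equation above reads $$\sigma^2 = \frac{1}{N_0 - p}\sum_i (y_i - X_i\beta)'(I + Z_i\tilde G Z_i')^{-1}(y_i - X_i\beta),$$ which is exactly equation $(1)$ with the $G$ appearing there interpreted as $\tilde G$.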
