Mixed Model – Understanding the Intra-Class Correlation Coefficient

Tags: intraclass-correlation, mixed model

Suppose we fit a simple mixed / multilevel model with no predictors, which some call a variance components model:

$$ y_{ij} = \beta_0 + u_j + e_{ij} $$

where $u_j$ are the level-2 residuals and $e_{ij}$ are the level-1 residuals, and we obtain estimates, $\hat{\sigma_u^2}$ and $\hat{\sigma_e^2}$ for the variance of $u_j$ and $e_{ij}$ respectively. I get it that

$$
\nu = \frac{\hat{\sigma_u^2}}{\hat{\sigma_u^2} +\hat{\sigma_e^2}}
$$
is the proportion of the total variance in the response that is due to variation between level-2 units, so it makes sense that this is called the variance partition coefficient.

What I don't get is why $\nu$ can also be interpreted as the correlation between the responses of two randomly selected observations from the same level-2 unit (i.e., the intra-class correlation). It makes sense that $\nu$ should be related to this correlation, but why should they be exactly equal?

Best Answer

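Common assumptions are that the level-2 residuals are uncorrelated with the level-1 residuals, and that the level-1 residuals are uncorrelated with each other and have constant variance: $$ \textrm{Cov}(\mathbf{u}, \mathbf{e}) = \mathbf{0} $$ $$ \textrm{Cov}(\mathbf{e}) = \sigma^2_e \mathbf{I}. $$ As in the question, $\textrm{Var}(u_j) = \sigma^2_u$.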


Let $i \neq i'$ index two distinct observations within the same level-2 unit $j$.

On the one hand, we have $$\begin{align*} \textrm{Var}(y_{ij}) & = \textrm{Var}(\beta_0 + u_j + e_{ij}) \\ & = \textrm{Var}(u_j + e_{ij}) \\ & = \textrm{Var}(u_j) + \textrm{Var}(e_{ij}) + 2 \textrm{Cov}(u_j, e_{ij})\\ & = \sigma^2_u + \sigma^2_e. \end{align*}$$

On the other hand, we have $$\begin{align*} \textrm{Cov}(y_{ij}, y_{i'j}) & = \textrm{Cov}(\beta_0 + u_j + e_{ij}, \beta_0 + u_j + e_{i'j}) \\ & = \textrm{Cov}(u_j + e_{ij}, u_j + e_{i'j}) \\ & = \textrm{Cov}(u_j, u_j) + \textrm{Cov}(u_j, e_{i'j}) + \textrm{Cov}(e_{ij}, u_j) + \textrm{Cov}(e_{ij}, e_{i'j}) \\ & = \sigma^2_u. \end{align*}$$

Hence $$\begin{align*} \textrm{Cor}(y_{ij}, y_{i'j}) & = \frac{\textrm{Cov}(y_{ij}, y_{i'j})}{\sqrt{\textrm{Var}(y_{ij})}\sqrt{\textrm{Var}(y_{i'j})}} \\ & = \frac{\sigma^2_u}{\sqrt{\sigma^2_u + \sigma^2_e} \sqrt{\sigma^2_u + \sigma^2_e}} \\ & = \frac{\sigma^2_u}{\sigma^2_u + \sigma^2_e}. \end{align*}$$

The latter is the correlation between measurement $y_{ij}$ and measurement $y_{i'j}$ ($i \neq i'$), i.e., the correlation between "any two responses having the same $j$". Replacing $\sigma^2_u$ and $\sigma^2_e$ with their estimates gives exactly $\nu$.
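As a quick numerical check of the result, here is a minimal simulation sketch in Python with NumPy. The function name `simulate_icc`, the parameter values, and the choice of normally distributed residuals are illustrative assumptions and not part of the question. The derivation only needs the covariance assumptions, not normality. The sketch draws two observations per level-2 unit and compares their sample correlation across units with $\sigma^2_u / (\sigma^2_u + \sigma^2_e)$.

```python
import numpy as np


def simulate_icc(sigma_u=2.0, sigma_e=1.0, n_groups=200_000, beta0=5.0, seed=0):
    """Return (empirical, theoretical) within-unit correlation.

    All argument names and default values are hypothetical, chosen
    only for illustration.
    """
    rng = np.random.default_rng(seed)

    # One level-2 residual u_j per unit, shared by both observations.
    u = rng.normal(0.0, sigma_u, size=n_groups)

    # Two independent level-1 residuals e_{1j}, e_{2j} per unit.
    e = rng.normal(0.0, sigma_e, size=(n_groups, 2))

    # y_{ij} = beta_0 + u_j + e_{ij}
    y = beta0 + u[:, None] + e

    # Sample correlation between the first and second observation across units.
    empirical = np.corrcoef(y[:, 0], y[:, 1])[0, 1]

    # sigma_u^2 / (sigma_u^2 + sigma_e^2)
    theoretical = sigma_u**2 / (sigma_u**2 + sigma_e**2)
    return empirical, theoretical


if __name__ == "__main__":
    empirical, theoretical = simulate_icc()
    print(f"empirical correlation:   {empirical:.4f}")
    print(f"theoretical ICC:         {theoretical:.4f}")
```

With the default values, $\sigma^2_u = 4$ and $\sigma^2_e = 1$, so the theoretical value is $4/5 = 0.8$. The empirical correlation should land close to it, and the gap shrinks as `n_groups` grows.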