R – Intraclass Correlation (ICC) for Interaction

intraclass-correlation, lme4-nlme, r

Suppose I have some measurement for each subject at each site. Two variables, subject and site, are of interest in terms of computing intraclass correlation (ICC) values. Typically I would use function lmer from R package lme4, and run

lmer(measurement ~ 1 + (1 | subject) + (1 | site), mydata)

The ICC values can be obtained from the variances for the random effects in the above model.

However, I recently read a paper that really puzzles me. Using the above example, the authors calculated three ICC values in the paper with function lme from nlme package: one for subject, one for site, and one for the interaction of subject and site. No further details were given in the paper. I'm confused from the following two perspectives:

  1. How can the ICC values be calculated with lme? I don't know how to specify those three random effects (subject, site, and their interaction) in lme.
  2. Is it really meaningful to consider the ICC for the interaction of subject and site? From a modeling or theoretical perspective you can calculate it, but conceptually I have trouble interpreting such an interaction.

Best Answer

The R model formula

lmer(measurement ~ 1 + (1 | subject) + (1 | site), mydata)

fits the model

$$ Y_{ijk} = \beta_0 + \eta_{i} + \theta_{j} + \varepsilon_{ijk} $$

where $Y_{ijk}$ is the $k$'th measurement from subject $i$ at site $j$, $\eta_{i}$ is the subject $i$ random effect, $\theta_{j}$ is the site $j$ random effect and $\varepsilon_{ijk}$ is the leftover error. The random effects and the error have variances $\sigma^{2}_{\eta}, \sigma^{2}_{\theta}, \sigma^{2}_{\varepsilon}$, respectively, which are estimated by the model. (Note that if subjects are nested within sites, you would traditionally write the subject effect as $\eta_{ij}$ here instead of $\eta_{i}$.)

To answer your first question regarding how to calculate the ICCs: under this model, each ICC is the proportion of the total variation explained by the respective blocking factor. In particular, the correlation between two randomly selected observations on the same subject (at different sites) is:

$$ {\rm ICC}({\rm Subject}) = \frac{\sigma^{2}_{\eta}}{\sigma^{2}_{\eta}+ \sigma^{2}_{\theta}+\sigma^{2}_{\varepsilon}}$$

The correlation between two randomly selected observations from the same site (on different subjects) is:

$$ {\rm ICC}({\rm Site}) = \frac{\sigma^{2}_{\theta}}{\sigma^{2}_{\eta}+ \sigma^{2}_{\theta}+\sigma^{2}_{\varepsilon}}$$

The correlation between two randomly selected observations on the same subject and at the same site (the so-called interaction ICC) is:

$$ {\rm ICC}({\rm Subject/Site \ Interaction}) = \frac{\sigma^{2}_{\eta}+\sigma^{2}_{\theta}}{\sigma^{2}_{\eta}+ \sigma^{2}_{\theta}+\sigma^{2}_{\varepsilon}}$$
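As a brief check of where these three expressions come from, here is the covariance calculation, assuming (as the model does) that $\eta_{i}$, $\theta_{j}$ and $\varepsilon_{ijk}$ are mutually independent. Two observations share only the terms they have in common, so

$$ {\rm Cov}(Y_{ijk}, Y_{ij'k'}) = \sigma^{2}_{\eta} \ \ (j \neq j'), \qquad {\rm Cov}(Y_{ijk}, Y_{i'jk'}) = \sigma^{2}_{\theta} \ \ (i \neq i'), \qquad {\rm Cov}(Y_{ijk}, Y_{ijk'}) = \sigma^{2}_{\eta} + \sigma^{2}_{\theta} \ \ (k \neq k') $$

while every observation has the same variance

$$ {\rm Var}(Y_{ijk}) = \sigma^{2}_{\eta} + \sigma^{2}_{\theta} + \sigma^{2}_{\varepsilon} $$

Dividing each covariance by this common variance gives the three ICCs above.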

It seems you were confused by this being referred to as an "interaction", since it is the sum of the individual terms. It is an "interaction" in the sense that it estimates the ${\rm ICC}$ corresponding to the blocking factor formed by the combination of subject and site. It is important to note that you do not have to include any kind of "interaction" term between the factors to estimate this quantity.

Each of these quantities can be estimated by plugging in the estimates of these variances that come out of the model fitting.
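A minimal sketch of that plug-in step with lme4 is below. The data are simulated purely for illustration (the crossed design, sample sizes and true standard deviations are my assumptions, not anything from the paper in question); only the model formula comes from the question.

```r
library(lme4)

set.seed(1)

# Hypothetical crossed design: each of 30 subjects is measured twice at each of 5 sites.
mydata <- expand.grid(subject = factor(1:30), site = factor(1:5), rep = 1:2)
subject_eff <- rnorm(30, sd = 2)  # simulated eta_i
site_eff    <- rnorm(5,  sd = 1)  # simulated theta_j
mydata$measurement <- 10 +
  subject_eff[as.integer(mydata$subject)] +
  site_eff[as.integer(mydata$site)] +
  rnorm(nrow(mydata), sd = 1)     # simulated epsilon_ijk

fit <- lmer(measurement ~ 1 + (1 | subject) + (1 | site), data = mydata)

# One row per variance component; 'grp' names the component, 'vcov' holds the variance.
vc <- as.data.frame(VarCorr(fit))
v  <- setNames(vc$vcov, vc$grp)   # names: "subject", "site", "Residual"
total <- sum(v)

icc <- c(
  subject      = v[["subject"]] / total,
  site         = v[["site"]] / total,
  subject_site = (v[["subject"]] + v[["site"]]) / total
)
print(round(icc, 3))
```

With real data you would replace the simulated `mydata` with your own data frame; the extraction from `VarCorr` onward stays the same.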

Regarding your second question - as you can see here, each ${\rm ICC}$ has a fairly clear interpretation. I would argue that the interaction ${\rm ICC}$ does tell us something interesting - how "similar" are measurements that share both subject and site?

One important point to note is that if subjects are nested within sites, then the subject ${\rm ICC}$ is not meaningful in its own right, since it is impossible to share subject and not site. In that case $\sigma^{2}_{\eta}$ only measures how much more similar individuals are to themselves than to other individuals at their site.
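On the lme part of the question: nlme is built around nested grouping, so crossed random effects need a workaround. One commonly used idiom, sketched here on simulated data, puts all observations in a single dummy group and uses `pdBlocked` with one `pdIdent` block per factor. I do not know whether the paper's authors did this; the `dummy` column and the simulated data are my own assumptions for illustration.

```r
library(nlme)

set.seed(1)

# Hypothetical crossed design, simulated only to make the example runnable.
mydata <- expand.grid(subject = factor(1:30), site = factor(1:5), rep = 1:2)
subject_eff <- rnorm(30, sd = 2)
site_eff    <- rnorm(5,  sd = 1)
mydata$measurement <- 10 +
  subject_eff[as.integer(mydata$subject)] +
  site_eff[as.integer(mydata$site)] +
  rnorm(nrow(mydata), sd = 1)

# A single-level grouping factor so that all rows fall into one group.
mydata$dummy <- factor(1)

# Each pdIdent block gives one common variance to all levels of its factor;
# pdBlocked keeps the subject and site blocks uncorrelated with each other.
fit_lme <- lme(
  measurement ~ 1,
  data = mydata,
  random = list(dummy = pdBlocked(list(
    pdIdent(~ subject - 1),
    pdIdent(~ site - 1)
  )))
)

# Inspect the estimated variance components and the residual variance.
print(VarCorr(fit_lme))
```

The subject, site and residual variances read off this output can then be plugged into the same three ICC formulas. For crossed designs lmer is usually the more convenient tool, which is why the model above is written with it.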