The package MCMCglmm can easily handle and estimate covariance structures and random effects. However, it uses Bayesian statistics, which can be intimidating to new users. See the MCMCglmm Course Notes for a thorough guide to MCMCglmm, and chapter 5 in particular for this question. I strongly recommend reading up on assessing model convergence and chain mixing before analysing real data with MCMCglmm.
library(MCMCglmm)
MCMCglmm requires priors; this is an uninformative inverse-Wishart prior.
p<-list(G=list(
G1=list(V=diag(2),nu=0.002)),
R=list(V=diag(2),nu=0.002))
Fit the model
m<-MCMCglmm(cbind(x,y)~trait-1,
#trait-1 gives each variable a separate intercept
random=~us(trait):group,
#the random effect has a separate intercept for each variable but allows and estimates the covariance between them
rcov=~us(trait):units,
#Allows separate residual variance for each trait and estimates the covariance between them
family=c("gaussian","gaussian"),prior=p,data=df)
In the model summary (summary(m)), the G structure describes the variance and covariance of the random intercepts. The R structure describes the observation-level variance and covariance, which functions as the residual structure in MCMCglmm.
If you are of a Bayesian persuasion, you can get the entire posterior distribution of the co/variance terms from m$VCV. Note that these are variances after accounting for the fixed effects.
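A minimal sketch of how the posterior in m$VCV can be summarised, assuming the fitted model object m from above (posterior.mode() comes with MCMCglmm, HPDinterval() with coda):

```r
library(MCMCglmm)  # provides posterior.mode()
library(coda)      # provides HPDinterval()

posterior.mode(m$VCV)  # point estimates (posterior modes) of each co/variance term
HPDinterval(m$VCV)     # 95% highest posterior density intervals
plot(m$VCV)            # trace and density plots -- inspect these for mixing/convergence
```

The trace plots in particular are worth checking before trusting any of the point estimates.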
Simulate the data
library(MASS)
n<-3000
#draws from a bivariate distribution
df<-data.frame(mvrnorm(n,mu=c(10,20),#the intercepts of x and y
Sigma=matrix(c(10,-3,-3,2),ncol=2)))
#the residual variance covariance of x and y
#assign random effect value
number_of_groups<-100
df$group<-rep(1:number_of_groups,length.out=n)
group_var<-data.frame(mvrnorm(number_of_groups, mu=c(0,0),Sigma=matrix(c(3,2,2,5),ncol=2)))
#the variance covariance matrix of the random effects. c(variance of x,
#covariance of x and y,covariance of x and y, variance of y)
#the variables x and y are the sum of the draws from the bivariate distribution and the random effect
df$x<-df$X1+group_var[df$group,1]
df$y<-df$X2+group_var[df$group,2]
Recovering the original co/variances of the random effects requires a large number of levels of the random effect. Instead, your model will likely estimate the observed co/variances, which can be calculated with cov(group_var).
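For example, a quick check of how well the model recovered the simulated G structure (a sketch, assuming the objects m and group_var created above):

```r
# Realised co/variance of the 100 simulated group effects
cov(group_var)

# Compare with the model's estimate of the G structure, reshaped to a
# 2x2 matrix (the first four elements of m$VCV are the G-structure terms)
matrix(posterior.mode(m$VCV)[1:4], ncol = 2)
```

With only 100 groups, expect these to agree with each other more closely than with the Sigma matrix used to generate group_var.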
The R model formula
lmer(measurement ~ 1 + (1 | subject) + (1 | site), mydata)
fits the model
$$ Y_{ijk} = \beta_0 + \eta_{i} + \theta_{j} + \varepsilon_{ijk} $$
where $Y_{ijk}$ is the $k$'th measurement from subject $i$ at site $j$, $\eta_{i}$ is the subject $i$ random effect, $\theta_{j}$ is the site $j$ random effect and $\varepsilon_{ijk}$ is the leftover error. These random effects have variances $\sigma^{2}_{\eta}, \sigma^{2}_{\theta}, \sigma^{2}_{\varepsilon}$ that are estimated by the model. (Note that if subject is nested within site, you would traditionally write $\theta_{ij}$ here instead of $\theta_{j}$).
To answer your first question regarding how to calculate the ICCs: under this model, the ICCs are the proportion of the total variation explained by the respective blocking factor. In particular, the correlation between two randomly selected observations on the same subject is:
$$ {\rm ICC}({\rm Subject}) = \frac{\sigma^{2}_{\eta}}{\sigma^{2}_{\eta}+ \sigma^{2}_{\theta}+\sigma^{2}_{\varepsilon}}$$
The correlation between two randomly selected observations from the same site is:
$$ {\rm ICC}({\rm Site}) = \frac{\sigma^{2}_{\theta}}{\sigma^{2}_{\eta}+ \sigma^{2}_{\theta}+\sigma^{2}_{\varepsilon}}$$
The correlation between two randomly selected observations on the same individual, and at the same site (the so-called interaction ICC) is:
$$ {\rm ICC}({\rm Subject/Site \ Interaction}) = \frac{\sigma^{2}_{\eta}+\sigma^{2}_{\theta}}{\sigma^{2}_{\eta}+ \sigma^{2}_{\theta}+\sigma^{2}_{\varepsilon}}$$
It seems you were confused by this being referred to as an "interaction" since it's the sum of individual terms. It's an "interaction" in the sense that it estimates the ${\rm ICC}$ corresponding to the blocking factor composed of the combination of Subject and Site. It's important to note that you do not have to include some kind of "interaction" term between the factors to estimate this quantity.
Each of these quantities can be estimated by plugging in the estimates of these variances that come out of the model fitting.
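A minimal sketch of this plug-in calculation, assuming variable names matching the lmer formula above:

```r
library(lme4)

fit <- lmer(measurement ~ 1 + (1 | subject) + (1 | site), mydata)

# One row per variance component: subject, site, and the residual
vc <- as.data.frame(VarCorr(fit))
v  <- setNames(vc$vcov, vc$grp)
total <- sum(v)

icc_subject     <- v["subject"] / total
icc_site        <- v["site"] / total
icc_interaction <- (v["subject"] + v["site"]) / total
```

Each ICC is just the relevant variance component (or sum of components) divided by the total variance.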
Regarding your second question - as you can see here, each ${\rm ICC}$ has a fairly clear interpretation. I would argue that the interaction ${\rm ICC}$ does tell us something interesting - how "similar" are measurements that share both subject and site?
One important point to note is that if subjects are nested within sites, then the Subject ${\rm ICC}$ is not meaningful in its own right, since it's impossible to share Subject and not Site. In that case, $\sigma^{2}_{\eta}$ becomes only a measure of how much more similar individuals are to themselves, compared to other individuals at their site.
Best Answer
You can easily get the ICC from the sjstats package, which also offers bootstrapping. I would use bootstrapping to obtain standard errors and p-values (you can also take a look at the discussion on the r-sig mailing list for other approaches, but bootstrapping is one recommended option).
The suggested workflow takes the sleepstudy dataset and creates 100 bootstrap samples, which are saved as a "list-variable" named strap in a new data frame. This list-variable (i.e. the data frame containing it), holding the 100 bootstrap samples, is then "piped" into the lapply command to run 100 lmer commands (which are saved in another list-variable named models; see the mutate command). That list-variable is then used to compute the ICC with the sjstats::icc function, so you have 100 ICC values (one per model applied to each bootstrap sample) in the variable named icc. Finally, boot_se and boot_p compute the standard error and p-value of any bootstrapped values, in this case the ICC values.
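The original code was not included here; the following is a hypothetical reconstruction of the workflow just described, assuming an older version of sjstats that provided bootstrap(), icc(), boot_se() and boot_p() (these helpers have since been moved or deprecated):

```r
library(lme4)
library(dplyr)
library(sjstats)

bs <- bootstrap(sleepstudy, n = 100) %>%   # 100 bootstrap samples in `strap`
  mutate(models = lapply(strap, function(x) {
    lmer(Reaction ~ Days + (Days | Subject), data = x)
  })) %>%
  mutate(icc = unlist(lapply(models, icc)))  # one ICC per bootstrapped model

boot_se(bs, icc)  # bootstrapped standard error of the ICC
boot_p(bs, icc)   # bootstrapped p-value of the ICC
```

Note that in current packages the ICC lives in performance::icc(), so this sketch only runs as written against the older sjstats API.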