The package MCMCglmm can easily handle and estimate covariance structures and random effects. However, it uses Bayesian statistics, which can be intimidating to new users. See the MCMCglmm Course Notes for a thorough guide to MCMCglmm, and chapter 5 in particular for this question. I strongly recommend reading up on assessing model convergence and chain mixing before analysing real data with MCMCglmm.
library(MCMCglmm)
MCMCglmm requires priors; this is an uninformative inverse-Wishart prior.
p<-list(G=list(
G1=list(V=diag(2),nu=0.002)),
R=list(V=diag(2),nu=0.002))
Fit the model
m<-MCMCglmm(cbind(x,y)~trait-1,
#trait-1 gives each variable a separate intercept
random=~us(trait):group,
#the random effect has a separate intercept for each variable but allows and estimates the covariance between them
rcov=~us(trait):units,
#Allows separate residual variance for each trait and estimates the covariance between them
family=c("gaussian","gaussian"),prior=p,data=df)
In the model summary (summary(m)), the G structure describes the variance and covariance of the random intercepts. The R structure describes the observation-level variance and covariance, which functions as the residual structure in MCMCglmm.
If you are of a Bayesian persuasion, you can get the entire posterior distribution of the co/variance terms from m$VCV. Note that these are variances after accounting for the fixed effects.
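A minimal sketch of how the posterior in m$VCV can be summarised, assuming the fitted model object m from above (posterior.mode() comes with MCMCglmm, HPDinterval() with coda):

```r
library(MCMCglmm)  # provides posterior.mode()
library(coda)      # provides HPDinterval()

posterior.mode(m$VCV)  # point estimates (posterior modes) of each co/variance term
HPDinterval(m$VCV)     # 95% highest posterior density intervals
plot(m$VCV)            # trace and density plots -- inspect these for mixing/convergence
```

The trace plots in particular are worth checking before trusting any of the point estimates.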
Simulate the data
library(MASS)
n<-3000
#draws from a bivariate distribution
df<-data.frame(mvrnorm(n,mu=c(10,20),#the intercepts of x and y
Sigma=matrix(c(10,-3,-3,2),ncol=2)))
#the residual variance covariance of x and y
#assign random effect value
number_of_groups<-100
df$group<-rep(1:number_of_groups,length.out=n)
group_var<-data.frame(mvrnorm(number_of_groups, mu=c(0,0),Sigma=matrix(c(3,2,2,5),ncol=2)))
#the variance covariance matrix of the random effects. c(variance of x,
#covariance of x and y,covariance of x and y, variance of y)
#the variables x and y are the sum of the draws from the bivariate distribution and the random effect
df$x<-df$X1+group_var[df$group,1]
df$y<-df$X2+group_var[df$group,2]
Recovering the original co/variances of the random effects requires a large number of levels of the random effect. Instead, your model will likely estimate the observed co/variances, which can be calculated with cov(group_var).
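For example, a quick check of how well the model recovered the simulated G structure (a sketch, assuming the objects m and group_var created above):

```r
# Realised co/variance of the 100 simulated group effects
cov(group_var)

# Compare with the model's estimate of the G structure, reshaped to a
# 2x2 matrix (the first four elements of m$VCV are the G-structure terms)
matrix(posterior.mode(m$VCV)[1:4], ncol = 2)
```

With only 100 groups, expect these to agree with each other more closely than with the Sigma matrix used to generate group_var.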
The R model formula
lmer(measurement ~ 1 + (1 | subject) + (1 | site), mydata)
fits the model
$$ Y_{ijk} = \beta_0 + \eta_{i} + \theta_{j} + \varepsilon_{ijk} $$
where $Y_{ijk}$ is the $k$'th measurement from subject $i$ at site $j$, $\eta_{i}$ is the subject $i$ random effect, $\theta_{j}$ is the site $j$ random effect and $\varepsilon_{ijk}$ is the leftover error. These random effects have variances $\sigma^{2}_{\eta}, \sigma^{2}_{\theta}, \sigma^{2}_{\varepsilon}$ that are estimated by the model. (Note that if subject is nested within site, you would traditionally write $\theta_{ij}$ here instead of $\theta_{j}$).
To answer your first question regarding how to calculate the ICCs: under this model, the ICCs are the proportion of the total variation explained by the respective blocking factor. In particular, the correlation between two randomly selected observations on the same subject is:
$$ {\rm ICC}({\rm Subject}) = \frac{\sigma^{2}_{\eta}}{\sigma^{2}_{\eta}+ \sigma^{2}_{\theta}+\sigma^{2}_{\varepsilon}}$$
The correlation between two randomly selected observations from the same site is:
$$ {\rm ICC}({\rm Site}) = \frac{\sigma^{2}_{\theta}}{\sigma^{2}_{\eta}+ \sigma^{2}_{\theta}+\sigma^{2}_{\varepsilon}}$$
The correlation between two randomly selected observations on the same individual, and at the same site (the so-called interaction ICC) is:
$$ {\rm ICC}({\rm Subject/Site \ Interaction}) = \frac{\sigma^{2}_{\eta}+\sigma^{2}_{\theta}}{\sigma^{2}_{\eta}+ \sigma^{2}_{\theta}+\sigma^{2}_{\varepsilon}}$$
It seems you were confused by this being referred to as an "interaction" since it's the sum of individual terms. It's an "interaction" in the sense that it estimates the ${\rm ICC}$ corresponding to the blocking factor composed of the combination of Subject and Site. It's important to note that you do not have to include some kind of "interaction" term between the factors to estimate this quantity.
Each of these quantities can be estimated by plugging in the estimates of these variances that come out of the model fitting.
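A minimal sketch of this plug-in calculation, assuming variable names matching the lmer formula above:

```r
library(lme4)

fit <- lmer(measurement ~ 1 + (1 | subject) + (1 | site), mydata)

# One row per variance component: subject, site, and the residual
vc <- as.data.frame(VarCorr(fit))
v  <- setNames(vc$vcov, vc$grp)
total <- sum(v)

icc_subject     <- v["subject"] / total
icc_site        <- v["site"] / total
icc_interaction <- (v["subject"] + v["site"]) / total
```

Each ICC is just the relevant variance component (or sum of components) divided by the total variance.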
Regarding your second question - as you can see here, each ${\rm ICC}$ has a fairly clear interpretation. I would argue that the interaction ${\rm ICC}$ does tell us something interesting - how "similar" are measurements that share both subject and site?
One important point to note is that if subjects are nested within sites, then the Subject ${\rm ICC}$ is not meaningful in its own right, since it's impossible to share Subject and not Site. In that case, $\sigma^{2}_{\eta}$ becomes only a measure of how much more similar individuals are to themselves, compared to other individuals at their site.
Best Answer
You can easily get the ICC from the sjstats package, which also offers bootstrapping. I would use bootstrapping to obtain standard errors and p-values (you can also take a look at the discussion on the r-sig mailing list for other approaches, but bootstrapping is one recommended option).
The suggested workflow takes the sleepstudy dataset and creates 100 bootstrap samples, which are saved as a "list-variable" named strap in a new data frame. This list-variable (i.e. the data frame containing it), holding the 100 bootstrap samples, is then "piped" into the lapply command to run 100 lmer commands (which are saved in another list-variable named models; see the mutate command). That list-variable is then used to compute the ICC with the sjstats::icc function, so you have 100 ICC values (one per model applied to each bootstrap sample) in the variable named icc. Finally, boot_se and boot_p compute the standard error and p-value of any bootstrapped values, in this case the ICC values.
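The original code was not included here; the following is a hypothetical reconstruction of the workflow just described, assuming an older version of sjstats that provided bootstrap(), icc(), boot_se() and boot_p() (these helpers have since been moved or deprecated):

```r
library(lme4)
library(dplyr)
library(sjstats)

bs <- bootstrap(sleepstudy, n = 100) %>%   # 100 bootstrap samples in `strap`
  mutate(models = lapply(strap, function(x) {
    lmer(Reaction ~ Days + (Days | Subject), data = x)
  })) %>%
  mutate(icc = unlist(lapply(models, icc)))  # one ICC per bootstrapped model

boot_se(bs, icc)  # bootstrapped standard error of the ICC
boot_p(bs, icc)   # bootstrapped p-value of the ICC
```

Note that in current packages the ICC lives in performance::icc(), so this sketch only runs as written against the older sjstats API.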