Mixed Model – Intraclass Correlation Coefficient with Random Slopes

Tags: intraclass-correlation, lme4-nlme, mixed-model

I have the following model m_plot fitted with lme4::lmer with crossed random effects for participants (lfdn) and items (content):

Random effects:
 Groups   Name             Variance Std.Dev. Corr                                     
 lfdn     (Intercept)      172.173  13.121                                            
          role1             62.351   7.896    0.03                                    
          inference1        24.640   4.964    0.08 -0.30                              
          inference2        52.366   7.236   -0.05  0.17 -0.83                        
          inference3        21.295   4.615   -0.03  0.22  0.86 -0.77                  
 content  (Intercept)       23.872   4.886                                            
          role1              2.497   1.580   -1.00                                    
          inference1        18.929   4.351    0.52 -0.52                              
          inference2        14.716   3.836   -0.16  0.16 -0.08                        
          inference3        17.782   4.217   -0.17  0.17  0.25 -0.79                  
          role1:inference1   9.041   3.007    0.10 -0.10 -0.10 -0.21  0.16            
          role1:inference2   5.968   2.443   -0.60  0.60 -0.11  0.78 -0.48 -0.50      
          role1:inference3   4.420   2.102    0.30 -0.30  0.05 -0.97  0.71  0.37 -0.90
 Residual                  553.987  23.537                                            
Number of obs: 3480, groups:  lfdn, 435 content, 20

I want to know the Intraclass Correlation Coefficients (ICC) for participants and items.
Thanks to this great answer, I know in principle how to get the ICC for my model. However, I am unsure whether to include the random slopes:

vars <- lapply(summary(m_plot)$varcor, diag)
resid_var <- attr(summary(m_plot)$varcor, "sc")^2
total_var <- sum(sapply(vars, sum), resid_var)

# with random slopes
sapply(vars, sum)/total_var
##       lfdn    content 
## 0.33822396 0.09880349

# only random intercepts:
sapply(vars, function(x) x[1]) / total_var
##   lfdn.(Intercept) content.(Intercept) 
##         0.17496587          0.02425948

What is the appropriate measure for the correlation between two responses from the same participant, or to the same item?

Best Answer

Basically there's no single number or estimate that can summarize the degree of clustering in a random slopes model.

The intra-class correlation (ICC) can only be written as a simple proportion of variances in random-intercepts-only models. To see why, a sketch of the derivation of the ICC expression can be found here.
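
For reference, a minimal sketch of that derivation for a single random factor with random intercepts only. The notation ($u_j$, $e_{ij}$, $\sigma_u^2$, $\sigma_e^2$) is my own and is not taken from the linked source:

```latex
% Random-intercept model: observation i in cluster j
y_{ij} = \beta_0 + u_j + e_{ij}, \qquad
u_j \sim N(0, \sigma_u^2), \quad e_{ij} \sim N(0, \sigma_e^2), \quad u_j \perp e_{ij}

% Two different observations i \neq i' from the same cluster j share only u_j
\operatorname{Cov}(y_{ij}, y_{i'j}) = \sigma_u^2, \qquad
\operatorname{Var}(y_{ij}) = \sigma_u^2 + \sigma_e^2

% Hence the correlation is a simple proportion of variances
\mathrm{ICC} = \operatorname{Corr}(y_{ij}, y_{i'j}) = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2}
```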

When you throw random slopes into the model equation, following the same steps leads instead to the ICC expression on page 5 of this paper. As you can see, that complicated expression is a function of the predictor X. To see more intuitively why var(Y) depends on X when there are random slopes, check out page 30 of these slides ("Why does the variance depend on x?").

Because the ICC is a function of the predictors (the x-values), it can only be computed for particular sets of x-values. You could perhaps try something like reporting the ICC at the joint average of the x-values, but this estimate will be demonstrably inaccurate for the majority of the observations.
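
To make the dependence on x concrete, here is a rough sketch in R for the simplest case: one random factor, a random intercept, and one random slope, with both responses observed at the same x. The function name `icc_at_x` is hypothetical. The numbers are borrowed from the lfdn block of the question (intercept and role1 only), and I am assuming role1 is coded as -1/+1. The content factor and the inference slopes are ignored, so this is an illustration and not the ICC of the full model:

```r
# ICC as a function of x for ONE random factor with a random intercept and
# one random slope. G is the 2x2 covariance matrix of (intercept, slope);
# sigma2 is the residual variance. Both responses are assumed to share x.
icc_at_x <- function(x, G, sigma2) {
  z <- rbind(1, x)                    # 2 x length(x): one column (1, x) per x
  between <- colSums(z * (G %*% z))   # z' G z, computed for each x
  between / (between + sigma2)
}

# Illustrative values taken from the lfdn rows of the question's output
sd_int   <- 13.121   # SD of random intercepts
sd_slope <- 7.896    # SD of random slopes for role1
r        <- 0.03     # intercept-slope correlation
G <- matrix(c(sd_int^2,              r * sd_int * sd_slope,
              r * sd_int * sd_slope, sd_slope^2), nrow = 2)
sigma2 <- 553.987    # residual variance

# ASSUMPTION: role1 takes the values -1 and +1; x = 0 is the "average" point
icc <- icc_at_x(c(-1, 0, 1), G, sigma2)
print(icc)
```

At x = 0 the expression collapses to the familiar intercept-only ratio. Away from 0 the slope variance and the intercept-slope covariance enter, which is why no single value describes every observation.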

Everything I've said still only refers to cases where there is a single random factor. With multiple random factors it becomes even more complicated. For example, in a multi-site project where participants at each site respond to a sample of stimuli (i.e., 3 random factors: site, participant, stimulus), we could ask about many different ICCs: What is the expected correlation between two responses at the same site, to the same stimulus, from different participants? How about at different sites, the same stimulus, and different participants? And so on. @rvl mentions these complications in the answer that the OP linked to.
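
As a sketch of how those different ICCs arise, assume the simplest version of the multi-site example: random intercepts only, with site $s$, participant $p$ (nested in site), and stimulus $t$. The symbols below are my own notation:

```latex
y_{spt} = \beta_0 + a_s + b_p + c_t + e_{spt}, \qquad
\sigma^2_{\text{total}} = \sigma_a^2 + \sigma_b^2 + \sigma_c^2 + \sigma_e^2

% Same site, same stimulus, different participants: a_s and c_t are shared
\operatorname{Corr} = \frac{\sigma_a^2 + \sigma_c^2}{\sigma^2_{\text{total}}}

% Different sites, same stimulus, different participants: only c_t is shared
\operatorname{Corr} = \frac{\sigma_c^2}{\sigma^2_{\text{total}}}
```

In general, the numerator collects the variance components the two responses share, so each combination of same/different levels gives its own ICC, and that is before any random slopes are added.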

So as you can see, the only case where we can summarize the degree of clustering with a single value is the single-random-factor random-intercept-only case. Because this is such a small proportion of real-world cases, ICCs are not that useful most of the time. So my general recommendation is to not even worry about them. Instead I recommend just reporting the variance components (preferably in standard deviation form).
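
A minimal sketch of pulling out the variance components in standard deviation form with lme4. It uses the `sleepstudy` data that ships with lme4 as a stand-in, since the data behind `m_plot` are not available here:

```r
library(lme4)

# Stand-in model: random intercept and random slope for Days within Subject
fit <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)

# One row per variance/covariance term. For variance rows, `vcov` holds the
# variance and `sdcor` the standard deviation; for covariance rows (var2 not
# NA), `sdcor` holds the correlation.
vc <- as.data.frame(VarCorr(fit))
print(vc[, c("grp", "var1", "var2", "vcov", "sdcor")])
```

The same call on `m_plot` should give the table from the question in a form that is easy to report or export.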

Solved – Intraclass Correlation Coefficients (ICC) with Multiple Variables

Fit the model

library(MCMCglmm)

# NOTE: the prior `p` is not shown in the original answer; it must be defined
# before this call. `df` is the simulated data frame with columns x, y and group.
m <- MCMCglmm(cbind(x, y) ~ trait - 1,
              # trait - 1 gives each response variable its own intercept
              random = ~us(trait):group,
              # separate random-intercept variance for each variable, plus their covariance
              rcov = ~us(trait):units,
              # separate residual variance for each trait, plus their covariance
              family = c("gaussian", "gaussian"), prior = p, data = df)

In the model summary, summary(m), the G-structure describes the variances and covariance of the random intercepts. The R-structure describes the observation-level variances and covariance of the two traits, which function as residuals in MCMCglmm.

If you are of a Bayesian persuasion, you can get the entire posterior distribution of the co/variance terms from m$VCV. Note that these are variances after accounting for the fixed effects.
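
As a rough sketch of what you could do with those posterior draws, the snippet below computes a per-trait ICC, group variance / (group variance + units variance), for every draw. To keep it runnable without fitting anything, `vcv` is a fake stand-in for m$VCV. The helper `icc_posterior` is hypothetical, and the column-name pattern ("traitx:traitx.group", "traitx:traitx.units") is what I would expect from the model above, so check colnames(m$VCV) before relying on it:

```r
# Stand-in for m$VCV: one row per posterior draw, one column per (co)variance.
# These draws are FAKE and only illustrate the bookkeeping.
set.seed(1)
n_draws <- 1000
vcv <- cbind(
  "traitx:traitx.group" = rgamma(n_draws, shape = 30,  rate = 10),
  "traitx:traitx.units" = rgamma(n_draws, shape = 100, rate = 10)
)

# Hypothetical helper: posterior of the ICC for one trait.
# ASSUMPTION: columns are named "trait<t>:trait<t>.<group>" and "...units".
icc_posterior <- function(vcv, trait, group = "group") {
  g <- vcv[, sprintf("trait%s:trait%s.%s", trait, trait, group)]
  r <- vcv[, sprintf("trait%s:trait%s.units", trait, trait)]
  g / (g + r)
}

icc_x <- icc_posterior(vcv, "x")
print(quantile(icc_x, c(0.025, 0.5, 0.975)))  # posterior median and 95% interval
```

With a fitted model you would pass m$VCV in place of `vcv`.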

Simulate data

library(MASS)
n <- 3000

# Observation-level draws from a bivariate normal distribution.
# mu holds the intercepts of x and y; Sigma is the residual
# variance-covariance matrix of x and y.
df <- data.frame(mvrnorm(n, mu = c(10, 20),
                         Sigma = matrix(c(10, -3, -3, 2), ncol = 2)))

# Assign each observation to a group
number_of_groups <- 100
df$group <- rep(1:number_of_groups, length.out = n)

# Draw the group-level random effects. Sigma is their variance-covariance matrix:
# c(variance of x, covariance of x and y, covariance of x and y, variance of y)
group_var <- data.frame(mvrnorm(number_of_groups, mu = c(0, 0),
                                Sigma = matrix(c(3, 2, 2, 5), ncol = 2)))

# x and y are the sum of the observation-level draws and the group's random effect
df$x <- df$X1 + group_var[df$group, 1]
df$y <- df$X2 + group_var[df$group, 2]

Recovering the true co/variances of the random effects requires a large number of levels of the random effect. With only 100 groups, the model will instead tend to estimate the realized co/variances of the sampled group effects, which can be calculated with cov(group_var).
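
A small sketch of that point: draw group effects from the same covariance matrix with few and with many levels, and compare the sample covariance to the true one. The helper `sample_cov` and the choice of 1e5 levels are only for illustration:

```r
library(MASS)
set.seed(42)

# True variance-covariance matrix of the group effects (as in the simulation)
Sigma_true <- matrix(c(3, 2, 2, 5), ncol = 2)

# Sample covariance of k simulated group effects
sample_cov <- function(k) cov(mvrnorm(k, mu = c(0, 0), Sigma = Sigma_true))

cov_100 <- sample_cov(100)   # same number of levels as in the simulation
cov_1e5 <- sample_cov(1e5)   # far more levels than most real data sets have

# Largest absolute deviation from the true matrix in each case
print(max(abs(cov_100 - Sigma_true)))
print(max(abs(cov_1e5 - Sigma_true)))
```

With 100 levels the realized matrix can differ noticeably from Sigma_true, and the fitted model can only recover the realized one.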

Solved – Intraclass correlation coefficient, non parametric data

Depends on the method - if you're using the ANOVA (general linear) method, then yes. But you can also calculate the ICC using generalized linear methods (I'm not sure about non-parametric methods - anyone else?).

Nakagawa and Schielzeth wrote the user-friendly rptR package for R, which uses general or generalized linear mixed effects modelling (for normal, binomial and count data) to calculate repeatability within classes using ANOVA, REML or MCMC methods. I think this is what you want to do?

http://rptr.r-forge.r-project.org/
