I have the following model `m_plot`, fitted with `lme4::lmer` with crossed random effects for participants (`lfdn`) and items (`content`):
```
Random effects:
 Groups   Name             Variance Std.Dev. Corr
 lfdn     (Intercept)       172.173   13.121
          role1              62.351    7.896  0.03
          inference1         24.640    4.964  0.08 -0.30
          inference2         52.366    7.236 -0.05  0.17 -0.83
          inference3         21.295    4.615 -0.03  0.22  0.86 -0.77
 content  (Intercept)        23.872    4.886
          role1               2.497    1.580 -1.00
          inference1         18.929    4.351  0.52 -0.52
          inference2         14.716    3.836 -0.16  0.16 -0.08
          inference3         17.782    4.217 -0.17  0.17  0.25 -0.79
          role1:inference1    9.041    3.007  0.10 -0.10 -0.10 -0.21  0.16
          role1:inference2    5.968    2.443 -0.60  0.60 -0.11  0.78 -0.48 -0.50
          role1:inference3    4.420    2.102  0.30 -0.30  0.05 -0.97  0.71  0.37 -0.90
 Residual                   553.987   23.537
Number of obs: 3480, groups:  lfdn, 435; content, 20
```
I want to know the intraclass correlation coefficients (ICCs) for participants and items.
Thanks to this great answer, I know in principle how to get the ICC for my model. However, I am unsure whether to include the random slopes:
```r
vars <- lapply(summary(m_plot)$varcor, diag)
resid_var <- attr(summary(m_plot)$varcor, "sc")^2
total_var <- sum(sapply(vars, sum), resid_var)
# with random slopes
sapply(vars, sum)/total_var
## lfdn content
## 0.33822396 0.09880349
# only random intercepts:
sapply(vars, function(x) x[1]) / total_var
## lfdn.(Intercept) content.(Intercept)
## 0.17496587 0.02425948
```
What is the appropriate measure of the correlation between two responses from the same participant, and, respectively, between two responses to the same item?
Best Answer
Basically, there's no single number or estimate that can summarize the degree of clustering in a random slopes model.
The intraclass correlation (ICC) can be written as a simple proportion of variances only in random-intercept-only models. To see why, a sketch of the derivation of the ICC expression can be found here.
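For a single random factor with random intercepts only, that derivation can be sketched as follows (the notation is mine and may differ from the linked sketch). Response $i$ in cluster $j$ is $Y_{ij} = \beta_0 + u_{0j} + e_{ij}$, with independent $u_{0j} \sim N(0, \tau_0^2)$ and $e_{ij} \sim N(0, \sigma^2)$:

$$
\operatorname{Var}(Y_{ij}) = \tau_0^2 + \sigma^2, \qquad
\operatorname{Cov}(Y_{ij}, Y_{i'j}) = \tau_0^2 \quad (i \neq i'), \qquad
\text{ICC} = \frac{\operatorname{Cov}(Y_{ij}, Y_{i'j})}{\sqrt{\operatorname{Var}(Y_{ij})\,\operatorname{Var}(Y_{i'j})}} = \frac{\tau_0^2}{\tau_0^2 + \sigma^2}.
$$

Nothing in this ratio involves the predictors, which is what makes a single-number summary possible in this case.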
When you throw random slopes into the model equation, following the same steps leads instead to the ICC expression on page 5 of this paper. As you can see, that complicated expression is a function of the predictor X. To see more intuitively why var(Y) depends on X when there are random slopes, check out page 30 of these slides ("Why does the variance depend on x?").
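As a standard version of that calculation (in my own notation, which may not match the linked paper), take one predictor $x$ with a random intercept and a random slope: $Y_{ij} = \beta_0 + \beta_1 x_{ij} + u_{0j} + u_{1j} x_{ij} + e_{ij}$, where $u_{0j}$ and $u_{1j}$ have variances $\tau_0^2$ and $\tau_1^2$ and covariance $\tau_{01}$, and $e_{ij} \sim N(0, \sigma^2)$ is independent of both. Then

$$
\begin{aligned}
\operatorname{Var}(Y_{ij} \mid x_{ij}) &= \tau_0^2 + 2\tau_{01} x_{ij} + \tau_1^2 x_{ij}^2 + \sigma^2,\\
\operatorname{Cov}(Y_{ij}, Y_{i'j} \mid x_{ij}, x_{i'j}) &= \tau_0^2 + \tau_{01}(x_{ij} + x_{i'j}) + \tau_1^2 x_{ij} x_{i'j},\\
\text{ICC}(x_{ij}, x_{i'j}) &= \frac{\tau_0^2 + \tau_{01}(x_{ij} + x_{i'j}) + \tau_1^2 x_{ij} x_{i'j}}{\sqrt{\operatorname{Var}(Y_{ij} \mid x_{ij})\,\operatorname{Var}(Y_{i'j} \mid x_{i'j})}}.
\end{aligned}
$$

Setting $\tau_1^2 = \tau_{01} = 0$ recovers $\tau_0^2 / (\tau_0^2 + \sigma^2)$; otherwise both the variance and the correlation change with $x$.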
Because the ICC is a function of the predictors (the x-values), it can only be computed for particular sets of x-values. You could perhaps try something like reporting the ICC at the joint average of the x-values, but this estimate will be demonstrably inaccurate for the majority of the observations.
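To make the dependence on the x-values concrete, here is a minimal sketch for a single random factor with a random intercept and one random slope. `icc_at()` is a hypothetical helper written for this illustration, not an lme4 function, and the variance components in the example call are made up.

```r
# Hypothetical helper (not part of lme4): model-implied correlation between two
# responses from the same cluster in y = b0 + b1*x + u0 + u1*x + e,
# evaluated at predictor values x1 and x2 (vectorized over x1, x2).
#   tau0_sq: intercept variance, tau1_sq: slope variance,
#   tau01: intercept-slope covariance, sigma_sq: residual variance
icc_at <- function(x1, x2, tau0_sq, tau1_sq, tau01, sigma_sq) {
  # Variance of a single response at predictor value x
  v <- function(x) tau0_sq + 2 * tau01 * x + tau1_sq * x^2 + sigma_sq
  # Covariance of two responses sharing the cluster (residuals are independent)
  cov12 <- tau0_sq + tau01 * (x1 + x2) + tau1_sq * x1 * x2
  cov12 / sqrt(v(x1) * v(x2))
}

# Made-up components: all variances 1, no intercept-slope covariance
round(icc_at(x1 = c(0, 1, -1), x2 = c(0, 1, 1),
             tau0_sq = 1, tau1_sq = 1, tau01 = 0, sigma_sq = 1), 3)
# 0.5, 0.667 and 0
```

With these components, two responses at x = 0 correlate 0.5, two at x = 1 correlate about 0.667, and a response at x = -1 is uncorrelated with one at x = 1. Passing the joint mean of the x-values as both `x1` and `x2` gives the "ICC at the average" mentioned above, which describes only pairs near that point.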
Everything I've said still only refers to cases where there is a single random factor. With multiple random factors it becomes even more complicated. For example, in a multi-site project where participants at each site respond to a sample of stimuli (i.e., 3 random factors: site, participant, stimulus), we could ask about many different ICCs: What is the expected correlation between two responses at the same site, to the same stimulus, from different participants? How about at different sites, the same stimulus, and different participants? And so on. @rvl mentions these complications in the answer that the OP linked to.
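The same logic extends to crossed random factors, but each kind of pair now needs its own calculation. The sketch below uses a hypothetical helper, `pair_cor()` (again not part of lme4): it sums the covariance contributed by every factor whose level two responses share and divides by the product of their standard deviations. Which factors are shared is an input, so "same participant, different item" and "same item, different participant" give different numbers. The variance components in the example are made up.

```r
# Hypothetical helper (not part of lme4): model-implied correlation between two
# responses A and B in a model with several crossed random factors.
#   Sigma:    named list of random-effect covariance matrices, one per factor
#   zA, zB:   named lists of random-effects design rows for A and B, e.g. c(1, x)
#   shared:   names of the factors at whose level A and B coincide
#   sigma_sq: residual variance
pair_cor <- function(Sigma, zA, zB, shared, sigma_sq) {
  # Bilinear form z1' Sigma_g z2 for factor g
  form <- function(g, z1, z2) drop(t(z1[[g]]) %*% Sigma[[g]] %*% z2[[g]])
  groups <- names(Sigma)
  varA <- sum(vapply(groups, function(g) form(g, zA, zA), numeric(1))) + sigma_sq
  varB <- sum(vapply(groups, function(g) form(g, zB, zB), numeric(1))) + sigma_sq
  # Only shared factors contribute covariance; the others are independent
  covAB <- sum(vapply(shared, function(g) form(g, zA, zB), numeric(1)))
  covAB / sqrt(varA * varB)
}

# Made-up intercept-only example: participant variance 4, item variance 1
S <- list(participant = matrix(4), item = matrix(1))
z <- list(participant = 1, item = 1)
pair_cor(S, z, z, shared = "participant", sigma_sq = 5)            # 4 / 10 = 0.4
pair_cor(S, z, z, shared = "item", sigma_sq = 5)                   # 1 / 10 = 0.1
pair_cor(S, z, z, shared = c("participant", "item"), sigma_sq = 5) # 5 / 10 = 0.5
```

With random slopes, `Sigma` would hold the full covariance matrices (such as the per-group matrices in `VarCorr()` output) and each `z` entry the matching design row, so every one of these correlations would also depend on the x-values.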
So, as you can see, the only case where we can summarize the degree of clustering with a single value is the single-random-factor, random-intercept-only case. Because that case covers only a small proportion of real-world analyses, ICCs are not that useful most of the time. My general recommendation is therefore not to worry about them, and instead to report the variance components (preferably in standard deviation form).
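As a sketch of that kind of reporting with lme4, the snippet below uses the `sleepstudy` data shipped with lme4 as a stand-in for `m_plot`, purely so that the example runs on its own; for the model in the question the same calls would be applied to `VarCorr(m_plot)`.

```r
library(lme4)

# Stand-in model with a correlated random intercept and slope per Subject
fm <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)

# Random-effect standard deviations and correlations
print(VarCorr(fm), comp = "Std.Dev.")

# Same components as a data frame: 'vcov' holds variances/covariances,
# 'sdcor' the corresponding standard deviations/correlations
vc <- as.data.frame(VarCorr(fm))
vc
```

The data-frame form is convenient for building a table of standard deviations and correlations for a write-up.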