ICC in Multilevel Modelling – Use of Intraclass Correlation

Tags: intraclass-correlation, mixed-model, multilevel-analysis, random-effects-model, regression

I've just recently started learning about the ICC and multilevel models, and I've been told that one way to determine whether an MLM is warranted is to check the size of the ICC. I'm struggling to understand why the ICC is a good indicator of whether you should run an MLM.

From what I understand, the ICC tells you how much variability there is between your clusters/groups. If the ICC is large, then there is a lot of variability between your clusters, and you should treat them separately, allowing for either random intercepts or random slopes.

I can sort of understand why the ICC might be useful if you were interested in running a random intercepts model. High variability between clusters might suggest that they have different means and potentially different intercepts, so we run an MLM that allows a different intercept for each cluster.

But does the ICC tell you anything about the likelihood of different slopes for each cluster? I can't quite wrap my head around how slopes are related to the value returned by the ICC. If the ICC is small, then there's little variability between the clusters, which might suggest that their means are similar and thus a random intercepts model may not be needed. But does that automatically mean a random slopes model is also not warranted?

Best Answer

The ICC (intraclass correlation) is interpretable and useful for random intercepts models. It is the correlation between two observations within the same cluster. The higher the correlation within the clusters (i.e., the larger the ICC), the smaller the share of the total variance that lies within clusters and, consequently, the larger the share that lies between clusters.

Equivalently, it is a measure of the proportion of the total variation at each level (in a two-level model, the proportion at the cluster level), which is why it is also called the variance partition coefficient (VPC).
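As a minimal sketch of why the two interpretations coincide, assume the standard two-level random intercepts model with independent, normally distributed cluster effects u_j and residuals e_ij (the notation is chosen here for illustration). Two different observations i and i' in cluster j share only u_j, so their covariance is the cluster-level variance, and dividing by the total variance gives both the within-cluster correlation and the proportion of variance at the cluster level:

```latex
\begin{aligned}
y_{ij} &= \beta_0 + u_j + e_{ij}, \qquad u_j \sim N(0, \sigma_u^2), \quad e_{ij} \sim N(0, \sigma_e^2) \\
\operatorname{Var}(y_{ij}) &= \sigma_u^2 + \sigma_e^2, \qquad \operatorname{Cov}(y_{ij}, y_{i'j}) = \sigma_u^2 \quad (i \neq i') \\
\rho &= \operatorname{Corr}(y_{ij}, y_{i'j}) = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2} = \text{VPC}
\end{aligned}
```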

Therefore, as you rightly point out, in a random intercepts model, when the ICC is large, this is evidence in favour of retaining the random intercepts, while when it is small, this is evidence in favour of discarding random intercepts. However, as is often the case in applied statistics, what determines "small" and "large" is context-specific and discipline-specific.

Once we introduce random slopes/coefficients, things get more complicated. The ICC is no longer the same as the VPC, because the ICC will be a function of the variable(s) for which random slopes are specified. Therefore the ICC can take infinitely many values if the variable in question is continuous, and as many values as there are levels if it is categorical or a count. Thus any interpretation of the ICC in a random slopes model becomes more difficult. Stata, for example, will calculate a single value for the ICC, but in a random slopes model this is accompanied by the warning:

Note: ICC is conditional on zero values of random-effects covariates.

In other words, it has computed the ICC based on a value of zero for the random slope variable(s), so any interpretation of the ICC is also based on a value of zero for the slope variable(s).
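To make the dependence on the slope variable concrete, here is a sketch assuming a two-level model y_ij = b0 + b1*x_ij + u0_j + u1_j*x_ij + e_ij, where var_u0, var_u1, cov_u01 and var_e are illustrative names for the random-intercept variance, random-slope variance, their covariance and the residual variance. The cluster-level variance at covariate value x is var_u0 + 2*cov_u01*x + var_u1*x², so the VPC changes with x; at x = 0 it reduces to var_u0 / (var_u0 + var_e), which corresponds to the "conditional on zero values" ICC described in the Stata note. The function names are hypothetical, not part of any package.

```python
"""Sketch: ICC/VPC in a two-level random slopes model (illustrative notation).

Assumed model: y_ij = b0 + b1*x_ij + u0_j + u1_j*x_ij + e_ij, with
Var(u0_j) = var_u0, Var(u1_j) = var_u1, Cov(u0_j, u1_j) = cov_u01, Var(e_ij) = var_e,
and residuals independent of the random effects and of each other.
"""
import math


def cluster_variance(x: float, var_u0: float, var_u1: float, cov_u01: float) -> float:
    """Variance of the cluster-level part u0_j + u1_j * x at covariate value x."""
    return var_u0 + 2.0 * cov_u01 * x + var_u1 * x * x


def vpc(x: float, var_u0: float, var_u1: float, cov_u01: float, var_e: float) -> float:
    """Proportion of the total variance at the cluster level, evaluated at x."""
    between = cluster_variance(x, var_u0, var_u1, cov_u01)
    return between / (between + var_e)


def icc_pair(x1: float, x2: float, var_u0: float, var_u1: float,
             cov_u01: float, var_e: float) -> float:
    """Correlation of two observations in the same cluster with covariate values x1 and x2."""
    # Only the random effects are shared; the residuals are independent.
    cov = var_u0 + cov_u01 * (x1 + x2) + var_u1 * x1 * x2
    v1 = cluster_variance(x1, var_u0, var_u1, cov_u01) + var_e
    v2 = cluster_variance(x2, var_u0, var_u1, cov_u01) + var_e
    return cov / math.sqrt(v1 * v2)


if __name__ == "__main__":
    # Hypothetical variance components, chosen only for illustration.
    params = dict(var_u0=1.0, var_u1=0.5, cov_u01=0.0, var_e=1.0)
    for x in (0.0, 1.0, 2.0):
        print(f"x = {x}: VPC = {vpc(x, **params):.3f}")
    print(f"ICC for x1 = 0, x2 = 2: {icc_pair(0.0, 2.0, **params):.3f}")
```

With these hypothetical values the VPC rises from 0.5 at x = 0 to 0.75 at x = 2, so a single reported number only describes one point on that curve. When x1 equals x2, icc_pair agrees with vpc; when they differ, the correlation is a third quantity again.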

Regarding your question:

If the ICC is small, then there's little variability between the clusters, which might suggest that their means are similar and thus a random intercepts model may not be needed. But does that automatically mean a random slopes model is also not warranted?

No, because it is possible for each cluster to have the same intercept (no random intercept) while the slopes may indeed vary, which we can visualise like this:

[Figure: regression lines for several clusters sharing a common intercept but with different slopes]

If we want to know whether random slopes are supported by the data, one approach is to fit models with and without random slopes and use a likelihood ratio test.
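A sketch of the test itself, assuming both models have already been fitted to the same data with the same fixed effects and that their maximized log-likelihoods are available (the numbers below are hypothetical). Adding a random slope adds two parameters (the slope variance and its covariance with the intercept), giving a naive χ² test on 2 degrees of freedom. Because the null value of a variance lies on the boundary of the parameter space, that naive p-value tends to be conservative; a 50:50 mixture of χ² with 1 and 2 degrees of freedom is a commonly used alternative. Both tail probabilities are computed in closed form, so only the standard library is needed.

```python
"""Sketch: likelihood ratio test for adding random slopes (hypothetical log-likelihoods)."""
import math


def chi2_sf_df1(stat: float) -> float:
    """Upper tail probability of a chi-square with 1 df: P(Z**2 > stat)."""
    return math.erfc(math.sqrt(stat / 2.0))


def chi2_sf_df2(stat: float) -> float:
    """Upper tail probability of a chi-square with 2 df."""
    return math.exp(-stat / 2.0)


def lrt_random_slope(loglik_without: float, loglik_with: float) -> tuple[float, float, float]:
    """Return (LR statistic, naive chi2(2) p-value, 50:50 chi2(1)/chi2(2) mixture p-value)."""
    stat = max(0.0, 2.0 * (loglik_with - loglik_without))
    naive_p = chi2_sf_df2(stat)
    mixture_p = 0.5 * chi2_sf_df1(stat) + 0.5 * chi2_sf_df2(stat)
    return stat, naive_p, mixture_p


if __name__ == "__main__":
    # Hypothetical values: random intercepts only vs. random intercepts and slopes.
    stat, naive_p, mixture_p = lrt_random_slope(loglik_without=-1050.0, loglik_with=-1045.0)
    print(f"LR = {stat:.2f}, naive p = {naive_p:.4f}, mixture p = {mixture_p:.4f}")
```

In practice the log-likelihoods would come from whatever mixed-model software fitted the two models; the point of the sketch is only the arithmetic of the comparison.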
