Solved – Interpretation of ICC when multiple random intercepts and fixed effects are collinear in mixed models

intraclass-correlation, mixed-model, r

The intraclass correlation coefficient (ICC) is used in mixed models to give a sense of how much variance is explained by a random effect. It is calculated by dividing the variance of the random effect by the total random variance, i.e. the sum of all random-effect variances plus the residual variance. However, it is unclear to me what happens to the shared variance that may exist between multiple random intercepts, or between random intercepts and fixed effects.
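
For the single-random-intercept case this calculation is easy to reproduce by hand from the fitted variance components. A minimal sketch using lme4's VarCorr() (the object names are mine, purely for illustration):

library(lme4)

# Fit a model with a single random intercept for cylinders
m <- lmer(mpg ~ (1 | cyl), data = mtcars)

# Extract the variance components
vc        <- as.data.frame(VarCorr(m))
var_cyl   <- vc$vcov[vc$grp == "cyl"]       # between-group (random intercept) variance
var_resid <- vc$vcov[vc$grp == "Residual"]  # residual variance

# ICC = random-effect variance / (random-effect variance + residual variance)
var_cyl / (var_cyl + var_resid)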

Below I paste code to illustrate my problem using R's mtcars dataset. Let miles per gallon (mpg) be the outcome and the number of cylinders the grouping variable. Normally, cylinders would be better modeled as a linear fixed effect, but for this example let's model it as a random effect and calculate the ICC.

If cylinders is the only variable in the model, it explains 76% of the variance. If we also include horsepower as a random effect, which is highly correlated with cylinders (0.83), the estimate for cylinders drops to 72% and horsepower gets an ICC of 23%.

Question 1: Does the ICC of 72% now represent the variance explained by cylinders independent of HP, similar to regression coefficients in multiple regression? If so, what happens to the shared variance explained by both random intercepts? It appears that the ICC would give an inflated measure if the shared variance is not included in the total random variance.
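
For reference, here is how I understand the per-group ICCs in the two-intercept model to be computed: each group's variance divided by the sum of all random-effect variances plus the residual variance. A minimal sketch using the variance components from the summary() output shown further below (object names are mine):

library(lme4)

m2 <- lmer(mpg ~ (1 | cyl) + (1 | hp), data = mtcars)
vc <- as.data.frame(VarCorr(m2))

var_cyl   <- vc$vcov[vc$grp == "cyl"]
var_hp    <- vc$vcov[vc$grp == "hp"]
var_resid <- vc$vcov[vc$grp == "Residual"]

total <- var_cyl + var_hp + var_resid
var_cyl / total  # ~0.72, matches ICC (cyl)
var_hp  / total  # ~0.23, matches ICC (hp)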

We can extend the model by including fixed effects, e.g. displacement, which is also highly correlated with the number of cylinders (0.90). If we do this, the ICC estimate for the number of cylinders drops to 51% (and to 27% for horsepower).

Question 2: Do these percentages refer to variance explained out of the total variance of the outcome, or rather out of the variance that is left after accounting for displacement? The reason this question occurs to me is that the ICC calculation does not take the variance explained by the fixed effects into account in the total: the denominator contains only the random-effect variances and the residual variance. If this is the case, it seems to me that the ICC would give an inflated proportion of variance explained when the fixed effects are strong. If so, is there an alternative way to calculate the variance explained by a random effect relative to the total variance of the outcome, rather than the variance that remains after the fixed effects have been taken into account?
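
To make the alternative I have in mind concrete, here is a minimal sketch that adds the variance of the fixed-effects predictions to the denominator, so the proportions are expressed relative to the (model-implied) total variance of the outcome rather than the variance left over after the fixed effects (the helper names are mine, just for illustration):

library(lme4)

m3 <- lmer(mpg ~ (1 | cyl) + (1 | hp) + disp, data = mtcars)
vc <- as.data.frame(VarCorr(m3))

var_cyl   <- vc$vcov[vc$grp == "cyl"]
var_hp    <- vc$vcov[vc$grp == "hp"]
var_resid <- vc$vcov[vc$grp == "Residual"]

# Variance of the fixed-effects part of the linear predictor
var_fixed <- var(as.numeric(model.matrix(m3) %*% fixef(m3)))

total <- var_fixed + var_cyl + var_hp + var_resid
var_cyl / total  # share of the total outcome variance attributed to cylinders
var_hp  / total  # share attributed to horsepower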

> # Load packages
> library(lme4) # For mixed models
> library(sjstats) # For ICC calculations
> library(car) # For VIF calculation
> 
> # All variables are highly correlated
> cor(mtcars[c("mpg","cyl","hp","disp")])
            mpg        cyl         hp       disp
mpg   1.0000000 -0.8521620 -0.7761684 -0.8475514
cyl  -0.8521620  1.0000000  0.8324475  0.9020329
hp   -0.7761684  0.8324475  1.0000000  0.7909486
disp -0.8475514  0.9020329  0.7909486  1.0000000
> # Cylinder and HP show some collinearity based on VIF of a linear model
> vif(lm(mpg ~ cyl + hp, data = mtcars))
     cyl       hp 
3.256998 3.256998 
> 
> # Random effect of Cylinder explains 76% of variance
> icc(lmer(mpg ~ (1|cyl), data = mtcars))

Linear mixed model

Family : gaussian (identity)
Formula: mpg ~ (1 | cyl)

  ICC (cyl): 0.7617
> # Random effect of horsepower explains 96% of variance
> icc(lmer(mpg ~ (1|hp), data = mtcars))

Linear mixed model

Family : gaussian (identity)
Formula: mpg ~ (1 | hp)

  ICC (hp): 0.9598
> # Both random intercepts together explain 23% (hp) and 72% (cyl),
> # i.e. 95% of the variance in total
> # Are these the variances uniquely explained by each variable?
> # If so, what happened to the shared variance?
> icc(lmer(mpg ~  (1|cyl) + (1|hp), data = mtcars))

Linear mixed model

Family : gaussian (identity)
Formula: mpg ~ (1 | cyl) + (1 | hp)

   ICC (hp): 0.2343
  ICC (cyl): 0.7210
> summary(lmer(mpg ~  (1|cyl) + (1|hp), data = mtcars))
Linear mixed model fit by REML ['lmerMod']
Formula: mpg ~ (1 | cyl) + (1 | hp)
   Data: mtcars

REML criterion at convergence: 164.2

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-1.54883 -0.44043  0.01295  0.32934  1.99878 

Random effects:
 Groups   Name        Variance Std.Dev.
 hp       (Intercept) 10.853   3.294   
 cyl      (Intercept) 33.391   5.778   
 Residual              2.067   1.438   
Number of obs: 32, groups:  hp, 22; cyl, 3

Fixed effects:
            Estimate Std. Error t value
(Intercept)    19.73       3.43   5.753
> 
> # Adding displacement worsens multicollinearity 
> vif(lm(mpg ~ cyl + hp + disp, data = mtcars))
     cyl       hp     disp 
6.732984 3.350964 5.521460 
> 
> # Displacement as a fixed effect explains 72% of the variance
> summary(lm(mpg ~ disp, data = mtcars))$r.squared
[1] 0.7183433
> # If we adjust for the fixed effect of displacement,
> # are the ICCs of the random effects proportions of the whole variance,
> # or of the variance not already explained by displacement (28%)?
> # That is, are the 27% and 51% proportions of that remaining 28%?
> icc(lmer(mpg ~  (1|cyl) + (1|hp) + disp, data = mtcars))

Linear mixed model

Family : gaussian (identity)
Formula: mpg ~ (1 | cyl) + (1 | hp) + disp

   ICC (hp): 0.2718
  ICC (cyl): 0.5069
> summary(lmer(mpg ~  (1|cyl) + (1|hp) + disp, data = mtcars))
Linear mixed model fit by REML ['lmerMod']
Formula: mpg ~ (1 | cyl) + (1 | hp) + disp
   Data: mtcars

REML criterion at convergence: 167.5

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-1.1446 -0.4842 -0.1184  0.3476  1.6547 

Random effects:
 Groups   Name        Variance Std.Dev.
 hp       (Intercept) 5.146    2.269   
 cyl      (Intercept) 9.597    3.098   
 Residual             4.190    2.047   
Number of obs: 32, groups:  hp, 22; cyl, 3

Fixed effects:
            Estimate Std. Error t value
(Intercept) 26.13705    2.78604   9.381
disp        -0.02783    0.00931  -2.989

Correlation of Fixed Effects:
     (Intr)
disp -0.730
> 

Best Answer

I'm not sure which package version of sjstats you're using, but there was recently an update to version 0.17.0 on CRAN; I'm referring to this package version in my answer.

In your example with two random effects ((1 | hp) + (1 | cyl)), the ICC refers to the variance explained by each specific grouping level in relation to the total variance. See the help file (?icc), which states:

Caution: By default, for three-level-models, depending on the nested structure of the model, or for models with multiple random effects, icc() only reports the proportion of variance explained for each grouping level. Use adjusted = TRUE to calculate the adjusted and conditional ICC.

This way of calculating the ICC is/was especially useful for the "simple" case of random effects. In package version 0.17.0, the icc() function has an adjusted argument, which, when set to TRUE, uses the method proposed by Nakagawa et al. (2017) to calculate the ICC. Here you get the "shared" variance of the random effects, taking all sources of uncertainty (i.e. all random effects) into account. The "adjusted" ICC only considers the random effects, while the "conditional" ICC also takes the fixed-effects variance into account.

If random effects are not nested and not cross-classified, the adjusted (adjusted = TRUE) and "unadjusted" (adjusted = FALSE) ICC are identical.

So, for your case, I would suggest using icc(model, adjusted = TRUE).
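
For example (a sketch, assuming sjstats >= 0.17.0 as described above):

library(lme4)
library(sjstats)

m <- lmer(mpg ~ (1 | cyl) + (1 | hp) + disp, data = mtcars)

# adjusted ICC: random-effect variances relative to random + residual variance
# conditional ICC: additionally includes the fixed-effects variance in the denominator
icc(m, adjusted = TRUE)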

Reference:

Nakagawa S, Johnson P, Schielzeth H (2017). The coefficient of determination R2 and intra-class correlation coefficient from generalized linear mixed-effects models revisited and expanded. J. R. Soc. Interface 14. doi: 10.1098/rsif.2017.0213
