Solved – Interpreting the Standard Deviation of Random Effects in Sequential Mixed Effects Models

mixed-model, r, standard-deviation

I am trying to make sure my understanding of the random effects in mixed effects models is correct. To that end, I would like to share some R code for a sequence of generalized logistic mixed effects regression models, the standard deviation of the estimated random effect in each, and my interpretation, so that the Cross Validated community can double-check it.

My understanding of fixed vs. random effects themselves is:

  1. Fixed Effect – An observed covariate whose effect is estimated as an unknown, fixed parameter.

  2. Random Effect – Treated as unobserved, normally distributed random variables rather than as unknown fixed parameters.

I will make a series of mixed effects models, each with an additional variable, as follows:

library("lme4")
library("titanic")

mod1 <- glmer(Survived ~ (1 | Embarked), 
    data = titanic_train, 
    family = binomial, 
    control = glmerControl(optimizer = "bobyqa"), 
    nAGQ = 1)

mod2 <- glmer(Survived ~ Pclass + (1 | Embarked), 
    data = titanic_train, 
    family = binomial, 
    control = glmerControl(optimizer = "bobyqa"), 
    nAGQ = 1)

mod3 <- glmer(Survived ~ Pclass + Sex + (1 | Embarked), 
    data = titanic_train, 
    family = binomial, 
    control = glmerControl(optimizer = "bobyqa"), 
    nAGQ = 1)

Here "Survived" is the outcome of interest, and I have three models:

  1. "Embarked" as random effect
  2. "Embarked" as random effect, Pclass as fixed effect
  3. "Embarked" as random effect, Pclass and Sex as fixed effects

Now, if I check the standard deviation of the random effect (Embarked), I see that it decreases with each additional variable added:

> summary(mod1)$varcor
 Groups   Name        Std.Dev.
 Embarked (Intercept) 0.37618 
> summary(mod2)$varcor
 Groups   Name        Std.Dev.
 Embarked (Intercept) 0.30105 
> summary(mod3)$varcor
 Groups   Name        Std.Dev.
 Embarked (Intercept) 0.19804 

Would it be correct to say that, as the standard deviation decreases, the covariates being added to the model explain more of the variation in the outcome, leaving less to be absorbed by the random effect estimates?

Or, stated differently: the groups' random effect estimates begin to appear more "similar" as covariates are added because those covariates explain more of the variation in the outcome than the random effect estimates do. The opposite interpretation would be that if the standard deviation increased, the added covariates would explain less of the variation in the outcome.

If someone could answer these questions, especially with the help of formal logic, I would really appreciate it.

Best Answer

The issue is complicated because your model is logistic. Under normal circumstances, such as in a linear regression, most of what you say would apply. Focusing on the linear model, I say most because adding variables should not increase the random intercept variance even if the variables are mediocre predictors; the random intercept variance can go up very slightly, but not by much. With logistic regression, however, this is not necessarily the case.

I'll make some claims then explain why at the end.

If you add a variable that explains the outcome better to a multilevel logistic regression, the variance of the random intercept will increase. However, if that variable also accounts for differences between the embarks, then the random intercept variance may decrease instead.

If that variable explains the outcome better but in no way accounts for any differences between embarks, the variance of the random intercept will definitely go up. An example is a variable that you have centered on the mean of each embark, such that it does not vary across embarks.
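A sketch of that kind of within-embark centering, using Fare as the variable — a hypothetical choice of mine for illustration, not something from the question:

```r
## Centre Fare on the mean of its port of embarkation, so the centered
## variable no longer differs between embarks on average
## (ave() returns the group mean for each row)
titanic_train$Fare_c <- titanic_train$Fare -
    ave(titanic_train$Fare, titanic_train$Embarked)

## Refit mod3 with the group-mean-centered predictor added
mod3b <- glmer(Survived ~ Pclass + Sex + Fare_c + (1 | Embarked),
    data = titanic_train,
    family = binomial,
    control = glmerControl(optimizer = "bobyqa"),
    nAGQ = 1)

summary(mod3b)$varcor
```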

This is because the error variance is fixed at $\pi^2/3$, so any improvement to the model shows up as increased random intercept variance, unless the improvement simultaneously explains differences between embarks and thereby reduces the random intercept variance. I hope this makes some sense.
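To make the mechanism concrete, here is the latent-variable formulation of the logistic mixed model (standard notation, not taken from the question):

$$
y_{ij}^{*} = \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + u_j + \varepsilon_{ij},
\qquad u_j \sim N(0, \sigma_u^2),
\qquad \varepsilon_{ij} \sim \mathrm{Logistic}(0, 1),
$$

with $Y_{ij} = 1$ exactly when $y_{ij}^{*} > 0$. The standard logistic error has fixed variance $\operatorname{Var}(\varepsilon_{ij}) = \pi^2/3 \approx 3.29$ regardless of which covariates are included, so the intraclass correlation is $\sigma_u^2 / (\sigma_u^2 + \pi^2/3)$. Because improving the fixed part cannot shrink the residual variance, the latent scale effectively stretches instead, and $\sigma_u^2$ is re-expressed on that larger scale — which is why it can grow when good predictors are added.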

Replying to your comments about the ICC: since you are using R, check out the MuMIn package, which has the `r.squaredGLMM` function. This allows you to calculate $R^2$ as defined by Nakagawa and Schielzeth, http://dx.doi.org/10.1111/j.2041-210x.2012.00261.x. Theirs is a relatively simple approach that takes the different sources of variance into account (fixed effects, random effects, logistic error), so that you can compare across models with varying fixed effects.
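A minimal sketch of what that looks like for the models above (assuming MuMIn is installed):

```r
library("MuMIn")

## Marginal R2 (fixed effects only) and conditional R2 (fixed + random),
## following Nakagawa & Schielzeth; comparable across mod1-mod3 because
## the theoretical logistic error variance pi^2/3 enters the denominator
r.squaredGLMM(mod1)
r.squaredGLMM(mod2)
r.squaredGLMM(mod3)
```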
