Solved – Interpreting random effects in mixed effects models

lme4-nlme, mixed model, panel data

When interpreting the random effects from a mixed effects model, are they on the same scale as the outcome variable? I have noticed that when I change the scale of my outcome variable, the values of the random effects also change.

For instance, using the sleepstudy example data, I can fit one LMM to the raw Reaction scores and another in which Reaction has been transformed into a z-score.

library(lme4)  # provides lmer() and the sleepstudy data

sleepstudy$zReaction <- scale(sleepstudy$Reaction, center = TRUE, scale = TRUE) # z-scores

fm1 <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy); summary(fm1)
Linear mixed model fit by REML ['lmerMod']
Formula: Reaction ~ Days + (Days | Subject)
   Data: sleepstudy

REML criterion at convergence: 1743.6

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.9536 -0.4634  0.0231  0.4634  5.1793 

Random effects:
 Groups   Name        Variance Std.Dev. Corr
 Subject  (Intercept) 612.09   24.740       
          Days         35.07    5.922   0.07
 Residual             654.94   25.592       
Number of obs: 180, groups:  Subject, 18

Fixed effects:
            Estimate Std. Error t value
(Intercept)  251.405      6.825   36.84
Days          10.467      1.546    6.77

Correlation of Fixed Effects:
     (Intr)
Days -0.138

fm2 <- lmer(zReaction ~ Days + (Days|Subject), sleepstudy); summary(fm2)
Linear mixed model fit by REML ['lmerMod']
Formula: zReaction ~ Days + (Days | Subject)
   Data: sleepstudy

REML criterion at convergence: 308.5

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.9536 -0.4634  0.0231  0.4634  5.1793 

Random effects:
 Groups   Name        Variance Std.Dev. Corr
 Subject  (Intercept) 0.19291  0.4392       
          Days        0.01105  0.1051   0.07
 Residual             0.20642  0.4543       
Number of obs: 180, groups:  Subject, 18

Fixed effects:
            Estimate Std. Error t value
(Intercept) -0.83621    0.12116  -6.902
Days         0.18582    0.02744   6.771

Correlation of Fixed Effects:
     (Intr)
Days -0.138

As we can see, for the model using the raw scores the variance associated with Days is 35.07, but for the z-scores it is 0.011. So 35.07 (or 0.011) is the amount of variability in the slope across subjects. Does this mean that, on average, an individual's true rate of change differs from the population mean by 35.07?

Thanks

EDIT

I was aware that un-scaling the output from fm2 would return the same results as fm1, which @l'ombradel'atzavara very nicely demonstrated. One of the reasons to look at the random effects in a model is to determine whether there is any additional variability that could be explained by including additional predictors. If the random effect variance is close to 0, there is very little variation to explain and so no reason to include the additional predictors. But if we look at the random effects from the unscaled (35.07) and scaled (0.01) data, our interpretation of the results can change. We can use a hypothesis test to determine whether a random effect is significantly different from 0, but this still raises the question of how far from 0 the value actually is, i.e. 35 is much greater than 0.01.

Best Answer

There are several aspects to your question, and I am not sure I fully understand it. With this big caveat in mind, let me try to shed some light on some of the issues that seem to concern you.

First off, the difference between the scaled and unscaled data becomes clear if you look at what scale() does conceptually (even if not always computationally). Let's center and scale the data "manually" (the equivalent of scale(..., center = TRUE, scale = TRUE)), which simply means subtracting the mean of Reaction from every reaction time and then dividing by the standard deviation of Reaction:

    SD <- sd(sleepstudy$Reaction)
    M <- mean(sleepstudy$Reaction)
    scaled <- (sleepstudy$Reaction - M) / SD  # manual z-scores

And compare to your data scaled with the R function (as.vector() drops the matrix attributes that scale() attaches): all.equal(as.vector(sleepstudy$zReaction), scaled) [1] TRUE.

So presumably, we can recover the intercepts of fm1 if we just un-scale the results - i.e. we multiply the coefficients by the standard deviation, and then add the mean:

all.equal(coef(fm2)$Subject[,1]*SD + M, coef(fm1)$Subject[,1]) [1] TRUE
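To see why only the intercepts need the mean added back, it may help to write out what standardising the outcome does to each piece of the model. This is a sketch in notation of my own ($\beta_0$, $\beta_1$ for the fixed effects, $b_{0i}$, $b_{1i}$ for subject $i$'s random effects, $\varepsilon_{ij}$ for the residual, $\sigma_1^2$ for the slope variance), not something read off the lme4 output:

$$ z_{ij} = \frac{y_{ij} - M}{SD} = \frac{(\beta_0 + b_{0i}) - M}{SD} + \frac{\beta_1 + b_{1i}}{SD}\,\text{Days}_{ij} + \frac{\varepsilon_{ij}}{SD} $$

$$ \operatorname{Var}\left(\frac{b_{1i}}{SD}\right) = \frac{\sigma_1^2}{SD^2} $$

So the slopes are simply divided by SD, every variance component is divided by SD squared, and the correlation between intercepts and slopes is unchanged, which is consistent with the 0.07 reported in both summaries.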

I hope this clarifies the issue with the scaling of the data.

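As for the second part, regarding the 35.07 variance in the slopes across subjects: this number describes the spread of the individual slopes around the population mean slope, and we assume those slopes are normally distributed. Because it is a variance, it is in squared units of the slope, so it is not itself the typical distance of an individual's slope from the mean; the standard deviation reported next to it (5.922 for the raw scores, 0.1051 for the z-scores) is the figure on the scale of the slope. This is difficult to reconcile with the end of your question, but I hope it helps.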
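A rough numerical sketch of this, using only figures copied from the two summaries in the question (the variable names are mine, and the 1.96 multiplier assumes the slopes really are normally distributed):

```r
# Values copied from summary(fm1) in the question (raw Reaction, in ms)
slope_mean <- 10.467          # fixed effect of Days, ms per day
slope_var  <- 35.07           # random-slope variance, (ms per day)^2
slope_sd   <- sqrt(slope_var) # back on the scale of the slope

# Under the normality assumption, roughly 95% of subjects' slopes lie in:
slope_range <- slope_mean + c(-1.96, 1.96) * slope_sd
print(round(slope_range, 2))

# Spread relative to the mean slope, for the raw and the z-scored fit.
# The z-score values are copied from summary(fm2).
cv_raw <- slope_sd / slope_mean
cv_z   <- sqrt(0.01105) / 0.18582
print(round(c(raw = cv_raw, z = cv_z), 3))
```

The interval works out to roughly -1.1 to 22.1 ms per day, and the ratio of the slope SD to the mean slope is about 0.57 in both fits. In other words, the relative amount of between-subject variation in the slope does not depend on the scale of the outcome, even though 35.07 and 0.011 look very different in absolute terms.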

Related Solutions

Solved – Probabilities of odds ratios in random intercept models

Finally... @BenBoker was right about predict and plogis. What I am looking for is the predicted values for model terms (i.e. plogis(predict(fit, type = "terms"))); however, I'm not sure how to get predicted values for model terms from merMod objects, since predict.merMod has no type = "terms" option.

Solved – Is it reasonable to include a random slope term in an lmer model without the corresponding fixed effect

I believe this question is very similar to the often-asked "must one always include an intercept term in a linear regression", for which the agreed-upon answer is "yes, unless you have an extremely good reason not to".

I tried to think through what would happen without the fixed effect term before running any experiment. Let's write your two models out in detail. The first, with the fixed effect slope, is

$$ y \sim N(\mu_{\alpha} +\alpha_{[i]} + (\mu_{\beta} + \beta_{[i]}) x, \sigma) $$ $$ \alpha \sim N(0, \sigma_{\alpha}) $$ $$ \beta \sim N(0, \sigma_{\beta}) $$

where $x$ is the number of days, and we have a random intercept $\alpha_{[i]}$, and a random slope $\beta_{[i]}$ for each subject. In the other case, where there is no fixed slope, the model is

$$ y \sim N(\mu_{\alpha} + \alpha_{[i]} + \beta_{[i]} x, \sigma) $$ $$ \alpha \sim N(0, \sigma_{\alpha}) $$ $$ \beta \sim N(0, \sigma_{\beta}) $$

The difference is that in the second model we assume a priori that the mean of the random slopes is zero. This means we expect the slopes associated with the various subjects to be distributed evenly around a slope of $0$ (for example, about half should be negative and half positive).

Now, in the model on your data this does not seem to be true. In your second plot the estimated slopes within each subject are all positive, so it looks like this model is invalid for your data. Including the fixed slope adds the mean of the subject-wise slopes as a free parameter, and in that plot you see the random slopes cluster evenly around zero, as you would like.

As for inference from the parameters in your model, I believe this misspecification of the model will cause the following parameter estimates to be biased:

The subject-wise slopes will be biased towards zero, because the assumption of mean zero in the likelihood will pull them towards zero.
The estimated standard deviation of the random slopes will be too large, because inflating this parameter lets the slopes cluster around their true, non-zero mean without being penalized so severely.

Here I'll create some simulated data where the true mean of the subject-wise slopes is non-zero:

library("lme4")
library("arm")
set.seed(154)

N_classes <- 50
N_obs <- 10000

random_intercepts <- structure(
  rnorm(N_classes), names = as.character(1:N_classes)
)

random_slopes <- structure(
  rnorm(N_classes, mean = 1), names = as.character(1:N_classes)
)

classes <- sample(as.character(1:N_classes), size = N_obs, replace = TRUE)
x <- runif(N_obs)
y <- random_intercepts[classes] + random_slopes[classes] * x + rnorm(N_obs)

df <- data.frame(class = factor(classes), x = x, y = y)

The first model estimates all true parameters well:

> M <- lmer(y ~ x + (x | class), data = df)
> display(M)
lmer(formula = y ~ x + (x | class), data = df)
        coef.est coef.se
(Intercept) 0.01     0.15   
x           1.02     0.15   

Error terms:
 Groups   Name        Std.Dev. Corr 
 class    (Intercept) 1.03          
          x           1.01     0.19 
 Residual             1.00

Looks like all the parameters are estimated well here, including the standard deviation of the random slopes.

Here's the model without the fixed slope

> N <- lmer(y ~ (x | class), data = df)
> display(N)
lmer(formula = y ~ (x | class), data = df)
coef.est  coef.se 
   -0.14     0.15 

Error terms:
 Groups   Name        Std.Dev. Corr 
 class    (Intercept) 1.04          
          x           1.43     0.24 
 Residual             1.00

The estimate of the random slope standard deviation is 1.43, confirming my intuition that it would be biased to be too large.
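A back-of-the-envelope check on that number, under my own assumption that a zero-mean model ends up measuring the spread of the slopes around 0 rather than around their true mean. The root mean square of slopes drawn with mean 1 and standard deviation 1 (the values used in the simulation above) would be:

```r
# True values used when simulating random_slopes above
true_mean_slope <- 1  # mean = 1 passed to rnorm()
true_sd_slope   <- 1  # rnorm() default sd

# Expected spread of the slopes measured around zero instead of around 1
rms_about_zero <- sqrt(true_sd_slope^2 + true_mean_slope^2)
print(rms_about_zero)
```

This comes to about 1.41, in the same neighbourhood as the fitted 1.43, which fits the idea that the extra spread is the missing mean slope being absorbed into the standard deviation.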

The mean of the subject-wise slopes in the model M comes out well

> mean(fixef(M)["x"] + ranef(M)$class$x)
[1] 1.015418

It doesn't seem like my intuition was quite correct on the other model

> mean(ranef(N)$class$x)
[1] 0.9858566

It looks like the model took fitting the data a bit more seriously than making sure the normality of random slope assumption was totally met. Altogether, it looks like the inflation of the random slope standard deviation is the most serious issue.
