Solved – linear regression vs. linear mixed-effects model coefficients

Tags: linear, mixed-model, regression

It is my understanding that linear regression models and linear mixed-effects models will produce the same regression coefficients (i.e., fixed effects); however, with clustered data, linear regression produces downwardly biased standard errors, leading to inflated Type I error (Cohen, Cohen, West, & Aiken, 2003). Yet I have a dataset where the linear regression and mixed-model coefficients differ substantially, and I do not understand why. Each regression has only one predictor, and in the mixed model I estimate a random effect for the intercept only. Does anyone know the conditions under which the model coefficients will be discrepant?

As requested in a comment, here are my R code and output, along with the dataset. Notice that the linear regression slope is roughly twice the mixed model's fixed slope, and the intercepts even have different signs!

lm1 <- lm(Y ~ X, data = d); lm1$coefficients
(Intercept)    X 
  -1.132507    1.184904 
lmer1 <- lmer(Y ~ X + (1 | ID), data = d); lmer1@beta
[1] 1.6767616 0.6376439
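For contrast, here is a quick hypothetical simulation (not my real data; all names and settings are made up for illustration) showing the case where the two models do agree: when the predictor is unrelated to the grouping, lm() and lmer() recover essentially the same slope.

```r
## Hypothetical simulation: X independent of the grouping factor,
## so the random intercepts carry no information about the slope
library(lme4)
set.seed(101)
n_id <- 50; n_per <- 5
ID <- factor(rep(seq_len(n_id), each = n_per))
X  <- rnorm(n_id * n_per)                      # predictor unrelated to group
u  <- rep(rnorm(n_id, sd = 1), each = n_per)   # random intercepts
Y  <- 2 + 1.5 * X + u + rnorm(n_id * n_per, sd = 0.5)
sim <- data.frame(ID, X, Y)

b_lm   <- unname(coef(lm(Y ~ X, data = sim))["X"])
b_lmer <- unname(fixef(lmer(Y ~ X + (1 | ID), data = sim))["X"])
c(lm = b_lm, lmer = b_lmer)   # both close to the true slope of 1.5
```

In my data, by contrast, the group means of X and Y are correlated, which (as the answer below illustrates) is exactly when the two slopes diverge.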

ID   Y   X
 1   1   3.0
 1   2   4.0
 1   3   3.0
 2   5   6.0
 2   4   4.0
 2   6   6.0
 3   7   6.0
 3   8   8.0
 3   9   5.5
 4   2   4.0
 4   3   3.0
 4   4   5.5
 5   5   5.0
 5   5   7.0
 5   6   5.5
 6   7   7.0
 6   6   4.5
 6   8   6.0
 7   3   4.0
 7   4   3.0
 7   2   4.0
 8   1   2.5
 8   2   4.0
 8   1   3.0
 9   5   6.0
 9   6   6.0
 9   4   6.5
10   7   7.0
10   8   8.0
10   9   7.0
11   8   7.0
11   8   5.5
11   7   6.0
12   6   6.5
12   4   4.0
12   2   4.0
13   4   3.5
13   5   5.0
13   6   4.0
14   6   5.5
14   7   7.0
14   5   4.5
15   3   4.5
15   4   6.0
15   2   5.5
16   1   2.0
16   2   3.0
16   3   6.0
17   4   3.0
17   2   4.5
17   3   3.0
18   5   5.0
18   6   6.0
18   4   3.0
19   7   7.5
19   8   7.5
19   6   5.5
20   9   6.5
20   8   7.0
20   9   6.0

Best Answer

I don't know that I can give a rigorous theoretical explanation, but a picture may make things clearer:

[Figure: scatterplot of the data with the OLS fit (blue), the mixed-model population-level line (gray), and per-ID predicted lines]

  • The blue line is the OLS fit, the gray line is the population-level prediction for the mixed model. The individual lines are predicted lines (all equal slopes, randomly varying intercepts) for each ID.
  • Since there is some correlation between the mean values of X and Y for each group, some of the variability that would go into the slope is instead taken out by the random intercept term.
  • The apparently large difference in the intercepts is partly an artifact of extrapolation: the data start at X=2, while the intercept is the expected value of Y at X=0, so even a modest difference in slopes is amplified there.

d <- data.frame(ID=factor(rep(1:20,each=3)),
                Y=c(1,2,3,5,4,6,7,8,9,2,3,4,5,5,6,7,6,
                    8,3,4,2,1,2,
                    1,5,6,4,7,8,9,8,8,7,6,4,
                    2,4,5,6,6,7,5,3,4,2,1,2,
                    3,4,2,3,5,6,4,7,8,6,9,8,9),
                X=c(3,4,3,6,4,6,6,8,5.5,4,3,5.5,5,7,5.5,7,4.5,6,4,
                    3,4,2.5,4,3,6,6,6.5,7,8,7,7,5.5,6,6.5,4,4,3.5,
                    5,4,5.5,7,4.5,4.5,6,5.5,2,3,6,3,4.5,3,5,6,3,
                    7.5,7.5,5.5,6.5,7,6))

lm1 <- lm(Y ~ X, data = d)
library(lme4)
lmer1 <- lmer(Y ~ X + (1 | ID), data = d)
ff <- fixef(lmer1)
## get predictions
pp <- d
pp$Y <- predict(lmer1)
library(dplyr)
pp <- pp %>%
    group_by(ID) %>%
    filter(Y %in% range(Y))

library(ggplot2); theme_set(theme_bw())
ggplot(d,aes(X,Y,colour=ID))+
    geom_point()+
    scale_colour_discrete(guide="none")+
    geom_line(data=pp)+
    scale_x_continuous(limits=c(0,8))+
    geom_smooth(method="lm",aes(group=1),fullrange=TRUE)+
    geom_abline(slope=ff["X"],intercept=ff["(Intercept)"],
                colour="darkgray",lwd=1.5)
ggsave("CV161703.png")
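The second bullet point can be quantified directly: the total OLS slope mixes a within-group slope (after centering X and Y within each ID) with a between-group slope (the slope of the 20 group means), while the mixed model's fixed slope is pulled toward the within-group slope because the random intercepts absorb most of the between-group association. A base-R sketch, re-creating `d` so the snippet runs on its own:

```r
## Same data as above, redefined so this snippet is self-contained
d <- data.frame(ID = factor(rep(1:20, each = 3)),
                Y = c(1,2,3,5,4,6,7,8,9,2,3,4,5,5,6,7,6,8,3,4,2,1,2,
                      1,5,6,4,7,8,9,8,8,7,6,4,2,4,5,6,6,7,5,3,4,2,1,
                      2,3,4,2,3,5,6,4,7,8,6,9,8,9),
                X = c(3,4,3,6,4,6,6,8,5.5,4,3,5.5,5,7,5.5,7,4.5,6,4,
                      3,4,2.5,4,3,6,6,6.5,7,8,7,7,5.5,6,6.5,4,4,3.5,
                      5,4,5.5,7,4.5,4.5,6,5.5,2,3,6,3,4.5,3,5,6,3,
                      7.5,7.5,5.5,6.5,7,6))

xbar <- ave(d$X, d$ID)   # per-ID means, repeated to full length
ybar <- ave(d$Y, d$ID)

## within-group slope: pooled slope after within-ID centering
b_within  <- sum((d$X - xbar) * (d$Y - ybar)) / sum((d$X - xbar)^2)
## between-group slope: slope of the 20 (xbar, ybar) group means
gm <- unique(data.frame(xbar, ybar))
b_between <- unname(coef(lm(ybar ~ xbar, data = gm))["xbar"])
## total OLS slope, ignoring grouping
b_pooled  <- unname(coef(lm(Y ~ X, data = d))["X"])

## here b_within < lmer fixed slope (0.64) < b_pooled < b_between
round(c(within = b_within, pooled = b_pooled, between = b_between), 3)
```

By my hand calculation the three slopes come out to roughly 0.41, 1.18, and 1.57, so the lmer fixed slope (0.64) sits between the within-group and pooled slopes, consistent with the picture above.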