Regression – Do Beta Estimates Always Match Between lm and lmer in R?

lme4-nlme, mixed-model, r, regression

Can anyone tell me under what conditions the beta (fixed-effect) estimates differ between lm and lmer with a random intercept? I came across a situation where the fixed effect differed considerably. I thought the standard errors should change but the fixed-effect estimates should remain the same. The difference does not seem to be due to unequal cluster sizes or to having a large number of clusters with a single observation.

I cannot supply the data but have concocted a simplified example below. In this case the correlation within clusters should be negligible.

library(lme4)

# Two groups of 10 observations; x is the group indicator
x <- c(rep(0, 10), rep(1, 10))
# Outcome with a true group difference of -200
y <- rnorm(length(x), mean = 3100, sd = 400) - 200 * x
# Cluster identifiers: mostly singletons, three clusters with two observations
m <- c(1, 2, 3, 4, 4, 6, 7, 8, 8, 10, 11, 12, 13, 14, 15, 16, 17, 18, 20, 20)

summary(lm(y ~ x))
summary(lmer(y ~ x + (1 | m)))

Best Answer

The fixed-effect estimates of a linear model and a linear mixed model can differ when the design is unbalanced, i.e., when the number of observations per cluster (level of the grouping factor) differs.

First, consider a balanced design:

df <- data.frame(x = rep(0:1, each = 10), y = 1:20, m = rep(1:10, each = 2))

lm(y ~ x, df)
# Coefficients:
# (Intercept)            x  
#         5.5         10.0  

library(lme4)
lmer(y ~ x + (1 | m), df)
# Fixed effects:
#             Estimate Std. Error t value
# (Intercept)    5.500      1.414   3.889
# x             10.000      2.000   5.000

The regression coefficients do not differ between the two models.
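
As a quick check (a sketch reusing df from above), the two sets of estimates can be compared directly; with a balanced design they should agree up to numerical tolerance:

# Quick check on the balanced design: lmer's fixed effects should equal the
# OLS coefficients (up to numerical tolerance) when every subject contributes
# the same number of observations.
library(lme4)
df <- data.frame(x = rep(0:1, each = 10), y = 1:20, m = rep(1:10, each = 2))
all.equal(unname(fixef(lmer(y ~ x + (1 | m), df))),
          unname(coef(lm(y ~ x, df))))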

Now consider an unbalanced design, in which the number of observations per subject differs:

df2 <- data.frame(x = rep(0:1, each = 10), y = 1:20, m = rep.int(1:10, times = rep(c(1, 3), times = 5)))

lm(y ~ x, df2)
# Coefficients:
# (Intercept)            x  
#         5.5         10.0  

library(lme4)
lmer(y ~ x + (1 | m), df2)
# Fixed effects:
#             Estimate Std. Error t value
# (Intercept)    8.752      1.643   5.325
# x              2.699      1.164   2.318

The simple linear model gives the same coefficients as in the first example, but the lmer estimates change: given its estimated variance components, the mixed model fits the fixed effects by generalized least squares, which weights clusters according to their size and the intra-class correlation, so unequal cluster sizes pull the estimates away from the OLS solution.
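
To make that weighting explicit, here is a minimal sketch (reusing df2 from above; the names fit, X, Z, V, and beta_gls are only illustrative) that recomputes the lmer fixed effects as a GLS fit under the estimated covariance V = tau^2 * ZZ' + sigma^2 * I and compares them with the OLS coefficients:

# Sketch: the lmer fixed effects are the GLS estimates under the fitted
# covariance V = tau^2 * Z Z' + sigma^2 * I. With unequal cluster sizes this
# V weights clusters differently from OLS, hence the changed estimates.
library(lme4)
df2 <- data.frame(x = rep(0:1, each = 10), y = 1:20,
                  m = rep.int(1:10, times = rep(c(1, 3), times = 5)))
fit <- lmer(y ~ x + (1 | m), df2)

X <- model.matrix(~ x, df2)              # fixed-effects design matrix
Z <- model.matrix(~ 0 + factor(m), df2)  # random-intercept design matrix
vc     <- as.data.frame(VarCorr(fit))
tau2   <- vc$vcov[vc$grp == "m"]         # estimated random-intercept variance
sigma2 <- sigma(fit)^2                   # estimated residual variance
V <- tau2 * Z %*% t(Z) + sigma2 * diag(nrow(df2))

beta_gls <- solve(t(X) %*% solve(V) %*% X, t(X) %*% solve(V) %*% df2$y)
cbind(gls = drop(beta_gls), lmer = fixef(fit), ols = coef(lm(y ~ x, df2)))
# gls and lmer columns should match; ols differs under the unbalanced design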
