Solved – Mixed model with lmer: Variance of residuals should give the same as level 1 variance

lme4-nlmemixed modelmultilevel-analysis

I expected that the variance of residuals from a mixed model computed by, for example, lmer should give the same as the residual variance from the summary output.

set.seed(123)
# -- Data simulation
group  <- rep(letters[1:10], each=10)
groupx <- rep(rnorm(10,0,5), each=10)
x0     <- 1:10
y0     <- rep(3*x0,10)
y      <- y0+groupx + rnorm(100,0,3)
data0  <- data.frame(grp=group,x=x0,y=y)
head(data0)

# -- library(lattice)
xyplot(y ~ x|grp, data=data0)

# -- Model
f0  <- lmer(y ~ x + (1|grp), data= data0, REML=FALSE)
f0s <- summary(f0)
f0s

# -- Compare level 1 variance (Residual):
as.data.frame(VarCorr(f0))[,c("grp","vcov")]
# -- with variance of residuals:
var(resid(f0))

In my example I got:

> as.data.frame(VarCorr(f0))[,c("grp","vcov")]
       grp     vcov
1      grp 20.621959
2 Residual 7.090458

> var(resid(f0))
[1] 6.469677

I expected that var(resid(f0)) should be exactly the same as the variance of Residuals from summary. Could someone gives me a hint what is wrong in my argumentation?

UPDATE 1:
According to the formula in link

$\hat{\sigma}^2_{\alpha}=E\left[\frac{1}{10}\sum_{s=1}^{10} \alpha_s^2\right]= \frac{1}{10}\sum_{s=1}^{10}\hat{ \alpha }_s^2 +\frac{1}{10}\sum_{s=1}^{48}var(\hat{ \alpha }_s) $

I tried to compute the variance of the random variable:

> 1/10 * sum((se.ranef(f0)$grp)^2) + var(ranef(f0)$grp)
            (Intercept)
(Intercept)    22.83712

which seems quite different compared to 20.621959 in the summary above.

With method REML I got the following results:

f1  <- lmer(y ~  x + (1|grp), data= data0, REML=TRUE)
f1s <- summary(f1)
as.data.frame(VarCorr(f1))[,c("grp","vcov")]
1/10 * sum((se.ranef(f1)$grp)^2) + var(ranef(f1)$grp)

> f1  <- lmer(y ~  x + (1|grp), data= data0, REML=TRUE)
>     f1s <- summary(f1)
>     as.data.frame(VarCorr(f1))[,c("grp","vcov")]
       grp      vcov
1      grp 22.984104
2 Residual  7.170126
>     1/10 * sum((se.ranef(f1)$grp)^2) + var(ranef(f1)$grp)
            (Intercept)
(Intercept)     22.984

Obviously, the formula is correct when method REML is used.

With

var(resid(f0))*99/90

I got a SD for residuals of 7.116645 which is near to 7.09. But how to justify a degree of freedom of 90? Or does it makes no sense to speak about df in mixed models?

Best Answer

You might want to have a look at Interpretation of various output of "lmer" function in R.


Basically the estimated residual variance $\hat{\sigma}$ is an estimation for a population parameter $\sigma$. Whereas the estimated variance of your residual is an estimation based on a subset of your population. You will encounter the same thing in simple linear models:

set.seed(123)
x <- rnorm(50)
y <- 1+2*x + rnorm(50, sd = 2)
mod <- lm(y~x)
summary(mod)$sigma #1.828483
sd(resid(mod))     #1.809729
Related Question