# Solved – predicted vs expected values using lmer in R

expected valuelme4-nlmemixed modelpredictionr

I am running a multilevel model with several interactions and a binary treatment. To summarize the model I would like to compute the first differences between treatment (1) and control group (0) using simulations (to get a good approximation to the confidence intervals).

Using the distinction pointed out by King et al. 2000 (Making the most of statistical analysis: improving interpretation and presentation) between predicted values (they contained both fundamental and estimation uncertainty), and expected value (averages over the fundamental variability leaving only the estimation uncertainty caused by not having an infinite number of observations. Thus, the predicted values have a larger variability than the expected values. Moreover, a first difference is the difference between two expected rather than predicted values.

I am trying to get first differences using R and lme4.

Here an example to predict and get expected values:

# create data
set.seed (1)
J <- 15
n <- J*(J+1)/2
group <- rep (1:J, 1:J)
mu.a <- 5
sigma.a <- 2
a <- rnorm (J, mu.a, sigma.a)
b <- -3
x <- rnorm (n, 2, 1)
sigma.y <- 6
y <- rnorm (n, a[group] + b*x, sigma.y)
u <- runif (J, 0, 3)
dat <- cbind (y, x, group)

# run model
model <- lmer(y ~ x + (1|group), data = dat)

I am trying to get the expected values (and confidence intervals) when x = 2. I would like to take into account the uncertainty of the random effect, but I am not sure if that is possible or appropriate, and if sampling groups is the best way to deal with the estimation of random effects.

First, I use the simulate function (I don't exactly the difference betwen simulateand bootmer):

# simulate
pred1  <- function(model, groups = 1:15, nsim = 1000) {

g <- sample(groups, nsim, replace = TRUE) # sampling groups to deal with random effects, I am not sure this is the right thing.

pv <-  simulate(mer, newdata = data.frame(x  = rep(2, nsim), group = g),
nsim = nsim, use.u = FALSE)
ev <- apply(pv, 1, mean)

return(list(pv, ev))
}

s1 <- pred1(mer, nsim = 1000)

# prediction
quantile(unlist(s1[[1]]), c(0.5, 0.025, 0.975))
50%       2.5%      97.5%
-1.335765 -15.180028  12.547462

# expected values
quantile(s1[[2]], c(0.5, 0.025, 0.975))
50%       2.5%      97.5%
-1.3377041 -1.7398980 -0.9137497

The difference between methods are big.

Using the bootmer function:

pred2 <- function(model, nsim = 1000) {

g <- sample(1:15, nsim, replace = TRUE) # again, I am not sure this is the best way to do it

mySumm <- function(.) {
predict(., newdata = data.frame(x = rep(2, nsim), group = g), re.form = NULL)
}

pv <-  lme4::bootMer(mer, mySumm, nsim = nsim, use.u = FALSE)\$t
ev <- apply(pv, 2, mean)
return(list(pv, ev))
}

s2 <- pred2(mer, nsim = 1000)

# prediction
quantile(unlist(s2[[1]]), c(0.5, 0.025, 0.975))
50%      2.5%     97.5%
-1.353249 -6.967289  4.396240

# expected values
quantile(s2[[2]], c(0.5, 0.025, 0.975))
50%      2.5%     97.5%
-1.309230 -1.483738 -1.227172

The histogram of expected values looks weird when using bootmer: I am getting the same values several times…

Any suggestions, comments, ideas?