I am running a multilevel model with several interactions and a binary treatment. To summarize the model I would like to compute the first differences between treatment (1) and control group (0) using simulations (to get a good approximation to the confidence intervals).

Using the distinction pointed out by King et al. 2000 (Making the most of statistical analysis: improving interpretation and presentation) between predicted values (they contained both fundamental and estimation uncertainty), and expected value (averages over the fundamental variability leaving only the estimation uncertainty caused by not having an infinite number of observations. Thus, the predicted values have a larger variability than the expected values. Moreover, a first difference is the difference between two expected rather than predicted values.

I am trying to get first differences using `R`

and `lme4`

.

Here an example to predict and get expected values:

```
# create data
set.seed (1)
J <- 15
n <- J*(J+1)/2
group <- rep (1:J, 1:J)
mu.a <- 5
sigma.a <- 2
a <- rnorm (J, mu.a, sigma.a)
b <- -3
x <- rnorm (n, 2, 1)
sigma.y <- 6
y <- rnorm (n, a[group] + b*x, sigma.y)
u <- runif (J, 0, 3)
dat <- cbind (y, x, group)
# run model
model <- lmer(y ~ x + (1|group), data = dat)
```

I am trying to get the expected values (and confidence intervals) when x = 2. I would like to take into account the uncertainty of the random effect, but I am not sure if that is possible or appropriate, and if sampling groups is the best way to deal with the estimation of random effects.

First, I use the `simulate`

function (I don't exactly the difference betwen `simulate`

and `bootmer`

):

```
# simulate
pred1 <- function(model, groups = 1:15, nsim = 1000) {
g <- sample(groups, nsim, replace = TRUE) # sampling groups to deal with random effects, I am not sure this is the right thing.
pv <- simulate(mer, newdata = data.frame(x = rep(2, nsim), group = g),
nsim = nsim, use.u = FALSE)
ev <- apply(pv, 1, mean)
return(list(pv, ev))
}
s1 <- pred1(mer, nsim = 1000)
# prediction
quantile(unlist(s1[[1]]), c(0.5, 0.025, 0.975))
50% 2.5% 97.5%
-1.335765 -15.180028 12.547462
# expected values
quantile(s1[[2]], c(0.5, 0.025, 0.975))
50% 2.5% 97.5%
-1.3377041 -1.7398980 -0.9137497
```

The difference between methods are big.

Using the `bootmer`

function:

```
pred2 <- function(model, nsim = 1000) {
g <- sample(1:15, nsim, replace = TRUE) # again, I am not sure this is the best way to do it
mySumm <- function(.) {
predict(., newdata = data.frame(x = rep(2, nsim), group = g), re.form = NULL)
}
pv <- lme4::bootMer(mer, mySumm, nsim = nsim, use.u = FALSE)$t
ev <- apply(pv, 2, mean)
return(list(pv, ev))
}
s2 <- pred2(mer, nsim = 1000)
# prediction
quantile(unlist(s2[[1]]), c(0.5, 0.025, 0.975))
50% 2.5% 97.5%
-1.353249 -6.967289 4.396240
# expected values
quantile(s2[[2]], c(0.5, 0.025, 0.975))
50% 2.5% 97.5%
-1.309230 -1.483738 -1.227172
```

The histogram of expected values looks weird when using `bootmer`

: I am getting the same values several times…

Any suggestions, comments, ideas?

## Best Answer

`lme4`

is an excellent package that I've used before, but I don't know enough to answer your question. I can suggest an alternative that might make things more straightforward: the Bayesian package`rstanarm`

.Since it's Bayesian, you automatically have samples of your coefficient values that reflect multiple of the uncertainties that you are trying to bake into your simulation. Look at this

`rstanarm`

vignette, which has an example similar to yours towards the bottom: Hierarchical Partial Pooling for Repeated Binary Trials, sectionPosterior Predictive Distribution.Hope this is helpful, and maybe it will open up a new perspective for you. You may need to think about things like priors, but the

`rstanarm`

package makes the transition as easy as possible.