Solved – predicted vs expected values using lmer in R

expected value, lme4-nlme, mixed model, prediction, r

I am running a multilevel model with several interactions and a binary treatment. To summarize the model, I would like to compute first differences between the treatment (1) and control (0) groups using simulation (to get a good approximation of the confidence intervals).

I follow the distinction drawn by King et al. 2000 ("Making the Most of Statistical Analyses: Improving Interpretation and Presentation") between predicted values, which contain both fundamental and estimation uncertainty, and expected values, which average over the fundamental variability and leave only the estimation uncertainty caused by not having an infinite number of observations. Thus, predicted values have larger variability than expected values. Moreover, a first difference is the difference between two expected values rather than two predicted values.
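To make the distinction concrete, here is a minimal sketch (with illustrative numbers, not from any fitted model) of how the two quantities differ: expected values use only draws of the parameters, while predicted values add a draw of the fundamental noise on top.

# illustrative only: parameter draws stand in for estimation uncertainty
set.seed(42)
n.sims <- 1000
a.draws <- rnorm(n.sims, 5, 0.5)    # intercept draws (estimation uncertainty)
b.draws <- rnorm(n.sims, -3, 0.1)   # slope draws (estimation uncertainty)
x0 <- 2

ev <- a.draws + b.draws * x0        # expected values
pv <- ev + rnorm(n.sims, 0, 6)      # + fundamental uncertainty (sigma = 6)

quantile(ev, c(0.025, 0.975))       # narrow interval
quantile(pv, c(0.025, 0.975))       # much wider interval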

I am trying to get first differences using R and lme4.

Here is an example for getting predicted and expected values:

# create simulated data
library(lme4)

set.seed(1)
J <- 15                      # number of groups
n <- J * (J + 1) / 2         # total number of observations
group <- rep(1:J, 1:J)       # group sizes 1, 2, ..., J
mu.a <- 5                    # mean of the group intercepts
sigma.a <- 2                 # between-group standard deviation
a <- rnorm(J, mu.a, sigma.a)
b <- -3                      # slope
x <- rnorm(n, 2, 1)
sigma.y <- 6                 # residual standard deviation
y <- rnorm(n, a[group] + b * x, sigma.y)
u <- runif(J, 0, 3)          # (not used below)
dat <- data.frame(y, x, group)   # lmer needs a data frame, not a matrix from cbind()

# run model
model <- lmer(y ~ x + (1|group), data = dat)

I am trying to get the expected values (and confidence intervals) when x = 2. I would like to take into account the uncertainty of the random effect, but I am not sure whether that is possible or appropriate, or whether sampling groups is the best way to deal with the random effects.
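One approach I considered is a sketch following Gelman & Hill's arm::sim(), drawing parameters from their approximate sampling distribution and building expected values by hand (the slot access and column order below are assumptions on my part, and sampling a group per draw simply mirrors my approach further down):

library(arm)   # for sim()

n.sims <- 1000
ss <- sim(model, n.sims = n.sims)
fe <- ss@fixef                     # n.sims x 2 matrix: (Intercept), x

# sample one group per simulation and add the matching
# random-intercept draw
g  <- sample(1:15, n.sims, replace = TRUE)
re <- ss@ranef$group[cbind(1:n.sims, g, 1)]

ev <- fe[, 1] + fe[, 2] * 2 + re   # expected values at x = 2
quantile(ev, c(0.5, 0.025, 0.975))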

First, I use the simulate function (I don't know exactly what the difference between simulate and bootMer is; as far as I understand, simulate draws new responses from the fitted parameters, while bootMer refits the model to each simulated data set):

# simulate
pred1 <- function(model, groups = 1:15, nsim = 1000) {
  # sampling groups to deal with the random effects; I am not sure
  # this is the right thing to do
  g <- sample(groups, nsim, replace = TRUE)

  # pv: one row per newdata row, one column per simulation
  pv <- simulate(model, newdata = data.frame(x = rep(2, nsim), group = g),
                 nsim = nsim, use.u = FALSE)
  # expected values: average each row over the simulations
  ev <- apply(pv, 1, mean)

  return(list(pv, ev))
}


s1 <- pred1(model, nsim = 1000)

# prediction
quantile(unlist(s1[[1]]), c(0.5, 0.025, 0.975))
       50%       2.5%      97.5% 
 -1.335765 -15.180028  12.547462 

# expected values
quantile(s1[[2]], c(0.5, 0.025, 0.975))
       50%       2.5%      97.5% 
-1.3377041 -1.7398980 -0.9137497 

The difference between the two intervals is large: as King et al. would lead me to expect, the predicted values are far more variable than the expected values.

Using the bootMer function:

pred2 <- function(model, nsim = 1000) {
  # again, I am not sure this is the best way to handle the random effects
  g <- sample(1:15, nsim, replace = TRUE)

  # statistic passed to bootMer: predictions at x = 2 for the sampled groups
  mySumm <- function(.) {
    predict(., newdata = data.frame(x = rep(2, nsim), group = g), re.form = NULL)
  }

  # pv: one row per bootstrap replicate, one column per prediction
  pv <- lme4::bootMer(model, mySumm, nsim = nsim, use.u = FALSE)$t
  # expected values: average each prediction over the replicates
  ev <- apply(pv, 2, mean)
  return(list(pv, ev))
}

s2 <- pred2(model, nsim = 1000)

# prediction
quantile(unlist(s2[[1]]), c(0.5, 0.025, 0.975))
      50%      2.5%     97.5% 
-1.353249 -6.967289  4.396240 

# expected values
quantile(s2[[2]], c(0.5, 0.025, 0.975))
      50%      2.5%     97.5% 
-1.309230 -1.483738 -1.227172 

The histogram of expected values looks weird when using bootMer: I am getting the same values several times…
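A quick check (added here as a diagnostic sketch) suggests why: all newdata rows share x = 2, so rows with the same sampled group are identical, and the column means can take at most one value per group:

length(unique(s2[[2]]))   # at most 15, one distinct value per group
hist(s2[[2]], breaks = 30)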

Any suggestions, comments, ideas?

Best Answer

lme4 is an excellent package that I've used before, but I don't know enough to answer your question. I can suggest an alternative that might make things more straightforward: the Bayesian package rstanarm.

Since it's Bayesian, you automatically have samples of your coefficient values that reflect the multiple sources of uncertainty you are trying to bake into your simulation. Look at this rstanarm vignette, which has an example similar to yours toward the bottom: Hierarchical Partial Pooling for Repeated Binary Trials, section "Posterior Predictive Distribution".

Hope this is helpful, and maybe it will open up a new perspective for you. You may need to think about things like priors, but the rstanarm package makes the transition as easy as possible.
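For example, a minimal sketch of what that could look like for your model (the sampler settings are illustrative, and in older rstanarm versions posterior_epred() was posterior_linpred()):

library(rstanarm)

# refit the multilevel model with Bayesian machinery (default priors)
bmod <- stan_lmer(y ~ x + (1 | group), data = dat, chains = 4, iter = 2000)

nd <- data.frame(x = 2, group = 1:15)

ev <- posterior_epred(bmod, newdata = nd)    # expected values: no residual noise
pv <- posterior_predict(bmod, newdata = nd)  # predicted values: adds residual noise

quantile(ev, c(0.5, 0.025, 0.975))
quantile(pv, c(0.5, 0.025, 0.975))

A first difference is then just the difference between two posterior_epred() draws (say, treatment = 1 vs. treatment = 0 in your real model), with the quantiles of that difference giving the interval.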
