Solved – Backing out fixed / random effects in lmer mixed model


Assume that I have a linear mixed model of the following form, specified using lme4:

fit <- lmer(A ~ B + C + (1|D) + (1|E), data=data)

I am struggling with the best way to isolate the net impact of the fixed effects (B or C) on the value of D in the overall model, as well as the impact of random effect E on D.

To isolate the net effect of random effect D, normally I would do something like this, to predict outputs both with and without D:

data$w_D <- predict(fit, data, type='response')
data$wo_D <- predict(fit, data, re.form=~ (1|E), type='response')

Let's call this a "with and without" D analysis. I would then use something like the aggregate function to sum the predicted columns "w_D" and "wo_D" within each subject, subtract those two sums from each other, and deposit the answer for all subjects in D into a new vector. Let's call this new vector D_net; it contains the net value of D, which is what I care about. There may be other ways to do this, but this is the approach that makes sense to me.
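To make the intended computation concrete, here is a minimal sketch of that aggregation step. It assumes data$w_D and data$wo_D were created by the predict() calls above and that D is a grouping-factor column in data:

```r
# Per-observation contribution of D: difference between predictions
# that include D's random effects and predictions that exclude them
data$diff <- data$w_D - data$wo_D

# Sum the differences within each level of D; the result is one
# net value per subject in D (the "D_net" vector described above)
D_net <- aggregate(diff ~ D, data = data, FUN = sum)
```

Since both prediction columns share the same fixed-effect and E contributions, the difference isolates D's conditional modes (BLUPs) plus anything that interacts with them.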

But this gets a little interesting if one wants to isolate the impact of a random effect AND start isolating the impact of another variable, whether it be a fixed effect or a random effect.

Let's start with B, a fixed effect. One option could be to replace all observations of B with their mean, run the same "with and without" D prediction sequence described above, and subtract the values of this new D_net from the original D_net. My concern is that effectively removing a fixed effect like this could meaningfully affect the residual, meaning that the original random effect assignments might no longer be valid.
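As a sketch of this first option (assuming B is numeric; for a factor you would pick a reference level instead of a mean):

```r
# Copy the data and hold B constant at its mean
data_meanB <- data
data_meanB$B <- mean(data$B, na.rm = TRUE)

# Repeat the "with and without" D predictions on the modified data
data_meanB$w_D  <- predict(fit, data_meanB)
data_meanB$wo_D <- predict(fit, data_meanB, re.form = ~ (1|E))
data_meanB$diff <- data_meanB$w_D - data_meanB$wo_D

# Net value of D with B held at its mean, one row per level of D
D_net_meanB <- aggregate(diff ~ D, data = data_meanB, FUN = sum)
# This could then be merged with the original D_net by the D column
# and subtracted level by level.
```

Note that the model itself is unchanged here; only the inputs to predict() are modified, which is exactly why the original random-effect estimates carry over unaltered.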

To ensure the random effects are recalculated, we could instead re-specify the model with B dropped (let's call this fit2), run it otherwise unchanged, do a "with and without" sequence for D in this updated model, and again subtract these D_net values from the original D_net values. This guarantees a recalculation of the random effects, but now we've again affected the residual (by removing B). It also feels a bit backwards: B has now been removed entirely, and the remaining estimates, instead of being held constant to reflect the value of B, may themselves shift as a result of the modified residual.
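The refit option would look something like this (a sketch under the same assumed column names as before):

```r
# Refit with B dropped entirely, so all random effects are re-estimated
fit2 <- lmer(A ~ C + (1|D) + (1|E), data = data)

# "With and without" D under the reduced model
data$w_D2  <- predict(fit2, data)
data$wo_D2 <- predict(fit2, data, re.form = ~ (1|E))
data$diff2 <- data$w_D2 - data$wo_D2

# Net value of D under fit2, for comparison against the original D_net
D_net2 <- aggregate(diff2 ~ D, data = data, FUN = sum)
```

The comparison of D_net2 against the original D_net is then attributing to B whatever changed between the two fits, which, as noted above, includes changes to the random-effect estimates themselves.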

Perhaps a third option would still involve a second model for comparison, but to instead add an interaction between D and B. So we end up with something like this:

fit3 <- lmer(A ~ B + C + (1|D) + (1|E) + (1|B:D), data=data)

The net difference of the "with and without" predictions of fit3 could again be calculated and then subtracted from the original model's net D value. Or, perhaps, since the model already accounts for the B:D interaction, there may be no need to subtract from the original D_net at all. There are two further issues with this approach, however: if one of the fixed effects is itself an interaction between two fixed effects, it's unclear how that interaction can itself be interacted with a random effect. (1|B:C:D)? And of course, for large data sets, interactions with random effects can be computationally challenging, and sometimes require impractical amounts of memory.

If I wanted to deal with the effect of random effect E, instead of a fixed effect like B, I would be inclined to consider the same options.

Are any of these options for investigating the effect of B (or E) on D actually satisfactory for the objective I've stated? If not, is there still another alternative? This is an aspect of mixed model interpretation I have yet to see addressed in any publications.

Thanks.

Best Answer

This is a great question. I'm not sure this answer will be satisfactory, but here is the way I tend to think about this issue.

The easiest way to make these comparisons is by contrasting predictions based on different models or different inputs -- as you suggest. Unfortunately, that's not always terribly easy to do in software. I recently co-authored an R package, merTools, in part responding to how difficult I kept finding it to produce an answer like you describe above. The package allows you to more easily calculate prediction intervals for lmer and glmer objects, as well as to explore the impact of modifying variables -- both fixed effects and random -- and see how they change the predictions from the model. A simple example is below, taken from my answer to this question on SO: https://stackoverflow.com/questions/15780230/simulating-an-interaction-effect-in-a-lmer-model-in-r/31992892#31992892

The merTools package has some functionality to make this easier, though it only applies to working with lmer and glmer objects. Here's how you might do it:

library(merTools)
# fit an interaction model
m1 <- lmer(y ~ studage * service + (1|d) + (1|s), data = InstEval)
# select an average observation from the model frame
examp <- draw(m1, "average")
# create a modified data.frame by changing one value
simCase <- wiggle(examp, var = "service", values = c(0, 1))
# modify again for the studage variable
simCase <- wiggle(simCase, var = "studage", values = c(2, 4, 6, 8))

After this, we have our simulated data which looks like:

simCase
     y     studage service   d   s
1 3.205745       2       0 761 564
2 3.205745       2       1 761 564
3 3.205745       4       0 761 564
4 3.205745       4       1 761 564
5 3.205745       6       0 761 564
6 3.205745       6       1 761 564
7 3.205745       8       0 761 564
8 3.205745       8       1 761 564

Next, we need to generate prediction intervals, which we can do with merTools::predictInterval (or, if you don't need intervals, you could use lme4's predict method):

preds <- predictInterval(m1, level = 0.9, newdata = simCase)

Now we get a preds object, which is a 3 column data.frame:

preds
       fit       lwr      upr
1 3.312390 1.2948130 5.251558
2 3.263301 1.1996693 5.362962
3 3.412936 1.3096006 5.244776
4 3.027135 1.1138965 4.972449
5 3.263416 0.6324732 5.257844
6 3.370330 0.9802323 5.073362
7 3.410260 1.3721760 5.280458
8 2.947482 1.3958538 5.136692

We can then put it all together to plot:

library(ggplot2)
plotdf <- cbind(simCase, preds)
ggplot(plotdf, aes(x = service, y = fit, ymin = lwr, ymax = upr)) + 
  geom_pointrange() + facet_wrap(~studage) + theme_bw()

A prediction plot from merTools

Unfortunately the data here result in a rather uninteresting, but easy-to-interpret, plot.