Assume that I have a linear mixed model of the following form, specified using lme4:
fit <- lmer(A ~ B + C + (1|D) + (1|E), data=data)
I am struggling with the best way to isolate the net impact of the fixed effects (B or C) on the value of D in the overall model, as well as the impact of random effect E on D.
To isolate the net effect of random effect D, normally I would do something like this, to predict outputs both with and without D:
data$w_D <- predict(fit, data, type='response')
data$wo_D <- predict(fit, data, re.form=~ (1|E), type='response')
Let's call this a "with and without" D analysis. I would then use something like the aggregate function to sum the difference between the predicted columns w_D and wo_D within each level of D, and deposit the answer for each subject in D in a new vector. Let's call this new vector D_net; it contains the net value of D, which is what I care about. There may be other ways to do this, but this is the approach that makes sense to me.
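As a sketch, that aggregation step might look like this (assuming the data, fit, w_D, and wo_D objects from above):

```r
# Difference between predictions with and without the D random effect
data$diff_D <- data$w_D - data$wo_D

# Sum that difference within each level of D:
# one net value per subject in D
D_net <- aggregate(diff_D ~ D, data = data, FUN = sum)
```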
But this gets a little interesting if one wants to isolate the impact of a random effect AND start isolating the impact of another variable, whether it be a fixed effect or a random effect.
Let's start with B, a fixed effect. One option could be to convert all observations of B to their mean, run the same "with and without" D prediction sequence as above, and subtract the values of this new D_net from the original D_net. My concern is that effectively removing a fixed effect like this could meaningfully affect the residuals, meaning that the original random effect estimates might no longer be valid.
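A sketch of that mean-substitution option (holding B at its mean while leaving the fitted model unchanged; this assumes the data and fit objects from above):

```r
# Copy the data and hold B constant at its mean
data_meanB <- data
data_meanB$B <- mean(data$B, na.rm = TRUE)

# Repeat the "with and without" D predictions on the modified data
data_meanB$w_D  <- predict(fit, data_meanB)
data_meanB$wo_D <- predict(fit, data_meanB, re.form = ~ (1 | E))
```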
To ensure the random effects are recalculated, we could also re-specify the model by dropping B (let's call this fit2), execute the model otherwise the same, do a "with and without" sequence for D in this updated equation, and again subtract these D_net values from the original D_net values. This would ensure a recalculation of the random effects, but now we've again affected the residual (by removing B). This also feels a bit backwards because now B has been removed entirely, and the remaining values, instead of being held constant to reflect the value of B, may themselves change as a result of the modified residual.
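That refitting option might be sketched as follows (again assuming the question's data object):

```r
library(lme4)

# Refit without B so the random effects (BLUPs) are re-estimated
fit2 <- lmer(A ~ C + (1 | D) + (1 | E), data = data)

# "With and without" D predictions from the reduced model
data$w_D2  <- predict(fit2, data)
data$wo_D2 <- predict(fit2, data, re.form = ~ (1 | E))
```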
Perhaps a third option would still involve a second model for comparison, but to instead add an interaction between D and B. So we end up with something like this:
fit3 <- lmer(A ~ B + C + (1 + B | D) + (1|E), data=data)
(Here (1 + B | D) gives each level of D its own intercept and its own slope for B, so the separate (1|D) term is no longer needed.)
The net difference of the "with and without" predictions from fit3 could again be calculated and then subtracted from the original model's net D value. Or, with the interaction between B and D already in the model, there may be no need to subtract from the original D_net at all. There are two further issues with this approach, however: if one of the fixed effects is itself an interaction between two fixed effects, it's unclear how that interaction can in turn be interacted with a random effect: (1 + B:C | D)? And of course, for large data sets, random slopes can be computationally challenging, sometimes requiring impractical amounts of memory.
If I wanted to deal with the effect of random effect E, instead of a fixed effect like B, I would be inclined to consider the same options.
Are any of these options for investigating the effect of B (or E) on D actually satisfactory for the objective I've stated? If not, is there another alternative? This is an aspect of mixed-model interpretation I have yet to see addressed in any publication.
Thanks.
Best Answer
This is a great question. I'm not sure this answer will be satisfactory, but here is the way I tend to think about this issue.
The easiest way to make these comparisons is by contrasting predictions based on different models or different inputs, as you suggest. Unfortunately, that's not always easy to do in software. I recently co-authored an R package, merTools, partly in response to how difficult I kept finding it to produce an answer like the one you describe above. The package allows you to more easily calculate prediction intervals for lmer and glmer objects, as well as to explore the impact of modifying variables, both fixed effects and random effects, and to see how those changes alter the predictions from the model. A simple example is below, taken from my answer to this question on SO: https://stackoverflow.com/questions/15780230/simulating-an-interaction-effect-in-a-lmer-model-in-r/31992892#31992892

The merTools package has some functionality to make this easier, though it only applies to lmer and glmer objects. Here's how you might do it. First, simulate some data and fit a model; after this, we have our simulated data, which looks like:
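A sketch of that simulation and fit (the variable names here are illustrative assumptions, not taken from the original post):

```r
library(lme4)

set.seed(101)
# Simulated data: continuous predictor x, factor f,
# and a grouping variable grp for the random intercept
sim <- data.frame(
  x   = rnorm(600),
  f   = factor(sample(c("a", "b"), 600, replace = TRUE)),
  grp = factor(rep(1:20, each = 30))
)
grp_eff <- rnorm(20, sd = 2)        # group-level effects
sim$y <- 1 + 0.5 * sim$x + 1.5 * (sim$f == "b") +
  grp_eff[sim$grp] + rnorm(600)

# Fit a model with an interaction and a random intercept
fit <- lmer(y ~ x * f + (1 | grp), data = sim)
head(sim)
```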
Next, we need to generate prediction intervals, which we can do with merTools::predictInterval (or, if you don't need intervals, with lme4's predict method). This gives us a preds object, which is a three-column data.frame:
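A sketch of that call on a small toy model (the setup is repeated here so the snippet runs on its own; predictInterval returns columns fit, upr, and lwr by default):

```r
library(lme4)
library(merTools)

# Toy data and model
set.seed(101)
sim <- data.frame(
  x   = rnorm(600),
  grp = factor(rep(1:20, each = 30))
)
sim$y <- 1 + 0.5 * sim$x + rnorm(20, sd = 2)[sim$grp] + rnorm(600)
fit <- lmer(y ~ x + (1 | grp), data = sim)

# Simulation-based prediction intervals
preds <- predictInterval(fit, newdata = sim, n.sims = 500)
head(preds)
```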
We can then put it all together to plot:
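A sketch of such a plot, again as a self-contained toy example (names are illustrative):

```r
library(lme4)
library(merTools)
library(ggplot2)

# Toy data, model, and prediction intervals
set.seed(101)
sim <- data.frame(
  x   = rnorm(600),
  grp = factor(rep(1:20, each = 30))
)
sim$y <- 1 + 0.5 * sim$x + rnorm(20, sd = 2)[sim$grp] + rnorm(600)
fit   <- lmer(y ~ x + (1 | grp), data = sim)
preds <- predictInterval(fit, newdata = sim, n.sims = 500)

# Predicted values with interval bounds, plotted against x
p <- ggplot(cbind(sim, preds),
            aes(x = x, y = fit, ymin = lwr, ymax = upr)) +
  geom_pointrange(alpha = 0.3)
print(p)
```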
Unfortunately, the data here result in a rather uninteresting, though easy-to-interpret, plot.