Mixed Model – Getting Fixed-Effect Only Predictions on New Data in R

Tags: mixed-model, r

I would like to construct predictions for a mixed model (logistic via glmer) on a new data set using only the fixed effects, holding the random effects to 0. But I am having trouble setting up the model matrix to be able to calculate them.

Since the mer class doesn't have a predict method, and since I want to omit the random effects for predictions on the new data set, I think I need to construct a model matrix for the fixed effects of the same structure used in the original model, but using the new data. Then multiply by the fixed effect coefficients in the model.

The fixed effect portion of my model formula contains factors and interaction terms between numeric fixed effects, so it's a little more complicated than just extracting the fixed variables from the matrix. e.g. I need to ensure the factor contrast expansion is the same as the original, interaction terms are properly listed, etc.

So my question is: what is the most straightforward general approach for constructing a new model matrix that mimics the structure of the original model matrix used in creating the model?

I've tried model.matrix(my.model, data=newdata) but that seems to return the original model matrix, not one based on newdata.

Sample code:

library(lme4)

cake2 <- head(cake) # cake2 is "new" data frame for future predictions

# recipe is a fixed effect factor, temp is fixed effect numeric, replicate is random effect
m <- lmer(angle ~ temp + recipe + (1 | replicate), data=cake)
summary(m)

nrow(cake2)         # but new data frame has 6 rows
nrow(cake)          # original data frame has 270 rows

# attempt to make new model matrix using different data frame
mod.mat.cake2 <- model.matrix(m, data=cake2)
nrow(mod.mat.cake2) # 270 rows, same as orig data frame

I tried other methods like extracting the terms from the formula and building a new formula from that, but it seemed overly convoluted, and brittle in handling factors and interaction terms.

How can I get mod.mat.cake2 to be a fixed effect model matrix based on the formula in m, but using values from cake2? Or is there an easier way to go about getting fixed-effect only predictions from an lmer model?

All help is appreciated. Thank you.

Best Answer

Maybe this is cheating, but

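# drop the random-effect terms, then drop the response ([-2]) to get a one-sided formula
fixedformula <- as.formula(lme4.0:::nobars(formula(m))[-2])
# model.matrix() takes the data frame through `data`, not `newdata`
model.matrix(fixedformula, data=cake2)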

Notes:

- I am using lme4.0 here, which is the r-forge version of "old" (CRAN) lme4; you can substitute lme4 for lme4.0 in the code above.
- The new (r-forge/development) version of lme4 has a predict method; in that case
```
predict(m,re.form=NA,newdata=cake2)
```

works fine (re.form=NA sets all random effects to zero, equivalent to level=0 in the old predict.lme)
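For completeness, here is a minimal sketch of the manual route the question asks about, checked against predict. It assumes a version of lme4 where formula() on the fitted model accepts fixed.only = TRUE and where predict accepts re.form, and it assumes the factors in the new data carry the same levels as in the original data (true here because cake2 is a subset of cake). The names fixed_terms, X_new, beta, eta and p_check are only illustrative.

```r
library(lme4)

m <- lmer(angle ~ temp + recipe + (1 | replicate), data = cake)
cake2 <- head(cake)  # stand-in for the "new" data

# Fixed-effects part of the model formula, with the response removed
fixed_terms <- delete.response(terms(formula(m, fixed.only = TRUE)))

# Design matrix for the new data (assumes matching factor levels and
# the default contrasts that were used in the original fit)
X_new <- model.matrix(fixed_terms, data = cake2)

# Linear predictor with all random effects held at zero
beta <- fixef(m)
eta  <- drop(X_new[, names(beta), drop = FALSE] %*% beta)

# Cross-check against the built-in method
p_check <- predict(m, newdata = cake2, re.form = NA)
all.equal(unname(eta), unname(p_check))
```

For a logistic glmer fit, eta is on the link scale; something like plogis(eta) (or predict with type = "response") would be needed to get probabilities.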

Related Solutions

Solved – Random effect nested under fixed effect model in R

It doesn't make sense to both include tank as a random effect and nest tank within the pop/temp fixed effect. You only need one of these, depending on how tank is coded.

If tank is coded 1-8, you only need the tank random effect. Nesting it within the pop/temp fixed effect results in the same 8 units, so is not necessary.

If tank is coded 1-2 (that is, which rep it was), you only need to nest tank within the pop/temp fixed effect, because that gives you your 8 unique tanks. Including the tank random effect is only desired if the tanks were first divided into two groups and then randomized to treatment; if the eight tanks were completely randomized to treatment, this is not necessary.
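To make the coding point concrete, here is a small base-R sketch. It assumes a 2 x 2 design with two replicate tanks per cell, which is how I read the question; design, n_rep_codes and tank_nested are hypothetical names.

```r
# Hypothetical layout: 2 populations x 2 temperatures x 2 replicate tanks
design <- expand.grid(pop  = factor(c("A", "B")),
                      temp = factor(c("warm", "cold")),
                      rep  = 1:2)

# Coded 1-2, the tank label alone only distinguishes two groups
n_rep_codes <- nlevels(factor(design$rep))

# Crossing it with pop and temp recovers the eight physical tanks
tank_nested <- interaction(design$pop, design$temp, design$rep, drop = TRUE)
n_tanks <- nlevels(tank_nested)

c(rep_codes = n_rep_codes, unique_tanks = n_tanks)
```

If tank is already coded 1-8, the interaction adds nothing, because each label already identifies a single tank.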

You could do this with likelihood-based solutions such as those in nlme and lme4, but if everything is balanced, it might be simpler to use the traditional ANOVA approach using aov.

Creating some sample data:

set.seed(5)
d <- within(expand.grid(pop=factor(c("A","B")),
                        temp=factor(c("warm", "cold")),
                        rep=1:2,
                        fish=1:100), {
                          tank <- factor(paste(pop, temp, rep, sep="."))
                          tanke <- round(rnorm(nlevels(tank))[unclass(tank)],1)
                          e <- round(rnorm(length(pop)),1)
                          m <- 10 + 2*as.numeric(pop)*as.numeric(temp)
                          growth <- m + tanke + e
                        })

Using aov like this:

a0 <- aov(growth ~ pop*temp + Error(tank), data=d)
summary(a0)

or lme like this:

library(nlme)
m1 <- lme(growth ~ pop*temp, random=~1|tank, data=d)
anova(m1)
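The lme4 version is not shown above, so here is a rough sketch of what it might look like. The data are regenerated in a simplified way so the block runs on its own (the simulated values will not match d exactly), and d2, tank_eff and m2 are illustrative names rather than anything from the question.

```r
library(lme4)

# Simplified re-creation of the sample data: 8 uniquely labelled tanks
set.seed(5)
d2 <- expand.grid(pop  = factor(c("A", "B")),
                  temp = factor(c("warm", "cold")),
                  rep  = 1:2,
                  fish = 1:100)
d2$tank  <- factor(paste(d2$pop, d2$temp, d2$rep, sep = "."))
tank_eff <- rnorm(nlevels(d2$tank))          # one random shift per tank
d2$growth <- 10 + 2 * as.numeric(d2$pop) * as.numeric(d2$temp) +
  tank_eff[as.integer(d2$tank)] + rnorm(nrow(d2))

# Random intercept for each tank; fixed effects as in the lme call
m2 <- lmer(growth ~ pop * temp + (1 | tank), data = d2)
fixef(m2)
ngrps(m2)   # number of tanks the model sees
```

Because tank is coded uniquely here, (1 | tank) is all that is needed; with tank coded 1-2 the grouping term would have to be built from pop, temp and rep instead.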

Solved – Backing out fixed / random effects in lmer mixed model

This is a great question. I'm not sure this answer will be satisfactory, but here is the way I tend to think about this issue.

The easiest way to make these comparisons is by contrasting predictions based on different models or different inputs -- as you suggest. Unfortunately, that's not always terribly easy to do in software. I recently co-authored an R package, merTools, in part responding to how difficult I kept finding it to produce an answer like you describe above. The package allows you to more easily calculate prediction intervals for lmer and glmer objects, as well as to explore the impact of modifying variables -- both fixed effects and random -- and seeing how they modify the predictions from the model. A simple example is below - taken from my answer to this question on SO: https://stackoverflow.com/questions/15780230/simulating-an-interaction-effect-in-a-lmer-model-in-r/31992892#31992892

The merTools package has some functionality to make this easier, though it only applies to working with lmer and glmer objects. Here's how you might do it:

library(merTools)
# fit an interaction model
m1 <- lmer(y ~ studage * service + (1|d) + (1|s), data = InstEval)
# select an average observation from the model frame
examp <- draw(m1, "average")
# create a modified data.frame by changing one value
simCase <- wiggle(examp, var = "service", values = c(0, 1))
# modify again for the studage variable
simCase <- wiggle(simCase, var = "studage", values = c(2, 4, 6, 8))

After this, we have our simulated data which looks like:

simCase
     y     studage service   d   s
1 3.205745       2       0 761 564
2 3.205745       2       1 761 564
3 3.205745       4       0 761 564
4 3.205745       4       1 761 564
5 3.205745       6       0 761 564
6 3.205745       6       1 761 564
7 3.205745       8       0 761 564
8 3.205745       8       1 761 564

Next, we need to generate prediction intervals, which we can do with merTools::predictInterval (or, if you don't need intervals, with the predict method that lme4 provides for fitted models):

preds <- predictInterval(m1, level = 0.9, newdata = simCase)

Now we get a preds object, which is a 3 column data.frame:

preds
       fit       lwr      upr
1 3.312390 1.2948130 5.251558
2 3.263301 1.1996693 5.362962
3 3.412936 1.3096006 5.244776
4 3.027135 1.1138965 4.972449
5 3.263416 0.6324732 5.257844
6 3.370330 0.9802323 5.073362
7 3.410260 1.3721760 5.280458
8 2.947482 1.3958538 5.136692
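If intervals are not needed, the plain predict route can be sketched as below, which also shows one way to separate the fixed part of a prediction from the random part. To keep it quick this uses a random subset of InstEval rather than the full data, and builds the new cases by hand instead of with draw and wiggle; sub, new_cases, p_fixed, p_cond and re_part are illustrative names, and the numbers will differ from the output above.

```r
library(lme4)

# Assumption: a 5000-row random subset, only to keep the fit fast
set.seed(1)
sub <- InstEval[sample(nrow(InstEval), 5000), ]
m1  <- lmer(y ~ studage * service + (1 | d) + (1 | s), data = sub)

# All studage x service combinations for one lecturer/student pair
new_cases <- expand.grid(studage = levels(sub$studage),
                         service = levels(sub$service),
                         stringsAsFactors = FALSE)
new_cases$studage <- factor(new_cases$studage,
                            levels = levels(sub$studage), ordered = TRUE)
new_cases$service <- factor(new_cases$service, levels = levels(sub$service))
new_cases$d <- sub$d[1]
new_cases$s <- sub$s[1]

p_fixed <- predict(m1, newdata = new_cases, re.form = NA)  # fixed effects only
p_cond  <- predict(m1, newdata = new_cases)                # plus random effects

# The gap between the two is the sum of the two random intercepts
re_part <- ranef(m1)$d[as.character(sub$d[1]), "(Intercept)"] +
  ranef(m1)$s[as.character(sub$s[1]), "(Intercept)"]
cbind(new_cases[, c("studage", "service")], p_fixed, p_cond)
```

Since the model has only random intercepts, p_cond - p_fixed is the same constant (re_part) in every row, which is one way of reading off how much of a prediction comes from the grouping factors.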

We can then put it all together to plot:

library(ggplot2)
plotdf <- cbind(simCase, preds)
ggplot(plotdf, aes(x = service, y = fit, ymin = lwr, ymax = upr)) + 
  geom_pointrange() + facet_wrap(~studage) + theme_bw()

Unfortunately the data here results in a rather uninteresting, but easy to interpret plot.
