lme4 – Random Effect Specification in lmer Mixed Effect Model Using R

lme4-nlmer

What is the difference between (1|DNA.concentration/mouse.id) and (DNA.concentration|mouse.id)? What do the symbols | and / mean inside the syntax for the random effect?

Best Answer

If you have two categorical factors f and g, then (1|f/g) expands to (1|f) + (1|f:g), i.e. variation in the intercept (that's the 1 on the left-hand side of the bar) among levels of f and among levels of f:g (the interaction between f and g). This is also referred to as a random effect of g nested within f (order matters here). This is the traditional way to combine two random factors in a classical ANOVA model, because in that framework random effects must be nested (i.e. either f is nested within g or g is nested with f). (See http://glmm.wikidot.com/faq for more information on nested factors.) This model estimates two parameters, i.e. $\sigma^2_f$ and $\sigma^2_{f:g}$, no matter how many levels each categorical variable has. It would be a typical model for a nested design.

In contrast, (f|g) specifies that the effects of f vary across levels of g: for example, if f is a two-level categorical variable with levels "control" and "treatment", then this model specifies that we are allowing both the intercept (control response) and the treatment effect (difference between control and treatment responses) to vary across levels of g. Each effect has its own variance, and by default lme4 fits covariances among each of the parameters. This model would estimate parameters $\sigma^2_{g,c}$, $\sigma^2_{g,t}$, and $\sigma_{g,c\cdot t}$, where the last refers to the covariance between control and treatment effects. If $f$ has $n$ levels, this model estimates $n(n+1)/2$ parameters; it is most appropriate for a randomized-block design where each treatment is repeated in every block.

If f has many levels, the latter (f|g)) model specification can imply models with many parameters; there is an ongoing debate (see e.g. this ArXiv paper) about the best way to handle this situation.

If instead we consider (x|g) where x is a continuous (numeric) input variable, then the term specifies a random-slopes model; the intercept (implicitly) and slope with respect to x both vary across levels of g (a covariance term is also fitted).

In this case, (g|x) would make no sense - the term on the right side of the bar is a grouping variable, and is always interpreted as categorical. The only case where it could make sense is in a design where x was continuous, but multiple observations were taken at each level, and where you wanted to treat x as a categorical variable for modeling purposes.

Related Solutions

R – Using lmer for Repeated-Measures Linear Mixed-Effect Model

I think that your approach is correct. Model m1 specifies a separate intercept for each subject. Model m2 adds a separate slope for each subject. Your slope is across days as subjects only participate in one treatment group. If you write model m2 as follows it's more obvious that you model a separate intercept and slope for each subject

m2 <- lmer(Obs ~ Treatment * Day + (1+Day|Subject), mydata)

This is equivalent to:

m2 <- lmer(Obs ~ Treatment + Day + Treatment:Day + (1+Day|Subject), mydata)

I.e. the main effects of treatment, day and the interaction between the two.

I think that you don't need to worry about nesting as long as you don't repeat subject ID's within treatment groups. Which model is correct, really depends on your research question. Is there reason to believe that subjects' slopes vary in addition to the treatment effect? You could run both models and compare them with anova(m1,m2) to see if the data supports either one.

I'm not sure what you want to express with model m3? The nesting syntax uses a /, e.g. (1|group/subgroup).

I don't think that you need to worry about autocorrelation with such a small number of time points.

Solved – Backing out fixed / random effects in lmer mixed model

This is a great question. I'm not sure this answer will be satisfactory, but here is the way I tend to think about this issue.

The easiest way to make these comparisons is by contrasting predictions based on different models or different inputs -- as you suggest. Unfortunately, that's not always terribly easy to do in software. I recently co-authored an R package, merTools, in part responding to how difficult I kept finding it to produce an answer like you describe above. The package allows you to more easily calculate prediction intervals for lmer and glmer objects, as well as to explore the impact of modifying variables -- both fixed effects and random -- and seeing how they modify the predictions from the model. A simple example is below - taken from my answer to this question on SO: https://stackoverflow.com/questions/15780230/simulating-an-interaction-effect-in-a-lmer-model-in-r/31992892#31992892

The merTools package has some functionality to make this easier, though it only applies to working with lmer and glmer objects. Here's how you might do it:

library(merTools)
# fit an interaction model
m1 <- lmer(y ~ studage * service + (1|d) + (1|s), data = InstEval)
# select an average observation from the model frame
examp <- draw(m1, "average")
# create a modified data.frame by changing one value
simCase <- wiggle(examp, var = "service", values = c(0, 1))
# modify again for the studage variable
simCase <- wiggle(simCase, var = "studage", values = c(2, 4, 6, 8))

After this, we have our simulated data which looks like:

simCase
     y     studage service   d   s
1 3.205745       2       0 761 564
2 3.205745       2       1 761 564
3 3.205745       4       0 761 564
4 3.205745       4       1 761 564
5 3.205745       6       0 761 564
6 3.205745       6       1 761 564
7 3.205745       8       0 761 564
8 3.205745       8       1 761 564

Next, we need to generate prediction intervals, which we can do with merTools::predictInterval (or without intervals you could use lme4::predict)

preds <- predictInterval(m1, level = 0.9, newdata = simCase)

Now we get a preds object, which is a 3 column data.frame:

preds
       fit       lwr      upr
1 3.312390 1.2948130 5.251558
2 3.263301 1.1996693 5.362962
3 3.412936 1.3096006 5.244776
4 3.027135 1.1138965 4.972449
5 3.263416 0.6324732 5.257844
6 3.370330 0.9802323 5.073362
7 3.410260 1.3721760 5.280458
8 2.947482 1.3958538 5.136692

We can then put it all together to plot:

library(ggplot2)
plotdf <- cbind(simCase, preds)
ggplot(plotdf, aes(x = service, y = fit, ymin = lwr, ymax = upr)) + 
  geom_pointrange() + facet_wrap(~studage) + theme_bw()

Unfortunately the data here results in a rather uninteresting, but easy to interpret plot.

Best Answer

Related Solutions

R – Using lmer for Repeated-Measures Linear Mixed-Effect Model

Solved – Backing out fixed / random effects in lmer mixed model

Related Question