lme4 – Random Effect Specification in lmer Mixed Effect Model Using R

lme4-nlmer

What is the difference between (1|DNA.concentration/mouse.id) and (DNA.concentration|mouse.id)? What do the symbols | and / mean inside the syntax for the random effect?

Best Answer

If you have two categorical factors f and g, then (1|f/g) expands to (1|f) + (1|f:g), i.e. variation in the intercept (that's the 1 on the left-hand side of the bar) among levels of f and among levels of f:g (the interaction between f and g). This is also referred to as a random effect of g nested within f (order matters here). This is the traditional way to combine two random factors in a classical ANOVA model, because in that framework random effects must be nested (i.e. either f is nested within g or g is nested within f). (See http://glmm.wikidot.com/faq for more information on nested factors.) This model estimates two variance parameters, $\sigma^2_f$ and $\sigma^2_{f:g}$, no matter how many levels each categorical variable has. It would be a typical model for a nested design.
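As a rough check of this expansion, the sketch below fits both spellings to the Pastes data set that ships with lme4. Here batch plays the role of f and cask the role of g; that mapping is chosen purely for illustration and is not part of the original question. If the two formulas describe the same model, the fits should agree up to optimizer tolerance.

```r
library(lme4)

# Pastes ships with lme4. The cask labels (a, b, c) are reused in every batch,
# so cask is only meaningful when nested within batch.
m_nested   <- lmer(strength ~ 1 + (1 | batch/cask), data = Pastes)
m_expanded <- lmer(strength ~ 1 + (1 | batch) + (1 | batch:cask), data = Pastes)

# Each fit reports two random-intercept variances plus the residual variance
print(VarCorr(m_nested))
print(VarCorr(m_expanded))

# Compare the (REML) log-likelihoods of the two spellings
same_fit <- all.equal(as.numeric(logLik(m_nested)),
                      as.numeric(logLik(m_expanded)),
                      tolerance = 1e-4)
print(same_fit)
```

The grouping terms may be labelled differently in the two outputs (for example, the interaction may be printed with its factors in the opposite order), but the estimated variances should match.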

In contrast, (f|g) specifies that the effects of f vary across levels of g: for example, if f is a two-level categorical variable with levels "control" and "treatment", then this model specifies that we are allowing both the intercept (control response) and the treatment effect (difference between control and treatment responses) to vary across levels of g. Each effect has its own variance, and by default lme4 fits covariances among all of the effects. This model would estimate parameters $\sigma^2_{g,c}$, $\sigma^2_{g,t}$, and $\sigma_{g,c\cdot t}$, where the last is the covariance between the control response (intercept) and the treatment effect. If f has $n$ levels, this model estimates $n(n+1)/2$ variance-covariance parameters; it is most appropriate for a randomized-block design where each treatment is repeated in every block.

If f has many levels, the latter (f|g) model specification can imply models with many parameters; there is an ongoing debate (see e.g. this arXiv paper) about the best way to handle this situation.
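To make the parameter count for (f|g) concrete, here is a minimal sketch on simulated data. Everything in it (the number of blocks, the effect sizes, the variable names d, b0, b1) is made up for illustration; the only point is that a two-level f yields three variance-covariance parameters for the g term.

```r
library(lme4)
set.seed(1)

n_g   <- 30   # number of blocks (levels of g); hypothetical
n_rep <- 5    # replicates of each treatment within each block; hypothetical

# Randomized-block layout: every level of f appears in every level of g
d <- expand.grid(rep = seq_len(n_rep),
                 f   = factor(c("control", "treatment")),
                 g   = factor(seq_len(n_g)))

b0 <- rnorm(n_g, sd = 2)   # block-level deviations in the control response
b1 <- rnorm(n_g, sd = 1)   # block-level deviations in the treatment effect
gi <- as.integer(d$g)      # block index for each row

d$y <- 10 + b0[gi] + (3 + b1[gi]) * (d$f == "treatment") +
  rnorm(nrow(d), sd = 1)

m <- lmer(y ~ f + (f | g), data = d)

# Intercept variance, treatment-effect variance, and their correlation
print(VarCorr(m))

# Covariance parameters for the g term versus n(n+1)/2 with n = nlevels(f)
n_levels <- nlevels(d$f)
c(fitted = length(getME(m, "theta")),
  formula = n_levels * (n_levels + 1) / 2)
```

With a three-level f the same count would rise to six, which is why this specification grows quickly as f gains levels.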

If instead we consider (x|g) where x is a continuous (numeric) input variable, then the term specifies a random-slopes model; the intercept (implicitly) and slope with respect to x both vary across levels of g (a covariance term is also fitted).
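A standard illustration of such a random-slopes term uses the sleepstudy data bundled with lme4, with Days as the continuous x and Subject as the grouping variable g. This is only a sketch of the syntax, not a recommendation about how that data set should be modeled.

```r
library(lme4)

# sleepstudy ships with lme4: Reaction time recorded on Days 0-9 for each Subject
m_slope <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)

# Intercept variance, slope variance, and their correlation, plus the residual
print(VarCorr(m_slope))

# One row per Subject, holding that subject's own intercept and Days slope
subject_coefs <- coef(m_slope)$Subject
head(subject_coefs)
```

The intercept column appears even though only Days was written on the left of the bar, because (Days | Subject) is shorthand for (1 + Days | Subject).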

In this case, (g|x) would make no sense: the term on the right side of the bar is a grouping variable, and is always interpreted as categorical. The only case where it could make sense is a design where x is continuous but multiple observations are taken at each distinct value of x, and where you want to treat x as a categorical variable for modeling purposes.
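The sketch below shows a numeric variable being treated as categorical once it sits to the right of the bar. It reuses lme4's sleepstudy data, where Days is numeric and every value of Days is observed on several subjects. The model is chosen only to show the behavior and may well produce a boundary (singular) fit, so the variance estimates themselves should not be over-interpreted.

```r
library(lme4)

# Days is stored as a number. To the left of the bar (or in the fixed part)
# it acts as a continuous covariate; to the right of the bar it is a grouping factor.
m_cat <- lmer(Reaction ~ Days + (1 | Days) + (1 | Subject), data = sleepstudy)

# Number of levels per grouping factor: one level per distinct value of Days
n_groups <- ngrps(m_cat)
print(n_groups)
```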