Difference between (factor|group) and (1|factor:group) specifications in lme4

Tags: lme4-nlme, mixed model, r, repeated measures

I have repeated measures from clusters over several years, and I expect the cluster effect to vary from year to year. Year is coded as a factor, with '1' as the reference level.
I have tried the following:

mod1 <- lmer(y ~ x + year + (year|cluster))

and

mod2 <- lmer(y ~ x + year + (1|cluster:year))

My first example specifies the following random effects:

ranef(mod1)
$cluster
    (Intercept)         year2        year3         year4        year5
AA   0.03721015  0.0573160920 -0.114709171  0.1588302187  0.125329740
AB  -0.12958994 -0.0458997003  0.216455596  0.2345170893  0.248950509
AC  -5.10692972 -0.1311546328  1.130347798  2.5215580167  5.070106525
AD   0.10087455 -0.2677088515 -0.345583355 -0.2442831982 -0.257074662
....

The second one specifies:

ranef(mod2)
$`cluster:year`
           (Intercept)
AA:year2  0.0838186244
AA:year3 -0.1197284361
AA:year4  0.1944488619
AA:year5  0.1562090690

I assumed they would be equivalent, given year is a factor, but I must not understand the random effect specification of lme4 well enough. Can anyone help me understand the difference between the two parameterizations?

Best Answer

At first sight, there are two main differences:

  1. mod1 is a random-intercept and random-slope model, while mod2 is a random-intercept-only model: (year|cluster) implicitly includes the intercept and thus expands to (1 + year|cluster). Because year is a factor, the "slopes" are per-cluster deviations of each non-reference year from the reference year. You therefore assume a different baseline for each cluster and allow the year effect to vary between clusters.
  2. mod2 includes an interaction, mod1 doesn't: because mod2 only includes a random intercept for the cluster:year interaction, you get one estimate for every unique combination of the factor levels (i.e. AA:year2, AA:year3, ...).
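To make the two points above concrete, here is a minimal base-R sketch (no model fitting) of what each random-effects term generates. The toy data frame `d` and the object names are made up for illustration; they are not the asker's data.

```r
# Hypothetical toy design: 2 clusters, each observed in 3 years
d <- expand.grid(year = factor(1:3), cluster = factor(c("AA", "AB")))

# (year | cluster) is shorthand for (1 + year | cluster): each cluster gets
# one random coefficient per column of this treatment-contrast design matrix
re_cols_mod1 <- colnames(model.matrix(~ year, data = d))
print(re_cols_mod1)

# (1 | cluster:year): the grouping factor is the interaction itself, so there
# is a single random intercept for every observed cluster-year combination
re_levels_mod2 <- levels(interaction(d$cluster, d$year, sep = ":", drop = TRUE))
print(re_levels_mod2)
```

So mod1 produces a row per cluster with several correlated columns, whereas mod2 produces a single column with one row per cluster-year cell, which matches the shape of the two ranef() outputs in the question.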

Note that mod1 estimates $k$ variances and $k(k-1)/2$ correlations for the $k$ random effects per cluster, where $k$ is the number of levels of year (here $k = 5$, so 5 variances and 10 correlations). If you constrain the cluster-related variances to be equal and all the correlations also to be equal (this is called compound symmetry), you get a model with far fewer (only two) variance/covariance parameters:

mod3 <- lmer(y ~ x + year + (1|cluster) + (1|year:cluster))

(note the similarity to, and the difference from, mod2), which can be useful when the data do not contain enough information to estimate mod1. For more on this, see this excellent post by Reinhold Kliegl and also my follow-up question: "Equivalence of (0 + factor|group) and (1|group) + (1|group:factor) random effect specifications in case of compound symmetry".
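The parameter counts can be checked on simulated data. The following is only a sketch: it assumes the lme4 package is installed, and the design (30 clusters, 5 years, 4 observations per cluster-year cell) and all effect sizes are invented for illustration. Fitting mod1 to data like these may produce singular-fit or convergence warnings, which is the "not enough information" situation described above.

```r
library(lme4)

set.seed(42)
# Hypothetical design: 30 clusters x 5 years x 4 replicates per cell
d <- expand.grid(rep = 1:4, year = factor(1:5), cluster = factor(1:30))
d$x <- rnorm(nrow(d))

# Simulate under a compound-symmetry structure: a cluster-level intercept
# plus an independent deviation for each cluster-year cell
u_cluster <- rnorm(30, sd = 1.0)
u_cell <- matrix(rnorm(30 * 5, sd = 0.5), nrow = 30, ncol = 5)
ci <- as.integer(d$cluster)
yi <- as.integer(d$year)
d$y <- 1 + 0.5 * d$x + u_cluster[ci] + u_cell[cbind(ci, yi)] + rnorm(nrow(d))

mod1 <- lmer(y ~ x + year + (year | cluster), data = d)
mod2 <- lmer(y ~ x + year + (1 | cluster:year), data = d)
mod3 <- lmer(y ~ x + year + (1 | cluster) + (1 | year:cluster), data = d)

# Number of variance/covariance parameters in the random-effects part
n_theta <- sapply(list(mod1 = mod1, mod2 = mod2, mod3 = mod3),
                  function(m) length(getME(m, "theta")))
print(n_theta)

# Estimated variance components of the two-parameter model
print(VarCorr(mod3))
```

With $k = 5$ year levels, mod1 carries $5 + 10 = 15$ covariance parameters, mod3 carries two, and mod2 carries one.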

So mod3 can be seen as a restricted version of mod1. Your mod2 is restricted even further: it drops the cluster-level random intercept, leaving a single variance parameter, and this one-parameter parametrization is arguably less realistic.
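The nesting can be written out explicitly. The notation here is my own and does not come from the linked posts: $b_i$ is the random intercept of cluster $i$ with variance $\sigma_c^2$, and $c_{ij}$ is the random intercept of cluster $i$ in year $j$ with variance $\sigma_{cy}^2$, both independent and with mean zero. Under mod3 the total random contribution for cluster $i$ in year $j$ is $b_i + c_{ij}$, so:

```latex
\[
\operatorname{Cov}\bigl(b_i + c_{ij},\; b_i + c_{ij'}\bigr) =
\begin{cases}
\sigma_c^2 + \sigma_{cy}^2, & j = j' \\[4pt]
\sigma_c^2, & j \neq j'
\end{cases}
\qquad\Longrightarrow\qquad
\rho = \frac{\sigma_c^2}{\sigma_c^2 + \sigma_{cy}^2}
\]
```

Every year has the same variance and every pair of years has the same correlation $\rho$, which is the compound-symmetry pattern. Setting $\sigma_c^2 = 0$ gives mod2, where $\rho = 0$: two different years of the same cluster are then treated as no more alike than years from two different clusters.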