It is worth noting that there are 2 types of random effects when using mixed effects models: random intercepts, and random coefficients/slopes. Random intercepts are used when there is clustering/grouping/nesting of observations. Whatever the thing is that the observations are clustered in (in your case Plot
), would be used as a grouping variable (and is specified by placing it on the right side of the |
symbol in the formula). Random intercepts will then be estimated for it. You can have multiple levels of clustering/grouping/nesting (if plots are nested within something else for example which doesn't seem to be the case with your setup), and you can have nesting within more than 1 factor at the same time, without one being nested in the other (crossed random effects).
Considering the model formula you wrote above:
model <- lmer(response ~ covariates + (1|Plot) + (1|Year) )
Here you have specified crossed random effects, observations are clustered within Plot and also within Year. It seems reasonable and correct to use Plot
as a clustering/nesting/grouping variable, but not Year
because (presumably) all your plots were sampled in every year, so there is no clustering/nesting/grouping within Year
.
Considering the alternative model you suggested:
model <- lmer(response ~ covariates + (Year|Plot) )
Here, you still have observations clustered within Plot
, but year is now specified as a random coefficient, giving rise to random slopes. This means that for each level of Plot
the effect of Year
is allowed to vary and the model will estimate now much it varies by estimating it's variance. Since you have 3 years, then depending on how your code the variable, you will have different interpretations - if it is a continuous variable, then you will get one estimate for the variance of each year. If you code it as a numeric (say 1,2,3) then you will get a single variance. It would usually make sense to include Year as a fixed effect too, otherwise you are imposing the restriction that the average effect of year is zero (since the random effects are assumed to come from a normal distribution with zero mean). Either model (coding Year as a factor or numeric) would make sense, the benefit of it being a factor is that it effectively allows for nonlinear change, while the drawback is
that it has more things to estimate (which sometimes results in numerical/computational problems) and thus more for you to interpret. If there is little or no nonlinearity and the variances for each year are similar it would make more sense to use the numeric coding.
Finally, it is probably worth considering whether you want to include Year as a random effect in the first place. If you are expecting the effect of Year to differ a lot between different plots then it is a good idea, but usually a better starting point is just to use Year as a fixed effect, and assess how well that model fits the data before adding it as a random effect.
At first sight, there are two main differences:
- mod1 is a random intercept and random slope model while mod2 is a random intercept model (without random slope):
(year|cluster)
implicitly includes the intercept and thus expands to (1 + year|cluster)
. In this case you assume a different baseline for each cluster and allow the clusters to vary with respect to the year effect.
- mod2 includes an interaction, mod1 doesn't: As mod2 only includes a random intercept for the cluster*year interaction you get the estimates for every unique combination of the factor levels (i.e. AA:year2, AA:year3, ...)
Note that mod1 estimates $k$ variances and $k(k-1)/2$ correlations for the $k$ random effects per cluster. If you constrain the cluster-related variances to be equal and all the correlations also to be equal (this is called compound symmetry), you get a model with many fewer (only two) variance/covariance parameters:
mod3 <- lmer(y ~ x + year + (1|cluster) + (1|year:cluster))
(note the similarity/difference compared to mod2) which can be useful when there is not enough information in the data to estimate mod1.
For more on this see this excellent post by Reinhold Kliegl and also my follow-up question: Equivalence of (0 + factor|group) and (1|group) + (1|group:factor) random effect specifications in case of compound symmetry.
So mod3 can be seen as a restricted version of mod1. Your mod2 is restricted even further, but this one-parameter parametrization is arguably less realistic.
Best Answer
You need to do something like this:
I'll consider adding a simple one-factor example somewhere in the documentation.
Note: The model is what accounts for the random effect.
emmeans
just summarizes results from a model; so if the model accounts properly for the random effect(s), you don't need to do anything extra inemmeans
.PS -- For that particular response variable, I'm guessing you will have pretty heterogeneous errors (evidenced, e.g., by a horn-shaped scatterplot of residuals versus fitted values). You probably should consider either a transformed response (e.g.,
sqrt(scatcount) ~ ...
) or a generalized linear mixed model for a Poisson response (using, e.g.,lme4::glmer()
). Such models are also supported by the emmeans package but you may want to do additional things to obtain results back on thescatcount
scale; see the vignette on transformations and link functions.