Fitting and interpreting random effects in a repeated-measures, unbalanced ecological data set

Tags: ecology, lme4-nlme, mixed model, r, repeated measures

I have a vegetation data set that consists of 150 plots that were sampled 1-3 times over a three year period. Plots are my unit of observation and they are unbalanced (since plots were sampled either once, twice, or three times). I would like to use mixed-effects models in order to account for variation in both plots and sampling year and to keep my sample size large (instead of conducting my analysis within individual years).

My response variables are cover of vegetation functional groups, and predictors include variables related to fire and treatment history. Additionally, I am not interested in how plots change over time per se, but rather in aggregating samples from all three years to increase my sample size, and in accounting for the spatial/temporal correlation that arises from doing so. My assumption is that treating plot as a random effect (intercept only) accounts for variation arising from potential spatial autocorrelation, but my main question is how to account for the repeated measures, and whether I need to account for the grouping of cells within sampling years:

Potential model:

model <- lmer(response ~ covariates + (1|Plot) + (1|Year))

However, I know that it is not appropriate to use a random effect with only three levels (Year, in this case). I'm hoping for recommendations on how to incorporate year as a random effect. Is including (Year | Plot) recommended? And if so, how should I interpret that effect, i.e., is it accounting for variation introduced by different sampling years, or for variation in plots across sampling years?

Best Answer

It is worth noting that there are two types of random effects in mixed-effects models: random intercepts and random coefficients (slopes). Random intercepts are used when observations are clustered, grouped or nested. The factor that the observations are clustered in (in your case Plot) is used as the grouping variable; it is specified on the right-hand side of the | symbol in the formula, and random intercepts are then estimated for it (a variance across plots, plus a predicted intercept for each plot). You can have multiple levels of clustering (for example, if plots were nested within something else, which doesn't seem to be the case in your setup), and observations can also be clustered within more than one factor at the same time without one being nested in the other (crossed random effects).
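
A minimal sketch of a Plot random intercept in lme4, assuming the lme4 package is installed. The data are simulated to mimic the question's design (150 plots, each sampled one to three times over three years); the data frame `veg`, the response `cover`, the covariate `fire` and the helper vectors `n_visits` and `plot_int` are hypothetical names chosen for illustration. The nested and crossed forms are shown only as formula comments, and `Site` is a hypothetical higher-level grouping factor.

```r
library(lme4)

set.seed(42)
# Hypothetical data: 150 plots, each sampled 1, 2 or 3 times in years 1-3
n_visits <- rep(c(1, 2, 3), times = c(30, 40, 80))   # 350 observations
veg <- do.call(rbind, lapply(seq_along(n_visits), function(p) {
  data.frame(Plot = p, Year = sort(sample(1:3, n_visits[p])))
}))
veg$fire <- rnorm(nrow(veg))                  # hypothetical fire covariate
plot_int <- rnorm(length(n_visits), sd = 2)   # true plot-level intercepts
veg$cover <- 20 + 3 * veg$fire + plot_int[veg$Plot] + rnorm(nrow(veg))
veg$Plot <- factor(veg$Plot)

# Plot is the grouping variable (right of |): one random intercept per plot
m_int <- lmer(cover ~ fire + (1 | Plot), data = veg)
VarCorr(m_int)            # between-plot SD and residual SD
head(ranef(m_int)$Plot)   # predicted intercept deviation for each plot

# Other grouping structures, written as formulas only (Site is hypothetical):
#   nested:  cover ~ fire + (1 | Site/Plot)   # same as (1 | Site) + (1 | Site:Plot)
#   crossed: cover ~ fire + (1 | Plot) + (1 | Year)
```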

Considering the model formula you wrote above:

model <- lmer(response ~ covariates + (1|Plot) + (1|Year) )

Here you have specified crossed random effects: observations are clustered within Plot and also within Year. It seems reasonable and correct to use Plot as a clustering/grouping variable, but not Year, because (presumably) the same plots were sampled across the years rather than being grouped within a single year, so there is no real clustering within Year.
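
The crossing of Plot and Year can be checked directly from the data. The sketch below, again assuming lme4 and simulated data with hypothetical names (`veg`, `cover`, `fire`, `year_eff`) and a made-up year effect, tabulates plots against years, fits the crossed model from the question, and fits the alternative with Year as a fixed factor. With only three years, expect the Year variance in the crossed model to be poorly estimated (it may be reported as a singular fit).

```r
library(lme4)

set.seed(42)
# Hypothetical data: 150 plots sampled 1-3 times, with a true year effect
n_visits <- rep(c(1, 2, 3), times = c(30, 40, 80))
veg <- do.call(rbind, lapply(seq_along(n_visits), function(p) {
  data.frame(Plot = p, Year = sort(sample(1:3, n_visits[p])))
}))
veg$fire <- rnorm(nrow(veg))
year_eff <- c(0, 1.5, -1)                     # hypothetical year shifts
plot_int <- rnorm(length(n_visits), sd = 2)
veg$cover <- 20 + 3 * veg$fire + year_eff[veg$Year] +
  plot_int[veg$Plot] + rnorm(nrow(veg))
veg$Plot <- factor(veg$Plot)

# Plot and Year are crossed: the same plot can appear in several years
tab <- xtabs(~ Plot + Year, data = veg)
table(rowSums(tab))   # how many years each plot was sampled in

# Crossed random intercepts: the Year variance rests on only 3 levels
m_crossed <- lmer(cover ~ fire + (1 | Plot) + (1 | Year), data = veg)
VarCorr(m_crossed)

# Alternative: Year as a fixed factor, Plot as the only random intercept
m_yfix <- lmer(cover ~ fire + factor(Year) + (1 | Plot), data = veg)
fixef(m_yfix)
```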

Considering the alternative model you suggested:

model <- lmer(response ~ covariates + (Year|Plot) )

Here, observations are still clustered within Plot, but Year is now specified as a random coefficient, giving rise to random slopes. This means that the effect of Year is allowed to vary across the levels of Plot, and the model estimates how much it varies by estimating its variance. Since you have 3 years, the interpretation depends on how you code the variable: if Year is a factor, you will get a separate variance for each year (plus the correlations between them); if you code it as numeric (say 1, 2, 3), you will get a single variance for the Year slope (alongside the intercept variance and the intercept-slope correlation).

It would usually make sense to include Year as a fixed effect too; otherwise you are imposing the restriction that the average effect of Year is zero (since the random effects are assumed to come from a normal distribution with zero mean). Either coding (factor or numeric) would make sense. The benefit of a factor is that it effectively allows for nonlinear change; the drawback is that there are more parameters to estimate (which sometimes results in numerical/computational problems) and thus more for you to interpret. If there is little or no nonlinearity and the variances for each year are similar, it would make more sense to use the numeric coding.
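
A sketch of both codings, assuming lme4 and simulated data with hypothetical names (`veg`, `cover`, `fire`, `plot_slope`). Year enters as a fixed effect as well as a random slope. One practical consequence of the question's design is worth checking: with at most three observations per plot, the factor coding asks for three random effects per plot (450 here) from 350 observations, and lme4's default `check.nobs.vs.nRE = "stop"` setting in `lmerControl()` is expected to stop that fit with an error, so the sketch wraps it in `tryCatch()`. The numeric coding needs two random effects per plot (300) and passes that check, although with so few observations per plot it may still produce convergence or singular-fit messages.

```r
library(lme4)

set.seed(42)
# Hypothetical data: 150 plots sampled 1-3 times (350 observations)
n_visits <- rep(c(1, 2, 3), times = c(30, 40, 80))
veg <- do.call(rbind, lapply(seq_along(n_visits), function(p) {
  data.frame(Plot = p, Year = sort(sample(1:3, n_visits[p])))
}))
veg$fire <- rnorm(nrow(veg))
plot_int   <- rnorm(length(n_visits), sd = 2)    # plot intercepts
plot_slope <- rnorm(length(n_visits), sd = 0.5)  # plot-specific Year slopes
veg$cover <- 20 + 3 * veg$fire + 0.5 * veg$Year +
  plot_int[veg$Plot] + plot_slope[veg$Plot] * veg$Year + rnorm(nrow(veg))
veg$Plot <- factor(veg$Plot)

# Numeric Year: Year is also a fixed effect, so the random slopes are
# deviations around an estimated average slope rather than around zero
m_num <- lmer(cover ~ fire + Year + (Year | Plot), data = veg)
VarCorr(m_num)   # intercept SD, Year-slope SD and their correlation

# Factor Year: 3 random effects per plot (450 in total) for 350 observations;
# the default check.nobs.vs.nRE = "stop" is expected to refuse this fit
m_fac <- tryCatch(
  lmer(cover ~ fire + factor(Year) + (factor(Year) | Plot), data = veg),
  error = function(e) e
)
inherits(m_fac, "error")
```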

Finally, it is probably worth considering whether you want to include Year as a random effect in the first place. If you are expecting the effect of Year to differ a lot between different plots then it is a good idea, but usually a better starting point is just to use Year as a fixed effect, and assess how well that model fits the data before adding it as a random effect.
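
A minimal sketch of that workflow, assuming lme4 and simulated data with hypothetical names: fit Year as a fixed effect first, then add a random Year slope and compare the two fits. Because the fixed effects are identical, `refit = FALSE` in `anova()` keeps the REML fits for the comparison. Treat the resulting likelihood-ratio p-value as approximate: under the null hypothesis the slope variance sits on the boundary of its parameter space, which tends to make the test conservative.

```r
library(lme4)

set.seed(42)
# Hypothetical data: 150 plots sampled 1-3 times, no true random slope
n_visits <- rep(c(1, 2, 3), times = c(30, 40, 80))
veg <- do.call(rbind, lapply(seq_along(n_visits), function(p) {
  data.frame(Plot = p, Year = sort(sample(1:3, n_visits[p])))
}))
veg$fire <- rnorm(nrow(veg))
plot_int <- rnorm(length(n_visits), sd = 2)
veg$cover <- 20 + 3 * veg$fire + 0.5 * veg$Year +
  plot_int[veg$Plot] + rnorm(nrow(veg))
veg$Plot <- factor(veg$Plot)

# Step 1: Year as a fixed effect only, Plot as a random intercept
m_fixed <- lmer(cover ~ fire + Year + (1 | Plot), data = veg)
summary(m_fixed)

# Step 2: add a random Year slope and compare the two REML fits;
# only the random part differs, so no ML refit is needed
m_slope <- lmer(cover ~ fire + Year + (Year | Plot), data = veg)
cmp <- anova(m_fixed, m_slope, refit = FALSE)
cmp
```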
