R – Representing Repeated Measurements Within Sample Plots in Linear Mixed Effects Models

lme4-nlmerrepeated measures

I have collected data on gas fluxes from plots of soil subjected to 5 different treatments ("D2", "K2", "M", "N", and "O2"), which also possessed variable clay contents. The experiment was laid out in a randomized complete block design, with 4 replications. Within each plot, two separate measurements of flux were performed. The resulting data.frame resembles the following:

block   treatment   subsample       flux        clay
1           D2          1           112068.6003 14.8
1           D2          2           129223.1641 14.8
1           K2          1           256712.4712 15.5
1           K2          2           113343.9756 15.5
1           M2          1           85794.47834 16.4
1           M2          2           -33620.6990 16.4
1           N           1           70283.98133 18.2
1           N           2           49569.84621 18.2
1           O2          1           100553.1116 13.4
1           O2          2           38885.99674 13.4
2           D2          1           96968.58451 15.8

I want to build a linear mixed effect model that takes account of this subsampling, and have come up with:

flux.lme <- lme(flux ~ treatment + block, random = ~1|subsample)

Which produces an ANOVA table:

> anova(flux.lme)
               numDF denDF   F-value p-value
(Intercept)        1    26 158.15781  <.0001
treatment          4    26   8.88691  0.0001
clay               1    26   1.72640  0.2003
block              3    26   1.59188  0.2153
treatment:clay     4    26   1.73011  0.1736

This output seems a bit strange to me, as the denominator degrees of freedom should not be 26, which is taking each repeated sampling as independent experimental unit. It should instead be based on the number of “main plots”, which is 20. In my case the denominator d.f. should be 20-1-3-1-4-4=7. Is is possible to instruct the lme() function to use this value?

Best Answer

I would recommend getting a few resources if you're just getting started fitting mixed models. I think Mixed Effects Models and Extensions in Ecology in R (Zuur et al. 2009) is a nice (approachable) place to start. I often peruse http://glmm.wikidot.com/faq, as well. It is R specific but I always learn a lot about mixed models in general.

I think it will help you to write out your study design explicitly to make sure you are accounting for all the variation in your model.
You have 4 blocks (account for variation among blocks).
You have 5 plots in each block for 20 plots total (account for variation within blocks).
You have 2 subsamples in each plot for 40 observations total (account for within plot variation).

I had to make a fake dataset, next time please make any coding questions reproducible by providing data (possibly with dput()).

block = factor(rep(1:4, each = 10))
treatment = factor(rep(rep(1:5, each = 2), 4))
subsample = factor(rep(1:2, 20))
flux = rnorm(40, 10, 7)
clay = rep(rnorm(20, 10, 2), each = 2)

dat1 = data.frame(block, treatment, subsample, flux, clay)

Here is your first model. You account for block to block variation by using block as a fixed effect. You use the two-level variable "subsample" as a random effect.

require(nlme)

fluxlme = lme(flux ~ treatment*clay + block, random = ~1|subsample, data = dat1)
summary(fluxlme)
anova(fluxlme)

At the bottom of the summary you will see this:

Number of Observations: 40
Number of Groups: 2

This is where I check if my model reflects my design. Do you have two groups with 20 observations per group? Nope, you have 20 groups (plots) with 2 observations per group. You failed to account for one of the levels of variation in your design in your model.

Make a variable to represent your plots nested in blocks. Because treatment represents each plot within each block, you can combine the block variable and the treatment variable to make a new variable with a unique identifier for each plot. This is called explicit nesting. You can read more about implicit vs explicit nesting online, starting with the website I listed above. I've found that using explicit nesting avoids a lot of confusion and mistakes on my part.

dat1$plot = with(dat1, interaction(block, treatment) )

fluxlme2 = lme(flux ~ treatment*clay + block, random = ~1|plot, data = dat1)
summary(fluxlme2)
anova(fluxlme2)

The subsamples are the observation-level measurement, which is represented by the residual error term in linear mixed models.