The correct lmer model formula for a Split Plot design

lme4-nlmemixed modelregressionsplit-plot

I have a standard split-plot design, having the three predictors:

  • A :: [ Categorical, Unordered, 2 Levels ] –> (1, 2)
  • B :: [ Categorical, Unordered, 4 Levels ] –> (1, 2, 3, 4)
  • C :: [ Categorical, Unordered, 2 Levels ] –> (1, 2)

Predictor B indicates the main plots, Predictor A indicates the treatment applied to entire main plots, and Predictor C indicates the treatment applied to the subplots within main plots.

The three crossed/nested pairwise relationships are:

  • A — C :: A is crossed with C
  • A -> B :: A nests B (i.e. B is nested within A)
  • B — C :: B is crossed with C

An example data set containing 8 measurements (but only showing the predictor information, not the measured responses) for this example experiment can be written in table format:

A B C
1 1 1 1
2 1 1 2
3 1 2 1
4 1 2 2
5 2 3 1
6 2 3 2
7 2 4 1
8 2 4 2

I would like to model this design as a mixed model using lme4's lmer() function, but I am confused as to the correct model formula. I see two options…

Model 1

R ~ A*C + (1|B)

This model formula has the random effects design matrix:

enter image description here

…and a random effects covariance matrix:

enter image description here


Model 2

R ~ A*C + (1|A/B)

This model formula has the random effects design matrix:

enter image description here

…and a random effects covariance matrix:

enter image description here


Both models have the same fixed effect design matrix.

Model 2 seems to properly depict predictor B as nested within A, but it seems strange to put A as both a fixed effect and a random effect.

My question is: which of these two models is correct, and why?

Best Answer

Ben Bolkers GLMM-FAQ is a reputable source and has quite bit to say about such problems. Specifically here: https://bbolker.github.io/mixedmodels-misc/glmmFAQ.html#when-can-i-include-a-predictor-as-both-fixed-and-random where it links to this blog-post: https://www.muscardinus.be/2017/08/fixed-and-random/ which clearly states that on should not use a categorical variable as both fixed and random.

So Model 1 is the answer in general, however with 8 observations and 4 fixed effect coefficients you probably won't be able to estimate the standard deviation of the random effects. See here: https://bbolker.github.io/mixedmodels-misc/glmmFAQ.html#singular-models-random-effect-variances-estimated-as-zero-or-correlations-estimated-as---1

I would just fit a model of the difference within each B with A as covariable to represent the interaction-coefficient A2:C2.

Here are some demonstration in R, however a lot of the factors start at 0 not 1:

# code based on https://www.muscardinus.be/2017/08/fixed-and-random/
library(tidyverse)
library(lme4)

n_a <- 2
n_b <- 2 # changed to fit your case
n_sample <- 2
sd_A <- 2
sd_B <- 1
sd_noise <- 1
dataset <- expand.grid(
  B = paste0("b", seq_len(n_a * n_b)),
  sample = seq_len(n_sample)
) %>%
  mutate(
    A = paste0("a", as.integer(B) %% n_a) %>%
      factor(),
    mu = rnorm(n_a, sd = sd_A)[A] + 
      rnorm(n_a * n_b, sd = sd_B)[B],
    Y = mu + rnorm(n(), sd = sd_noise)
  )
# add C
dataset$C <- as.character(rep(0:1, each = 4))

dataset %>% 
  arrange(A, B, C) %>% 
  select(A,B,C)
# none work well
summary(lmer(Y ~ A*C + (1|B), data = dataset))
summary(lmer(Y ~ A+C + (1|B), data = dataset))
summary(lmer(Y ~ C + (1|A/B), data = dataset))

# alternative with differences
dataset2 <- dataset %>% 
  arrange(A, B, C) %>% 
  group_by(A, B) %>% 
  summarise(d = Y[1] - Y[2])

summary(lm(d ~ A, data = dataset2))
# compare
summary(lm(Y ~ A*C, data = dataset))

# small n should have clear visual impact. This should be a counter example to that.
ggplot(data = dataset, aes (x = C, y = Y, color = A, group = B)) +
  geom_path() +
  geom_point()
Related Question