Random Effects Model – Does it Make Sense to Include a Factor as Both Fixed and Random in Linear Mixed Effects Model?

lme4-nlmerrandom-effects-model

I know this sounds contradictory, if we think of main effects, as the random factor would remove some of the variation that we want to explain by the fixed effect.

But let’s look at a study design where fields were paired (according to land use in the surroundings, distance etc.). In each field pair one was randomly assigned to be sown with an insecticide while the other one served as control. In each field several bee colonies were placed and a response variable (e.g. the weight) was measured.

# Example data: 

set.seed(123)

colony = as.factor(1:96)
colony_effect = rnorm(96, mean = 2)

field = as.factor(sort(rep(c(1:16),6)))
field_e = rnorm(16, mean = 2)
field_effect = rep(field_e, each = 6)

field_pair = as.factor(sort(rep(c(1:8),12)))
field_pair_e = rnorm(8, mean = 2)
field_pair_effect = rep(field_pair_e, each = 12)

treatment = as.factor(rep(c(rep("control", 6), rep("treat", 6)), 8))
treatment_effect = rep(c(rep(0, 6), rep(1, 6)), 8)     

response = treatment_effect + field_effect + field_pair_effect + colony_effect
df1 = data.frame(treatment, field_pair, field, colony, response)

Intuitively, I would specify the model like this, if I want to test for a treatment effect on the response:

library(lme4)
lmm2 = lmer(response ~ treatment + (1|field_pair/field), data = df1)
summary(lmm2)
anova(lmm2, update(lmm2,.~1 + (1|field_pair/field)), test = "LRT") # p = 0.008118

I think my colleagues used the same model only that they specified it slightly differently:

lmm2_2 = lmer(response ~ treatment + (1|field_pair) + (1|field_pair:treatment), data = df1)
summary(lmm2_2)
anova(lmm2_2, update(lmm2_2,.~1 + (1|field_pair) + (1|field_pair:treatment)), test = "LRT") #  p = 0.008118

However, an answer in this thread on a question on the choice of random factors, made me wonder whether or not field pair should be included in the model. The asker wanted to explain the fish growth of two fish populations that were placed in several tanks and exposed to two temperature regimes (crossed fixed effects: population type and temperature).

@Aron answered:

”It doesn't make sense to both include tank as a random effect and
nest tank within the pop/temp fixed effect. You only need one of
these, depending on how tank is coded.”

And:

“Including the tank random effect is only desired if the tanks were
first divided into two groups and then randomized to treatment; if the
eight tanks were completely randomized to treatment, this is not
necessary.”

This made me wonder whether the random structure that my colleagues used (and that I intend to use) is correct, since

  • Treatment is both fixed and random effect

  • Field pair was included rather than just field (which I believe would
    be analogous to tank in the fish example, if the tanks were grouped
    beforehand)

    For comaparison, model without field pair:

    lmm1 = lmer(response ~ treatment + (1|field), data = df1)
    summary(lmm1)
    anova(lmm1, update(lmm1,.~1 + (1|field)), test = "LRT") # p = 0.0126
    

Best Answer

To answer to the title of your question: yes, it can make sense to include a factorial variable as fixed and a random effect. Depending on the data structure/ experimental design this may even be necessary to do so to arrive at valid conclusions. For a very readable discussion of the topic see: Barr DJ, Levy R, Scheepers C, Tily HJ. Random effects structure for confirmatory hypothesis testing: Keep it maximal. J Mem Lang 2013;68(3):255–78.

There is a similar example with an answer by @Ben Bolker, who also recommends to use one variable (organ) as fixed and mixed: Mixed Effects Model with Nesting.

I could not quite wrap my head around the answer to your particular experiment, yet. The question is, how many random effects are "enough" to account for all dependencies introduced by your experimental design. Your lmm2 looks fine to me. Intuitively, I would have specified the model with a random slope for treatment within field_pairs and ignore fields alltogether, like:

lmm3 = lmer(response ~ treatment + (1+treatment|field_pair), data = df1)
summary(lmm3)
anova(lmm3, update(lmm3,.~1 + (1+treatment|field_pair)), test = "LRT") # p=.00862

This gives very similar results to your lmm2, but yields a slightly lower AIC in your sample data. I would be glad to hear other opinions on this and whether these approaches are equivalent.