Solved – Problem understanding the interaction term of mixed effects model

interactionlme4-nlmemixed modelr

I have a question regarding the meaning of an interaction term in a linear mixed effects model. Considering the following experiment (similar to a real one I have): there are three sites, each with two treatments in them. Some variable is measured over four seasons for the site-treatment combinations, and let's say there are 6 replicates for each combinations (i.e. n=144=3 sites x 4 seasons x 2 treatments x 6 reps).

So I made a fake dataset where I deliberately make seasonal variation not have an effect (data are all normal and with SD=1). However, the values are different for each site, BUT treatment 2 is consistently larger than treatment 1 at each site (excuse the strange way of making the data, but hey it illustrates my problem).

SO: The main effect for treatment is significant (no problem there), main effect for season non-significant (deliberate)….. but now: why is the site X treatment effect significant? My understanding was that having site as a random effect controls for added variation in sites, such that if a treatment effect is consistent across sites, even though the absolute values are different, then there should not be a site x treatment interaction. In my mind, the presence of such a significant interaction would mean that although treatment has an effect, it is not consistent and that the effect is site specific, as in if for some sites the blue and red dots would swop around.

Could someone please clarify where my misunderstanding is of this?

Many thanks in advance!

EDIT: Sorry, what I also wanted to add is that in my real experiment, we know that there are inherent site differences, which is exactly why we want to control for site effects…

library(ggplot2)
library(nlme)
library(emmeans)

sites <- c("Site1","Site2","Site3")
seasons <- c("Aut","Wint","Spr", "Sum")
treatment <- c("Treatment1","Treatment2")
rep <- as.factor(1:6) # let's say there are six replicates for each combination

means <- c(15,30,105,125,50,70)

dat <- expand.grid(Rep=rep, Treatment=treatment, Site=sites, Season=seasons) # consistent differences

L = list()
n = 0

set.seed(123)
for(b in 1:4){
    for(i in 1:6){
        # replicates creation
        L[[1 + n]] <- rnorm(6, means[i], 0.5)
        n <- length(L)} 
}

X <- unlist(L)

dat <- cbind(dat, X)
head(dat)

ggplot(data = dat, aes(x=Season, y=X, bg=Treatment, group=Treatment)) +      
    geom_point(size=2, shape=21, colour="black") +
    facet_grid(.~ Site ,  scales = "free")

anova(mod <- lme(X ~ Treatment + Season + Site:Treatment, random=~1|Site, data=dat))

OUTPUT:

               numDF denDF  F-value p-value
(Intercept)        1   133 346098.2  <.0001
Treatment          1   133  54308.8  <.0001
Season             3   133      0.5  0.7008
Treatment:Site     4   133  29089.5  <.0001

Plot of example data

Best Answer

I think the way you formulated the model is a bit misleading: you have Site as grouping factor (i.e., there are random intercepts contingent on which site you are measuring), but then you add it also as a fixed-effect predictor, in interaction with Treatment. I don't think that the interaction term makes much sense: if there are site-specific variation in the effect of treatment, and you want to control for those random variations (site effects), then I think the model should have the following form:

> anova(mod <- lme(X ~ Treatment + Season, random=~Treatment|Site, data=dat))
            numDF denDF   F-value p-value
(Intercept)     1   137 110.94052  <.0001
Treatment       1   137 118.08932  <.0001
Season          3   137   1.89962  0.1326

This model have random intercepts (site-specific variation in intercept) and random slopes for Treatment (site-specific variation in treatment effects). Crucially, these random effects are assumed to be Gaussian distributed with mean zero. So the fixed-effect estimate of Treatment represents the estimated mean effect for your "population" of sites. The model also estimates the variance of such effect across sites:

> VarCorr(mod)
Site = pdLogChol(Treatment) 
                    Variance     StdDev      Corr  
(Intercept)         2.056268e+03 45.34608932 (Intr)
TreatmentTreatment2 8.538427e+00  2.92205871 0.792 
Residual            7.915393e-03  0.08896849