Solved – Design matrix contrast coding for model selection and ‘main effects’ vs. ‘simple main effects’ interpretation in linear mixed effects model (R/Matlab)

Tags: categorical-encoding, contrasts, matlab, mixed-model, r

My question is about contrast coding and planned contrasts in three-way interactions for a linear mixed effects model. Sample code is provided for R and Matlab as I can work in either one, but prefer Matlab.

I have an experiment with three categorical variables:

Group (2 levels, between subjects)
Condition A (2 levels, within subjects)
Condition B (3 levels, within subjects)

The design is fully crossed (i.e. each subject is exposed to each level of B within each level of A) and the groups are balanced.

+---------+------+-------+----+----+
| Subject | Item | Group | A  | B  |
+---------+------+-------+----+----+
|       1 |    1 |     1 | A1 | B1 |
|       1 |    2 |     1 | A1 | B2 |
|       1 |    3 |     1 | A1 | B3 |
|       1 |    4 |     1 | A2 | B1 |
|       1 |    5 |     1 | A2 | B2 |
|       1 |    6 |     1 | A2 | B3 |
|       2 |    1 |     2 | A1 | B1 |
|       2 |    2 |     2 | A1 | B2 |
|       2 |    3 |     2 | A1 | B3 |
|       2 |    4 |     2 | A2 | B1 |
|       2 |    5 |     2 | A2 | B2 |
|       2 |    6 |     2 | A2 | B3 |
+---------+------+-------+----+----+

All predictor variables are coded as factors/categorical variables and ordered according to a priori hypotheses. I want to test the three-way interaction between Group, A, and B, and would like to compare B1, B2, and B3 at each level of A for each group. I fit the following model in Matlab:

lme = fitlme(data, 'respVar ~ 1 + Group*A*B + (1|Subject) + (1|Item)', 'FitMethod', 'REML', 'DummyVarCoding', 'effects', 'CheckHessian', true);

R equivalent:

library(lme4)
contrasts(data$Group) = c(-0.5, 0.5)
contrasts(data$A) = c(-0.5, 0.5)
contrasts(data$B) = cbind(c(1/2, 0, -1/2), c(1/2, -1/2, 0))

lme = lmer(respVar ~ 1 + Group*A*B + (1|Subject) + (1|Item), control=lmerControl(optCtrl=list(maxfun=100000)), data=data)

This gives me the main effects of each parameter. However, I also want to see the simple main effects (i.e. the effect of each level of B within a fixed level of A for each group). Does it make sense to re-fit the model with treatment/dummy coding (the default in R and Matlab)? Do I then need to apply a Bonferroni correction for multiple comparisons?
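One way to check what a given contrast matrix actually tests is to invert it together with the intercept column: each row of the inverse gives the weights that the corresponding coefficient places on the level means. The sketch below does this in base R for the custom B contrasts shown above and, for comparison, for contr.sum(3). The names C, H_custom and H_sum are illustrative and not part of the original code.

```r
# Contrast matrix for B as specified in the question (rows = levels B1, B2, B3).
C <- cbind(c1 = c(1/2, 0, -1/2), c2 = c(1/2, -1/2, 0))
rownames(C) <- c("B1", "B2", "B3")

# Hypothesis matrix: rows = coefficients, columns = weights on the level means.
H_custom <- solve(cbind(Intercept = 1, C))

# Same computation for R's built-in sum (effects) coding.
S <- contr.sum(3)
dimnames(S) <- list(c("B1", "B2", "B3"), c("s1", "s2"))
H_sum <- solve(cbind(Intercept = 1, S))

print(round(H_custom, 3))
print(round(H_sum, 3))
```

In both codings the intercept row weights the three level means equally, so the intercept is the grand mean. The remaining rows show that neither coding yields a plain pairwise difference such as B1 minus B3; the coefficients compare level means with the grand mean instead.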

Also, I am specifying a random effects structure using model selection with AIC, and the model selected differs (by one term) depending on whether I use effects coding or treatment coding. (The difference in AIC between both models with either coding method is ~2). If I want to report the results of both models, which type of coding should I use for model selection?

Best Answer

I thought I would explain what I ended up doing here in case it's helpful to anyone else.

Step 1: Fit the lme with effects coding

library(MASS)
library(lme4)
library(psycholing)
library(lmerTest)
contrasts(data$Group) = contr.sum(2)
contrasts(data$A) = contr.sum(2)
contrasts(data$B) = contr.sum(3)

lme = lmer(respVar ~ 1 + Group*A*B + (1|Subject) + (1|Item), control=lmerControl(optCtrl=list(maxfun=100000)), data=data)

I performed model selection using sum coding and then tested the overall significance of each model term using anova from the lmerTest package:

# lmerTest was loaded after lme4, so lmer() and anova() use the lmerTest versions
anova(lme)

This gave me a significant Group x A x B three-way interaction.

Step 2: Switch to dummy coding and fit three models, with each level of B in turn as the reference level.

contrasts(data$Group) = contr.treatment(2)
contrasts(data$A) = contr.treatment(2)
contrasts(data$B) = contr.treatment(3)

# N.b. these are the default contrasts in R.
contrasts(data$B)
#       B2    B3
# B1     0     0
# B2     1     0
# B3     0     1

# summary() dispatches to the lmerTest method because lmerTest was loaded after lme4
lmeB1 = lmer(respVar ~ 1 + Group*A*B + (1|Subject) + (1|Item), control=lmerControl(optCtrl=list(maxfun=100000)), data=data)
b1sum = summary(lmeB1)

# relevel() returns a new factor, so the result must be assigned back
data$B = relevel(data$B, ref = "B2")
lmeB2 = lmer(respVar ~ 1 + Group*A*B + (1|Subject) + (1|Item), control=lmerControl(optCtrl=list(maxfun=100000)), data=data)
b2sum = summary(lmeB2)

data$B = relevel(data$B, ref = "B3")
lmeB3 = lmer(respVar ~ 1 + Group*A*B + (1|Subject) + (1|Item), control=lmerControl(optCtrl=list(maxfun=100000)), data=data)
b3sum = summary(lmeB3)
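To illustrate why refitting with a different reference level gives the simple effects, here is a minimal base R sketch on made-up, noiseless cell means for a 2 (A) x 3 (B) design, fitted with lm rather than lmer to keep it self-contained. The data values and the helper simple_effect_of_A are hypothetical. With treatment coding, the A coefficient is the A2 minus A1 difference at the reference level of B; with sum coding on both factors, it is the deviation of the A1 mean from the grand mean.

```r
# Hypothetical cell means; expand.grid varies A fastest:
# A1B1, A2B1, A1B2, A2B2, A1B3, A2B3
d <- expand.grid(A = factor(c("A1", "A2")), B = factor(c("B1", "B2", "B3")))
d$y <- c(10, 12, 10, 15, 10, 11)

# Treatment coding (R default): coefficient "AA2" is the simple effect of A
# at whichever level of B is the reference.
simple_effect_of_A <- function(ref) {
  d$B <- relevel(d$B, ref = ref)   # local copy; relevel() must be assigned
  unname(coef(lm(y ~ A * B, data = d))["AA2"])
}
effects_by_B <- sapply(c("B1", "B2", "B3"), simple_effect_of_A)

# Sum coding on both factors: coefficient "A1" is mean(A1) minus the grand mean.
ds <- d
contrasts(ds$A) <- contr.sum(2)
contrasts(ds$B) <- contr.sum(3)
main_A <- unname(coef(lm(y ~ A * B, data = ds))["A1"])

print(effects_by_B)  # simple effects of A at B1, B2, B3
print(main_A)        # minus half the average of the three simple effects
```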

Step 3: Extract the contrasts of interest and apply a Bonferroni-Holm correction for multiple comparisons.

# Test the contrasts: 
#  1) Group1 A1 B1 vs. Group1 A1 B2/B3
#  2) Group1 A1 B1/B2/B3 vs. Group2 A1 B1/B2/B3
#  3) Group1 A1 B1/B2/B3 vs. Group1 A2 B1/B2/B3

# NA pads the two contrasts that are only extracted from the B1 model
pvals = cbind("B1" = p.adjust(b1sum$coefficients[c(12, 11, 8, 3), 5], "holm"),
              "B2" = c(NA, NA, p.adjust(b2sum$coefficients[c(8, 3), 5], "holm")),
              "B3" = c(NA, NA, p.adjust(b3sum$coefficients[c(8, 3), 5], "holm")))

# Numbers correspond to the rows with the coefficients of interest in model$coefficients, column 5 contains the p-values.

# Reference Level=Group1
#                        B1           B2        B3
# 1a) B2:A1     0.001707473 
# 1b) B3:A1     0.027679733 
# 2)  Group2:A2 0.016903682 0.0328017681 0.9451504
# 3)  A2        0.127490731 0.0008424514 0.1002219
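For readers unfamiliar with the Holm step-down procedure used by p.adjust, the following base R sketch reproduces it by hand on three made-up raw p-values (p_raw is hypothetical and unrelated to the results above). The sorted p-values are multiplied by m, m-1, ..., 1, and a running maximum keeps the adjusted values monotone.

```r
# Hypothetical raw p-values for three planned contrasts.
p_raw <- c(contrast1 = 0.01, contrast2 = 0.04, contrast3 = 0.03)

holm_by_hand <- function(p) {
  m <- length(p)
  o <- order(p)                                  # smallest p-value first
  adj_sorted <- pmin(1, cummax((m - seq_len(m) + 1) * p[o]))
  unname(adj_sorted[order(o)])                   # back to the original order
}

holm_manual  <- holm_by_hand(p_raw)
holm_builtin <- unname(p.adjust(p_raw, method = "holm"))
bonferroni   <- unname(p.adjust(p_raw, method = "bonferroni"))

print(rbind(holm_manual, holm_builtin, bonferroni))
```

Holm never gives larger adjusted p-values than plain Bonferroni, which is why it is usually preferred when a family-wise correction is wanted.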

Note that I did this in R because I also included a fixed effect for participant gender, which I coded as c(0.5, -0.5) to centre the estimates on the mean of both (effectively "controlling for" gender). This is easier to do in R with the contrasts function: in MATLAB, it seems you have to specify the entire design matrix manually if you want to use something other than effects or dummy coding.

If you don't need custom contrasts, this whole process can be done much more easily in MATLAB by fitting the model with the default (dummy) variable coding:

lme = fitlme(data, 'respVar ~ 1 + Group*A*B + (1|Subject) + (1|Item)', 'FitMethod', 'REML', 'CheckHessian', true);

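Then use coefTest to specify contrast matrices for your coefficients. Each row of the matrix H defines one linear combination of the fixed-effect coefficients, and coefTest tests H*beta = 0 for all rows jointly. The following therefore gives me an F test of whether my second and third coefficients (B2 and B3 in this case) are both zero, with a Satterthwaite approximation for the degrees of freedom. To test the difference between the two coefficients instead, use a single row with 1 and -1 in those positions. (See this reference for a discussion of significance testing for LMEs: https://doi.org/10.3758/s13428-016-0809-y)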

[pval,F,DF1,DF2]=coefTest(lme, [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0; 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'DFMethod', 'Satterthwaite')
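The Wald test behind a contrast matrix can be sketched in base R. The example below is an illustration only: it uses simulated data and an ordinary lm fit, the helper wald_F is hypothetical, and it uses residual degrees of freedom rather than a Satterthwaite approximation. It shows that a two-row matrix tests both coefficients jointly, whereas a single row with 1 and -1 tests their difference.

```r
set.seed(1)
n  <- 40
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 0.5 * x1 + 0.5 * x2 + rnorm(n)
fit <- lm(y ~ x1 + x2)

# Wald F test of H %*% beta = 0 (hypothetical helper, residual df only).
wald_F <- function(fit, H) {
  H   <- rbind(H)                       # a plain vector becomes a one-row matrix
  est <- H %*% coef(fit)
  Fstat <- drop(t(est) %*% solve(H %*% vcov(fit) %*% t(H)) %*% est) / nrow(H)
  c(F = Fstat, df1 = nrow(H), df2 = df.residual(fit),
    p = pf(Fstat, nrow(H), df.residual(fit), lower.tail = FALSE))
}

# Two rows: are the x1 and x2 coefficients both zero?
joint <- wald_F(fit, rbind(c(0, 1, 0), c(0, 0, 1)))
# One row: do the x1 and x2 coefficients differ from each other?
difference <- wald_F(fit, c(0, 1, -1))

print(joint)
print(difference)
```

For this model the joint test coincides with the overall F statistic reported by summary(fit), which is a convenient sanity check on the helper.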

Related Solutions

Solved – Mixed effects model or mixed design ANOVA in R

So I've done a lot of reading and chatting to people and I have a solution.

My experimental design is a split plot design, which is quite different from a nested or hierarchical design. I was originally confusing the terms. As Robert correctly states in his answer, what is needed is a mixed effects model. Thus:

Fixed effects: Year, Treatment1, Treatment2

Random effects: Year, Block, Treatment1

The model is specified thus:

mod <- glmer(Richness ~ Treatment1*Treatment2*Year + (1|Block/Treatment1) + (1|Year), data = dat, family = poisson)

The random effects are the terms specified in the brackets. Each is written as (1|grouping factor), where 1 represents a random intercept for each level of the grouping factor. None of the fixed effects are continuous: the effect of Year doesn't necessarily increase each year in a linear fashion, so I have classed it as a categorical fixed effect.

If Block were actually a continuous variable (obviously hypothetical!) then the random effects might be specified +(Block|Treatment1)+(1|Year), i.e. with a random slope for Block within each level of Treatment1.

The model can then be simplified as appropriate.

Several things to note:

1) When specified as a random effect, Year is listed separately from Block and Treatment1, since it doesn't have an intuitive "level" at which to be nested between them (Year isn't any different at any plot size of the experiment: for every block, plot and subplot, Year is the same).

2) Treatment2 does not need to be specified as a random effect, since it represents the highest level of replication in the experiment and therefore will not be pseudoreplicated.

3) In mixed effects models it is possible to specify an error distribution if errors are not normal. I have specified poisson here, since my response data are counts - this improved the distribution of the model residuals.

Solved – Maximal model for linear mixed-effects model for repeated measures design

The maximal structure would need to include also a random effect for the interaction between color and shape, that is:

Y ~ color * shape + (color + shape + color:shape | subject)

This will result in all your predictors (color, shape and their interaction) having a fixed effect (constant for all subjects), and a random effect (individual fluctuations around the estimated fixed effect). In this sense the model is the maximal one. Note that it might not be fully equivalent to a repeated-measures ANOVA as it doesn't make equally strict assumptions on the correlational structure (see Tom's answer).

If you don't include the interaction in the random-effects part of the formula, individual variation in the interaction effect will not be treated as "random", and the model will not be equivalent to a repeated-measures ANOVA. Of course, the variance of the random deviates for the interaction (or any other random effect) might be so small that including it in the model does not improve the fit much. You can check this not only with the AIC but also with a likelihood ratio test, since the models with and without a given random effect are nested within one another. In principle, if the likelihood ratio test is not significant, you can safely remove that random effect. Simplifying the random-effects structure by removing negligible components would be an example of what the article you linked calls a data-driven approach.
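As a rough sketch of the arithmetic behind such a likelihood ratio test, the base R snippet below computes the statistic and p-value from two log-likelihoods. The function lrt and the numbers passed to it are hypothetical; with fitted lme4 models one would normally call anova() on the two fits instead. Because a variance of zero lies on the boundary of the parameter space, the plain chi-square p-value tends to be conservative for random-effect tests.

```r
# Likelihood ratio test from two log-likelihoods (hypothetical helper).
# ll_full / ll_reduced: log-likelihoods of the larger and smaller model,
# df_diff: number of extra (co)variance parameters in the larger model.
lrt <- function(ll_full, ll_reduced, df_diff) {
  stat <- 2 * (ll_full - ll_reduced)
  c(chisq = stat, df = df_diff,
    p = pchisq(stat, df = df_diff, lower.tail = FALSE))
}

# Made-up log-likelihoods for illustration only.
res <- lrt(ll_full = -100, ll_reduced = -102, df_diff = 1)
print(res)
```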

You can simplify the model in this way, and it would still be equivalent to a repeated-measures ANOVA:

Y ~ color*shape + (1|subject) + (0+color|subject) + (0+shape|subject) + (0+color:shape|subject)

This syntax tells lmer to not estimate the correlations of random deviates across subjects. The drawback here is that, for example, you won't be able to tell whether subjects that have a large effect of color tend to have also a larger effect of shape (or smaller effect, in case of negative correlation).

You can easily include a between-subjects predictor; the only difference is that you can't add a random effect for it. "gender", for example, cannot have a random effect grouped by subject, but it can interact with the other fixed effects, e.g.:

Y ~ color * shape * gender + (color + shape + color:shape | subject)
