I thought I would explain what I ended up doing here in case it's helpful to anyone else.
Step 1: Fit the lme with effects coding
library(MASS)
library(lme4)
library(psycholing)
library(lmerTest)
contrasts(data$Group) = contr.sum(2)
contrasts(data$A) = contr.sum(2)
contrasts(data$B) = contr.sum(3)
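Before fitting, it can be worth checking that the coding actually took effect:
# With contr.sum, the last level (here B3) is coded -1 on both columns, so each
# coefficient is a deviation from the grand mean
contrasts(data$B)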
lme = lmer(respVar ~ 1 + Group*A*B + (1|Subject) + (1|Item), control=lmerControl(optCtrl=list(maxfun=100000)), data=data)
I performed model selection using sum coding and then tested the overall significance of each term using anova() from the lmerTest package:
anova(lme)   # lmerTest's anova method gives an F test per fixed-effect term
This gave me a significant Group x A x B three-way interaction.
Step 2: Switch to dummy coding and fit three models, with each level of B in turn as the reference level.
contrasts(data$Group) = contr.treatment(2)
contrasts(data$A) = contr.treatment(2)
contrasts(data$B) = contr.treatment(3)
# N.B. these are the default contrasts in R:
contrasts(data$B)
#    B2 B3
# B1  0  0
# B2  1  0
# B3  0  1
lmeB1 = lmer(respVar ~ 1 + Group*A*B + (1|Subject) + (1|Item), control=lmerControl(optCtrl=list(maxfun=100000)), data=data)
b1sum = summary(lmeB1)   # lmerTest summary: Satterthwaite df and p-values
data$B = relevel(data$B, "B2")   # make B2 the reference level
lmeB2 = lmer(respVar ~ 1 + Group*A*B + (1|Subject) + (1|Item), control=lmerControl(optCtrl=list(maxfun=100000)), data=data)
b2sum = summary(lmeB2)
data$B = relevel(data$B, "B3")   # make B3 the reference level
lmeB3 = lmer(respVar ~ 1 + Group*A*B + (1|Subject) + (1|Item), control=lmerControl(optCtrl=list(maxfun=100000)), data=data)
b3sum = summary(lmeB3)
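Since the three fits are just reparameterizations of the same model, a quick sanity check is that their log-likelihoods match:
c(logLik(lmeB1), logLik(lmeB2), logLik(lmeB3))   # should all be identical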
Step 3: Extract the contrasts of interest and apply a Bonferroni-Holm correction for multiple comparisons.
# Test the contrasts:
# 1) Group1 A1 B1 vs. Group1 A1 B2/B3
# 2) Group1 A1 B1/B2/B3 vs. Group2 A1 B1/B2/B3
# 3) Group1 A1 B1/B2/B3 vs. Group1 A2 B1/B2/B3
pvals = cbind("B1" = p.adjust(b1sum$coefficients[c(12, 11, 8, 3), 5], "holm"),
              "B2" = c(NA, NA, p.adjust(b2sum$coefficients[c(8, 3), 5], "holm")),
              "B3" = c(NA, NA, p.adjust(b3sum$coefficients[c(8, 3), 5], "holm")))
# The row numbers pick out the coefficients of interest in each summary's
# coefficients table; column 5 ("Pr(>|t|)") holds the p-values. NA marks cells
# where the contrast only comes from the model with B1 as the reference level.
# Reference level = Group1
#                      B1            B2           B3
# 1a) B2:A1      0.001707473
# 1b) B3:A1      0.027679733
# 2)  Group2:A2  0.016903682  0.0328017681  0.9451504
# 3)  A2         0.127490731  0.0008424514  0.1002219
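One caveat about the hard-coded row indices above: they depend on the order of terms in the design matrix, so it can be safer to pull rows out of the summary table by coefficient name (the exact names depend on your factor level labels, so list them first):
rownames(b1sum$coefficients)       # all coefficient names in this parameterization
b1sum$coefficients[, "Pr(>|t|)"]   # the same p-values, keyed by those names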
Note that I did this in R because I also included a fixed effect for participant gender, which I coded as c(0.5, -0.5) to centre the estimates on the mean of both (effectively "controlling for" gender). This is easier to do in R with the contrasts function: in MATLAB, it seems you have to specify the entire design matrix manually if you want to use something other than effects or dummy coding.
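For reference, a minimal sketch of that gender coding in R (the column name Gender is hypothetical; yours may differ):
# Hypothetical two-level factor; +/-0.5 coding centres the other estimates on
# the average of the two gender groups
contrasts(data$Gender) = matrix(c(0.5, -0.5), ncol = 1)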
If you don't need custom contrasts, this whole process can be done much more easily in MATLAB by fitting the model with the default (dummy) variable coding:
lme = fitlme(data, 'respVar ~ 1 + Group*A*B + (1|Subject) + (1|Item)', 'FitMethod', 'REML', 'CheckHessian', true);
Then use coefTest to test specific contrast matrices on your coefficients. The following gives me a joint F test of my second and third coefficients (B2 and B3 in this case) with a Satterthwaite approximation for the degrees of freedom. (See this reference for a discussion of significance testing for LMEs: https://doi.org/10.3758/s13428-016-0809-y)
[pval,F,DF1,DF2]=coefTest(lme, [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0; 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'DFMethod', 'Satterthwaite')
As with all things in statistics, what you choose to report and interpret depends on what you are trying to do with the statistic in the first place. If your hypotheses are not related to random effects, then you don't have to report on them. If you are specifically interested in how items function, then you'll probably want to look carefully at their random effects and correlations. If you want to describe a person's random effect (ability level, if this is indeed an IRT problem), then you'd likely spend some time talking about the variability in those random effects.
In this case, it seems like you're interested in certain predictors (or at least one predictor, X1). If so, you might only need to describe the model you ran and then talk about the effects of those predictors. You also need to think about what will help your reader understand the model. Not many people can look at a mixed-model table and intuitively understand what all the coefficients mean. It can be helpful to include plots that show the fixed and random effects together, especially when there are random slopes, so that the standard deviation being reported can be contextualized.
Now, I think it's important to also add the caveat that good modeling requires the researcher to look at all of this information. Even if you have no interest in the specific values, looking at all the random effects and their relationships can be useful for thinking about whether or not your model is misspecified, identifying potentially interesting patterns, and generally understanding what your model is telling you. Personally, I take a model-based approach where the end goal is to develop a cohesive model that summarizes the data and the data-generating process. Others, like you described, just want p-values for certain effects. Again, it comes down to what question you're trying to answer and what information is needed to shed light on that question.
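If it helps, a minimal sketch of that kind of inspection with lme4/lmerTest (reusing the lme object fitted above):
VarCorr(lme)                      # random-effect SDs and correlations
re = ranef(lme, condVar = TRUE)   # per-subject and per-item effects (BLUPs)
head(re$Subject)                  # peek at the subject effects
lattice::dotplot(re)              # caterpillar plots (needs the lattice package)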
Best Answer
As @amoeba said in the comments, this is not so much a mixed-model question as a general question about how to parameterize a regression model with interactions. The full quote from our chapter also provides an answer to your second question (i.e., the why):
Orthogonal sum-to-zero contrasts are better because they avoid potentially difficult-to-interpret lower-order effects. That is, with those contrasts all lower-order effects are evaluated at the grand mean. For a quick explanation of the difference between dummy and effect coding, see: http://www.lrdc.pitt.edu/maplelab/slides/Simple_Main_Effects_Fraundorf.pdf
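For a quick concrete look at the two codings for a three-level factor:
contr.treatment(3)   # dummy coding: each effect is a difference from the reference level
contr.sum(3)         # sum-to-zero coding: each effect is a deviation from the grand mean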
This means for your case, almost all your interpretations are correct with one exception.
Hence, if zero is somewhat meaningless for your variable (e.g., it is age and you only observe adult participants), your estimate of Condition (which is now a simple effect of condition at X = 0) becomes meaningless as well.
In general, interactions with continuous covariates are not trivial, and there are at least two books and several papers that discuss this issue extensively. A common solution is centering the covariate on its mean. Whether or not this makes sense depends on your covariate. What I sometimes do when I have a variable with a restricted range (e.g., it goes from 0 to 100) is to center on the midpoint of the scale (see e.g., here).
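A minimal sketch of both options in R (the covariate name X is hypothetical):
data$X_c   = data$X - mean(data$X, na.rm = TRUE)  # grand-mean centering
data$X_mid = data$X - 50                          # centre on the midpoint of a 0-100 scale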
More information on centering can be found in the following references. I recommend you read the first one at least:
There is also some mixed-model specific discussion on centering, but to me this appears to be mainly relevant for hierarchical structures (i.e., at least two-levels of nesting), e.g.,
Potentially also relevant: