Solved – Encoding of categorical variables (dummy vs. effects coding) in mixed models

categorical-encoding, lme4-nlme, mixed-model, r

The model based on the experiment looks like this:

glmer(Y ~ X*Condition + (X*Condition|subject) + (1+X|Trial))

# Y = logit variable  
# X = continuous variable  
# Condition = values A and B, dummy coded; the design is repeated 
#             so all participants go through both Conditions  
# subject = random effects for different subjects  
# Trial = random effects for different trials  

Until now, I thought that the interpretation of the interaction and random effects is quite straightforward:

  • for fixed effects:

    • Intercept – what is the value of Y in Condition A (the dummy-coded baseline) when X is 0
    • X – how much does Y change for a 1-unit change in X in Condition A
    • ConditionB – what is the difference in intercept for Condition B versus Condition A
    • X:ConditionB – what is the difference in slope for Condition B versus Condition A
  • for random effects:

    • random intercept – random variability around the fixed Intercept
    • random X – random variability around the fixed X slope
    • random ConditionB – random variability of the intercept difference between Condition B and Condition A
    • random X:ConditionB – random variability of the slope difference between Condition B and Condition A
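Since these coefficient meanings follow purely from the design matrix, they can be checked with ordinary least squares on made-up data. A minimal sketch in Python/NumPy (all numbers are invented for illustration; the algebra is identical in R):

```python
import numpy as np

# Made-up noiseless data: Condition A has intercept 2 and slope 1,
# Condition B has intercept 4 and slope 3 (all values illustrative).
x = np.tile(np.linspace(0, 10, 11), 2)
cond_b = np.repeat([0.0, 1.0], 11)          # treatment (dummy) coding: A = 0, B = 1
y = np.where(cond_b == 0, 2.0 + 1.0 * x, 4.0 + 3.0 * x)

# Columns: Intercept, X, ConditionB, X:ConditionB
design = np.column_stack([np.ones_like(x), x, cond_b, x * cond_b])
beta, *_ = np.linalg.lstsq(design, y, rcond=None)

print(np.round(beta, 6))
# Intercept = 2 (A's intercept), X = 1 (A's slope),
# ConditionB = 4 - 2 = 2, X:ConditionB = 3 - 1 = 2
```

Note that the X coefficient recovers Condition A's slope, not an average slope: under dummy coding the lower-order terms are simple effects at the baseline level.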

However, I've read through a very well-written chapter, An Introduction to Mixed Models for Experimental Psychology by Henrik Singmann and David Kellen, in which they say:

In other words, a mixed model (or any other regression type model) that includes interactions with factors using treatment contrasts produces parameter estimates as well as Type III tests that often do not correspond to what one wants (e.g., main effects are not what is commonly understood as a main effect).

Using effects coding is suggested as a better way to interpret the interaction of the continuous variable X and the categorical variable Condition. I am aware that the random-effects correlations in the model above are somewhat hard to interpret – the correlation of the Intercept and X is straightforward, but the correlation of X and X:ConditionB is not, because we correlate coefficients with differences from those coefficients. One then needs to calculate the correlation by hand, as described in How to compute correlation of random slopes for X between two Conditions with (X*Condition|subject) model in lme4?
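To sketch what "calculating by hand" means here: under dummy coding the subject-level slope in Condition B is the sum of the X and X:ConditionB random effects, so the correlation between the two conditions' slopes follows from the usual variance-sum rules. A hypothetical numerical illustration (the variance components below are made up, not the output of any real model):

```python
import numpy as np

# Made-up random-effects (co)variances for X and X:ConditionB under dummy coding
var_x, var_xb = 0.50, 0.30
cov_x_xb = -0.10

# Slope in Condition A is u_X; slope in Condition B is u_X + u_XB
var_slope_a = var_x
var_slope_b = var_x + var_xb + 2 * cov_x_xb   # Var(u_X + u_XB)
cov_slopes = var_x + cov_x_xb                 # Cov(u_X, u_X + u_XB)

cor_slopes = cov_slopes / np.sqrt(var_slope_a * var_slope_b)
print(round(cor_slopes, 3))                   # ≈ 0.73
```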

My questions are:

  1. Is my interpretation of fixed and random effects valid? If not, why?

  2. Why is effects coding better than dummy coding in mixed models, and how does one interpret effects coding?

Best Answer

As @amoeba said in the comments, this is not so much a mixed-model question as a general question about how to parameterize a regression model with interactions. The full quote from our chapter also answers your second question (i.e., the why):

A common contrast scheme, which is the default in R, is called treatment contrasts (i.e., contr.treatment; also called dummy coding). With treatment contrasts the first factor level serves as the baseline whereas all other levels are mapped onto exactly one of the contrast variables with a value of 1. As a consequence, the intercept corresponds to the mean of the baseline group and not the grand mean. When fitting models without interactions, this type of contrast has the advantage that the estimates (i.e., the parameters corresponding to the contrast variables) indicate whether there is a difference between the corresponding factor level and the baseline. However, when including interactions, treatment contrasts lead to results that are often difficult to interpret. Whereas the highest-order interaction is unaffected, the lower-order effects (such as main effects) are estimated at the level of the baseline, ultimately yielding what are known as simple effects rather than the usually expected lower-order effects. Importantly, this applies to both the resulting parameter estimates of the lower order effects as well as their Type III tests. In other words, a mixed model (or any other regression type model) that includes interactions with factors using treatment contrasts produces parameter estimates as well as Type III tests that often do not correspond to what one wants (e.g., main effects are not what is commonly understood as a main effect). Therefore we generally recommend to avoid treatment contrasts for models that include interactions.

Orthogonal sum-to-zero contrasts are better because they avoid these potentially difficult-to-interpret lower-order effects. That is, with those contrasts all lower-order effects are evaluated at the grand mean. For a quick explanation of the difference between dummy and effects coding, see: http://www.lrdc.pitt.edu/maplelab/slides/Simple_Main_Effects_Fraundorf.pdf
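To make the difference concrete, here is a small made-up sketch in Python/NumPy with sum-to-zero (effects) coding; the data are invented, and note that R's contr.sum would assign +1 to the first level, which merely flips the signs of the contrast coefficients:

```python
import numpy as np

# Made-up noiseless data: Condition A has intercept 2 and slope 1,
# Condition B has intercept 4 and slope 3 (values illustrative).
x = np.tile(np.linspace(0, 10, 11), 2)
cond = np.repeat([-1.0, 1.0], 11)           # sum-to-zero coding: A = -1, B = +1
y = np.where(cond == -1, 2.0 + 1.0 * x, 4.0 + 3.0 * x)

design = np.column_stack([np.ones_like(x), x, cond, x * cond])
beta, *_ = np.linalg.lstsq(design, y, rcond=None)

print(np.round(beta, 6))
# Intercept = 3 = mean of the two intercepts (grand mean),
# X = 2 = mean of the two slopes, Condition = 1, X:Condition = 1
```

Now the intercept and the X slope are averages over the two conditions rather than Condition A's values, which is exactly the "evaluated at the grand mean" property described above.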

For your case this means that almost all of your interpretations are correct, with one exception:

  • ConditionB – what is the difference in intercept for Condition B versus Condition A, when X is zero.

Hence, if zero is essentially meaningless for your variable (e.g., it is age and you only observe adult participants), your estimate of Condition (which is now a simple effect of Condition at X = 0) becomes meaningless as well.

In general, interactions with continuous covariates are not trivial, and there are at least two books and several papers that discuss this issue extensively. A common solution is centering the covariate on its mean. Whether or not this makes sense depends on your covariate. What I sometimes do when a variable has a restricted range (e.g., it runs from 0 to 100) is to center it on the midpoint of the scale (see e.g., here).
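A small made-up sketch (Python/NumPy, invented numbers) of what mean-centering buys you under dummy coding: the Condition coefficient changes from "difference at X = 0" to "difference at the mean of X":

```python
import numpy as np

# Made-up noiseless data: A has intercept 2 and slope 1, B has 4 and 3
x = np.tile(np.linspace(0, 10, 11), 2)      # mean(x) = 5
cond_b = np.repeat([0.0, 1.0], 11)          # dummy coding: A = 0, B = 1
y = np.where(cond_b == 0, 2.0 + 1.0 * x, 4.0 + 3.0 * x)

xc = x - x.mean()                           # center the covariate
design = np.column_stack([np.ones_like(xc), xc, cond_b, xc * cond_b])
beta, *_ = np.linalg.lstsq(design, y, rcond=None)

print(np.round(beta, 6))
# Intercept = 7 (A's prediction at X = 5), X = 1 (still A's slope),
# ConditionB = 12 = difference between conditions at X = 5,
# X:ConditionB = 2 (unchanged: the highest-order interaction is unaffected)
```

Only the lower-order terms change their point of evaluation; the interaction itself is untouched by centering.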

More information on centering can be found in the following references. I recommend you read the first one at least:

There is also some mixed-model-specific discussion of centering, but to me this appears to be mainly relevant for hierarchical structures (i.e., at least two levels of nesting), e.g.,

  • Wang, L., & Maxwell, S. E. (2015). On disaggregating between-person and within-person effects with longitudinal data using multilevel models. Psychological Methods, 20(1), 63–83. https://doi.org/10.1037/met0000030

Potentially also relevant:

  • Iacobucci, D., Schneider, M. J., Popovich, D. L., & Bakamitsos, G. A. (2016). Mean centering helps alleviate “micro” but not “macro” multicollinearity. Behavior Research Methods, 48(4), 1308–1317. https://doi.org/10.3758/s13428-015-0624-x