Regression – Correction for Multiple Comparisons Using Sum Contrasts in Linear Regression

Tags: contrasts, linear-model, lme4-nlme, regression

I am computing the following model using the lme4 package in R:

score ~ expertise*(mood + condition + course) **(EDIT: + (1|participant))**

Outcome:
score / numeric (1-7)

Inputs:
expertise / factor (NOVice, expert)
condition / factor (ALOne, together)
mood / numeric (1-7)
course / factor (YES, NO)

All the factors are coded using sum contrasts. The result looks like this:

        Predictor                             b          Std.Er  df          t value  Pr(>|t|)
        (Intercept)                           1.52941    0.22044 152.36196   6.938 1.07e-10 ***
        expertiseNOV                         -0.26262    0.22044 152.36196  -1.191  0.23539    
        condALO                               0.02033    0.03133 710.39747   0.649  0.51666    
        mood                                  0.31964    0.03153 744.54866  10.137  < 2e-16 ***
        courseYES                            -0.07763    0.10865 417.97130  -0.714  0.47533    
        expertiseNOV:condALO                 -0.17815    0.03133 710.39747  -5.686 1.90e-08 ***
        expertiseNOV:mood                     0.09947    0.03153 744.54866   3.154  0.00167 ** 
        expertiseNOV:courseYES                0.12540    0.10865 417.97130   1.154  0.24908

If I interpret this model correctly, the coefficients expertiseNOV through courseYES should be the main effects at the average level of the other predictors. Furthermore, the interaction "expertiseNOV:condALO", for example, tells me that there is a significant difference between experts and novices in the condition "alone". I would now like to know whether there is also a significant difference in the condition "together". I could reorder the factor levels and obtain the result for the interaction "expertiseNOV:condTOG".

Would I then need to correct my results for multiple comparisons? Or is this just the wrong way to approach this issue?
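For concreteness, the reordering I have in mind would go roughly like this (a sketch on a toy factor, not my actual data):

```r
# Sketch: put "TOG" first so it carries the +1 in the sum contrast; the
# refitted model would then report expertiseNOV:condTOG instead of ...:condALO.
cond <- factor(c("ALO", "TOG", "ALO", "TOG"), levels = c("TOG", "ALO"))
cs <- contr.sum(2)
colnames(cs) <- "TOG"
contrasts(cond) <- cs
contrasts(cond)
#>     TOG
#> TOG   1
#> ALO  -1
```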

EDIT 04.10.2022

As I assigned the contrast schemes manually, I seem to have made a small mistake when assigning the names for the factor "condition". Here is the complete procedure I am using, based on simulated data as proposed by @dipetkov. The code fits one model using sum contrasts and one using treatment contrasts.
set.seed(1234)

n <- 100

data <- data.frame(
  expertise = sample(c("NOV", "EXP"), n, replace = TRUE),
  cond = sample(c("ALO", "TOG"), n, replace = TRUE),
  course = sample(c("YES", "NO"), n, replace = TRUE),
  mood = sample(seq(7), n, replace = TRUE),
  score = rnorm(n)
)

library(dplyr)

data <- data %>% mutate(
  expertise = as.factor(expertise),
  cond = as.factor(cond),
  course = as.factor(course)
)

#sum contrasts
contr_sum <- contr.sum(2)
colnames(contr_sum) <- c("NOV")
contrasts(data$expertise) <- contr_sum

colnames(contr_sum) <- c("TOG")
contrasts(data$cond) <- contr_sum

colnames(contr_sum) <- c("YES")
contrasts(data$course) <- contr_sum

model_sum <- lm(
  score ~ expertise * (mood + cond + course),
  data = data
)

summary(model_sum)

#treatment contrasts
contr_treatment <- contr.treatment(2)
colnames(contr_treatment) <- c("NOV")
contrasts(data$expertise) <- contr_treatment

colnames(contr_treatment) <- c("TOG")
contrasts(data$cond) <- contr_treatment

colnames(contr_treatment) <- c("YES")
contrasts(data$course) <- contr_treatment

model_trea <- lm(
  score ~ expertise * (mood + cond + course),
  data = data
)

summary(model_trea)
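Printing the contrast matrix shows where my naming went wrong: contr.sum(2) puts the +1 on the first factor level, regardless of the column name I assign (toy check):

```r
# Toy check: contr.sum(2) assigns +1 to the *first* level of the factor,
# no matter what the contrast column is called.
f <- factor(c("ALO", "TOG"))   # levels sort as ALO, TOG
cs <- contr.sum(2)
colnames(cs) <- "TOG"          # the label claims TOG...
contrasts(f) <- cs
contrasts(f)
#>     TOG
#> ALO   1
#> TOG  -1
```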

Output model_sum:

Coefficients:
                       Estimate Std. Error t value Pr(>|t|)
(Intercept)             0.09177    0.23428   0.392    0.696
expertiseNOV            0.28286    0.23428   1.207    0.230
mood                    0.01672    0.05225   0.320    0.750
condTOG                 0.03264    0.10073   0.324    0.747
courseYES               0.15819    0.10194   1.552    0.124
expertiseNOV:mood      -0.06743    0.05225  -1.290    0.200
expertiseNOV:condTOG    0.06188    0.10073   0.614    0.541
expertiseNOV:courseYES -0.04754    0.10194  -0.466    0.642

Output model_trea:

Coefficients:
                       Estimate Std. Error t value Pr(>|t|)
(Intercept)             0.57980    0.41055   1.412    0.161
expertiseNOV           -0.59440    0.60245  -0.987    0.326
mood                   -0.05071    0.06741  -0.752    0.454
condTOG                -0.18903    0.26240  -0.720    0.473
courseYES              -0.22131    0.26476  -0.836    0.405
expertiseNOV:mood       0.13485    0.10451   1.290    0.200
expertiseNOV:condTOG    0.24750    0.40291   0.614    0.541
expertiseNOV:courseYES -0.19014    0.40774  -0.466    0.642

Best Answer

You say you use the lme4 package (designed for fitting mixed-effects models) but your model formula seems to have fixed effects only. How come?

expertiseNOV to courseYES should be the main effects at the average level of the other predictors

This is wrong. We know (from the naming convention used in the summary table) that the model is fitted with the treatment coding, which is the default in R. The intercept corresponds to:

(expertise, condition, mood, course) = (expert, together, 0, NO)
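This is easy to verify on simulated data (mirroring the question's setup): with treatment coding, the fitted value at the reference cell with mood = 0 equals the intercept. Note that in this simulation the default reference cell is (EXP, ALO, NO), the alphabetically first levels; in the question's original table the dummy is named condALO, so there the reference condition is "together".

```r
# Sketch on simulated data: with R's default treatment coding, the intercept
# is the fitted score at the reference levels of the factors and mood = 0.
set.seed(1234)
n <- 100
data <- data.frame(
  expertise = sample(c("NOV", "EXP"), n, replace = TRUE),
  cond = sample(c("ALO", "TOG"), n, replace = TRUE),
  course = sample(c("YES", "NO"), n, replace = TRUE),
  mood = sample(seq(7), n, replace = TRUE),
  score = rnorm(n)
)
model <- lm(score ~ expertise * (mood + cond + course), data = data)

# Reference cell here: alphabetically first levels (EXP, ALO, NO), mood = 0.
ref <- data.frame(expertise = "EXP", cond = "ALO", course = "NO", mood = 0)
all.equal(as.numeric(predict(model, ref)), as.numeric(coef(model)["(Intercept)"]))
#> TRUE
```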

the interaction "expertiseNOV:condALO" is for example telling me that there is a significant difference between experts and novices for the condition "alone"

Actually, it's telling you that there is a significant interaction between expertise and condition. Since expertise is interacted with mood and course as well, the interpretation of the interaction terms is a bit more involved. expertiseNOV:condALO is the expected difference in score between novices and experts when mood=0 and course="NO". (Substitute either mood>0 and/or course="YES", to see how the other two interactions contribute to the expected difference as well.)

So what to do? Learn about contrasts and how to use them to make the post-hoc comparisons that you'd like to make.
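For reference, the two coding schemes for a two-level factor look like this:

```r
# Treatment (dummy) coding: the reference level is coded 0, so main effects
# are differences from the reference cell.
contr.treatment(2)
#>   2
#> 1 0
#> 2 1

# Sum (deviation) coding: the levels are coded +1 / -1, so main effects are
# deviations from the grand mean rather than from a reference level.
contr.sum(2)
#>   [,1]
#> 1    1
#> 2   -1
```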

A popular package for post-hoc comparisons is emmeans. I generate fake data to illustrate how to use it (that code is attached at the end) but the best place to start is to read the vignettes.

# Not the same summary table since you don't provide your data.
#> Coefficients:
#>                        Estimate Std. Error t value Pr(>|t|)
#> (Intercept)             0.57980    0.41055   1.412    0.161
#> expertiseNOV           -0.59440    0.60245  -0.987    0.326
#> mood                   -0.05071    0.06741  -0.752    0.454
#> condTOG                -0.18903    0.26240  -0.720    0.473
#> courseYES              -0.22131    0.26476  -0.836    0.405
#> expertiseNOV:mood       0.13485    0.10451   1.290    0.200
#> expertiseNOV:condTOG    0.24750    0.40291   0.614    0.541
#> expertiseNOV:courseYES -0.19014    0.40774  -0.466    0.642

Making post-hoc comparisons in a model with multiple interactions is not equivalent to looking at individual regression coefficients.

library("emmeans")

pairs(emmeans(model, ~ expertise | cond))
#> cond = ALO:
#>  contrast  estimate    SE df t.ratio p.value
#>  EXP - NOV    0.139 0.287 92   0.485  0.6288
#> 
#> cond = TOG:
#>  contrast  estimate    SE df t.ratio p.value
#>  EXP - NOV   -0.108 0.284 92  -0.381  0.7037
#> 
#> Results are averaged over the levels of: course
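As for the multiple-comparisons part of the question: if you treat the two expertise contrasts (one per condition) as a family, emmeans can apply the adjustment for you. A sketch on the same simulated data; rbind() on the contrast object combines the by-condition groups into one family, with a Bonferroni adjustment by default (adjust = "mvt" and others are alternatives):

```r
# Sketch: combine the per-condition expertise contrasts into one family
# and adjust the p-values for multiplicity.
library("emmeans")

set.seed(1234)
n <- 100
data <- data.frame(
  expertise = sample(c("NOV", "EXP"), n, replace = TRUE),
  cond = sample(c("ALO", "TOG"), n, replace = TRUE),
  course = sample(c("YES", "NO"), n, replace = TRUE),
  mood = sample(seq(7), n, replace = TRUE),
  score = rnorm(n)
)
model <- lm(score ~ expertise * (mood + cond + course), data = data)

contrasts_by_cond <- pairs(emmeans(model, ~ expertise | cond))
rbind(contrasts_by_cond)  # two contrasts, one family, adjusted p-values
```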

You can also use the contrast package to specify the comparisons you want to make as contrasts. (Be careful loading contrast and emmeans at the same time. They both define a contrast function.)

Let's reproduce the emmeans output for the comparison between experts and novices when the condition is "together". We set (condition, mood, course) = ("TOG", average mood, either "YES" or "NO").

library("contrast")

contrast(
  model,
  list(expertise = "EXP", cond = "TOG", mood = mean(data$mood), course = c("YES", "NO")),
  list(expertise = "NOV", cond = "TOG", mood = mean(data$mood), course = c("YES", "NO")),
  type = "average"
)
#>     contrast
#> lm model parameter contrast
#> 
#>     Contrast     S.E.     Lower     Upper     t df Pr(>|t|)
#> 1 -0.1082324 0.283703 -0.671691 0.4552262 -0.38 92   0.7037

If we choose a different mood setting, the contrast between the two expertise levels changes because of the expertise-mood interaction.

contrast(
  model,
  list(expertise = "EXP", cond = "TOG", mood = 7, course = c("YES", "NO")),
  list(expertise = "NOV", cond = "TOG", mood = 7, course = c("YES", "NO")),
  type = "average"
)
#> lm model parameter contrast
#> 
#>     Contrast      S.E.     Lower     Upper     t df Pr(>|t|)
#> 1 -0.5019994 0.4301315 -1.356278 0.3522789 -1.17 92   0.2462

And if you are not interested in a particular mood, you may prefer to visualize the expected score differences between experts and novices for all combinations of course and condition as a function of mood. This can be done quickly with the ggeffects package.

library("ggeffects")

plot(
  ggemmeans(model, terms = c("mood [1:7]", "course", "expertise", "cond"))
)

In short, multiple interactions make post-hoc comparisons more complex and more fun.


Here is the R code used to simulate the data for the illustrations above.

set.seed(1234)

n <- 100

data <- data.frame(
  expertise = sample(c("NOV", "EXP"), n, replace = TRUE),
  cond = sample(c("ALO", "TOG"), n, replace = TRUE),
  course = sample(c("YES", "NO"), n, replace = TRUE),
  mood = sample(seq(7), n, replace = TRUE),
  score = rnorm(n)
)

model <- lm(
  score ~ expertise * (mood + cond + course),
  data = data
)