Regression – Correction for Multiple Comparisons Using Sum Contrasts in Linear Regression

Tags: contrasts, linear-model, lme4-nlme, regression

I am computing the following model using the lme4 package in R:

score ~ expertise*(mood + condition + course) **(EDIT: + (1|participant))**

Outcome:
score / numeric (1-7)

Inputs:
expertise / factor (NOVice, expert)
condition / factor (ALOne, together)
mood / numeric (1-7)
course / factor (YES, NO)

All the factors are coded using sum contrasts. The result looks like this:

        Predictor                             b          Std.Er  df          t value  Pr(>|t|)
        (Intercept)                           1.52941    0.22044 152.36196   6.938 1.07e-10 ***
        expertiseNOV                         -0.26262    0.22044 152.36196  -1.191  0.23539    
        condALO                               0.02033    0.03133 710.39747   0.649  0.51666    
        mood                                  0.31964    0.03153 744.54866  10.137  < 2e-16 ***
        courseYES                            -0.07763    0.10865 417.97130  -0.714  0.47533    
        expertiseNOV:condALO                 -0.17815    0.03133 710.39747  -5.686 1.90e-08 ***
        expertiseNOV:mood                     0.09947    0.03153 744.54866   3.154  0.00167 ** 
        expertiseNOV:courseYES                0.12540    0.10865 417.97130   1.154  0.24908

If I interpret this model correctly, the coefficients expertiseNOV through courseYES should be the main effects at the average level of the other predictors. Furthermore, the interaction "expertiseNOV:condALO", for example, tells me that there is a significant difference between experts and novices in the condition "alone". I would now like to know whether there is also a significant difference in the condition "together". I could reorder the factor levels and obtain the result for the interaction "expertiseNOV:condTOG".

Would I then need to correct my results for multiple comparisons? Or is this just the wrong way to approach this issue?
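For concreteness, the reordering I have in mind would go roughly like this (a sketch on a toy factor, not my actual data):

```r
# Sketch: put "TOG" first so it carries the +1 in the sum contrast; the
# refitted model would then report expertiseNOV:condTOG instead of ...:condALO.
cond <- factor(c("ALO", "TOG", "ALO", "TOG"), levels = c("TOG", "ALO"))
cs <- contr.sum(2)
colnames(cs) <- "TOG"
contrasts(cond) <- cs
contrasts(cond)
#>     TOG
#> TOG   1
#> ALO  -1
```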

EDIT 04.10.2022

As I assigned the contrast schemes manually, I seem to have made a small mistake when assigning the names for the factor "condition". Here is the complete procedure I am using, based on simulated data as proposed by @dipetkov. The code fits one model using sum contrasts and one using treatment contrasts.
set.seed(1234)

n <- 100

data <- data.frame(
  expertise = sample(c("NOV", "EXP"), n, replace = TRUE),
  cond = sample(c("ALO", "TOG"), n, replace = TRUE),
  course = sample(c("YES", "NO"), n, replace = TRUE),
  mood = sample(seq(7), n, replace = TRUE),
  score = rnorm(n)
)

library(dplyr)

data <- data %>% mutate(
  expertise = as.factor(expertise),
  cond = as.factor(cond),
  course = as.factor(course)
)

#sum contrasts
contr_sum <- contr.sum(2)
colnames(contr_sum) <- c("NOV")
contrasts(data$expertise) <- contr_sum

colnames(contr_sum) <- c("TOG")
contrasts(data$cond) <- contr_sum

colnames(contr_sum) <- c("YES")
contrasts(data$course) <- contr_sum

model_sum <- lm(
  score ~ expertise * (mood + cond + course),
  data = data
)

summary(model_sum)

#treatment contrasts
contr_treatment <- contr.treatment(2)
colnames(contr_treatment) <- c("NOV")
contrasts(data$expertise) <- contr_treatment

colnames(contr_treatment) <- c("TOG")
contrasts(data$cond) <- contr_treatment

colnames(contr_treatment) <- c("YES")
contrasts(data$course) <- contr_treatment

model_trea <- lm(
  score ~ expertise * (mood + cond + course),
  data = data
)

summary(model_trea)
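Printing the contrast matrix shows where my naming went wrong: contr.sum(2) puts the +1 on the first factor level, regardless of the column name I assign (toy check):

```r
# Toy check: contr.sum(2) assigns +1 to the *first* level of the factor,
# no matter what the contrast column is called.
f <- factor(c("ALO", "TOG"))   # levels sort as ALO, TOG
cs <- contr.sum(2)
colnames(cs) <- "TOG"          # the label claims TOG...
contrasts(f) <- cs
contrasts(f)
#>     TOG
#> ALO   1
#> TOG  -1
```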

Output model_sum:

Coefficients:
                       Estimate Std. Error t value Pr(>|t|)
(Intercept)             0.09177    0.23428   0.392    0.696
expertiseNOV            0.28286    0.23428   1.207    0.230
mood                    0.01672    0.05225   0.320    0.750
condTOG                 0.03264    0.10073   0.324    0.747
courseYES               0.15819    0.10194   1.552    0.124
expertiseNOV:mood      -0.06743    0.05225  -1.290    0.200
expertiseNOV:condTOG    0.06188    0.10073   0.614    0.541
expertiseNOV:courseYES -0.04754    0.10194  -0.466    0.642

Output model_trea:

Coefficients:
                       Estimate Std. Error t value Pr(>|t|)
(Intercept)             0.57980    0.41055   1.412    0.161
expertiseNOV           -0.59440    0.60245  -0.987    0.326
mood                   -0.05071    0.06741  -0.752    0.454
condTOG                -0.18903    0.26240  -0.720    0.473
courseYES              -0.22131    0.26476  -0.836    0.405
expertiseNOV:mood       0.13485    0.10451   1.290    0.200
expertiseNOV:condTOG    0.24750    0.40291   0.614    0.541
expertiseNOV:courseYES -0.19014    0.40774  -0.466    0.642

Best Answer

You say you use the lme4 package (designed for fitting mixed-effects models) but your model formula seems to have fixed effects only. How come?

expertiseNOV to courseYES should be the main effects at the average level of the other predictors

This is wrong. We know (from the naming convention used in the summary table) that the model is fitted with the treatment coding, which is the default in R. The intercept corresponds to:

(expertise, condition, mood, course) = (expert, together, 0, NO)
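This is easy to verify on simulated data (mirroring the question's setup): with treatment coding, the fitted value at the reference cell with mood = 0 equals the intercept. Note that in this simulation the default reference cell is (EXP, ALO, NO), the alphabetically first levels; in the question's original table the dummy is named condALO, so there the reference condition is "together".

```r
# Sketch on simulated data: with R's default treatment coding, the intercept
# is the fitted score at the reference levels of the factors and mood = 0.
set.seed(1234)
n <- 100
data <- data.frame(
  expertise = sample(c("NOV", "EXP"), n, replace = TRUE),
  cond = sample(c("ALO", "TOG"), n, replace = TRUE),
  course = sample(c("YES", "NO"), n, replace = TRUE),
  mood = sample(seq(7), n, replace = TRUE),
  score = rnorm(n)
)
model <- lm(score ~ expertise * (mood + cond + course), data = data)

# Reference cell here: alphabetically first levels (EXP, ALO, NO), mood = 0.
ref <- data.frame(expertise = "EXP", cond = "ALO", course = "NO", mood = 0)
all.equal(as.numeric(predict(model, ref)), as.numeric(coef(model)["(Intercept)"]))
#> TRUE
```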

the interaction "expertiseNOV:condALO" is for example telling me that there is a significant difference between experts and novices for the condition "alone"

Actually, it's telling you that there is a significant interaction between expertise and condition. Since expertise is interacted with mood and course as well, the interpretation of the interaction terms is a bit more involved. expertiseNOV:condALO is the expected difference in score between novices and experts when mood=0 and course="NO". (Substitute either mood>0 and/or course="YES", to see how the other two interactions contribute to the expected difference as well.)

So what to do? Learn about contrasts and how to use them to make the post-hoc comparisons that you'd like to make.
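For reference, the two coding schemes for a two-level factor look like this:

```r
# Treatment (dummy) coding: the reference level is coded 0, so main effects
# are differences from the reference cell.
contr.treatment(2)
#>   2
#> 1 0
#> 2 1

# Sum (deviation) coding: the levels are coded +1 / -1, so main effects are
# deviations from the grand mean rather than from a reference level.
contr.sum(2)
#>   [,1]
#> 1    1
#> 2   -1
```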

A popular package for post-hoc comparisons is emmeans. I generate fake data to illustrate how to use it (that code is attached at the end) but the best place to start is to read the vignettes.

# Not the same summary table since you don't provide your data.
#> Coefficients:
#>                        Estimate Std. Error t value Pr(>|t|)
#> (Intercept)             0.57980    0.41055   1.412    0.161
#> expertiseNOV           -0.59440    0.60245  -0.987    0.326
#> mood                   -0.05071    0.06741  -0.752    0.454
#> condTOG                -0.18903    0.26240  -0.720    0.473
#> courseYES              -0.22131    0.26476  -0.836    0.405
#> expertiseNOV:mood       0.13485    0.10451   1.290    0.200
#> expertiseNOV:condTOG    0.24750    0.40291   0.614    0.541
#> expertiseNOV:courseYES -0.19014    0.40774  -0.466    0.642

Making post-hoc comparisons in a model with multiple interactions is not equivalent to looking at individual regression coefficients.

library("emmeans")

pairs(emmeans(model, ~ expertise | cond))
#> cond = ALO:
#>  contrast  estimate    SE df t.ratio p.value
#>  EXP - NOV    0.139 0.287 92   0.485  0.6288
#> 
#> cond = TOG:
#>  contrast  estimate    SE df t.ratio p.value
#>  EXP - NOV   -0.108 0.284 92  -0.381  0.7037
#> 
#> Results are averaged over the levels of: course
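As for the multiple-comparisons part of the question: if you treat the two expertise contrasts (one per condition) as a family, emmeans can apply the adjustment for you. A sketch on the same simulated data; rbind() on the contrast object combines the by-condition groups into one family, with a Bonferroni adjustment by default (adjust = "mvt" and others are alternatives):

```r
# Sketch: combine the per-condition expertise contrasts into one family
# and adjust the p-values for multiplicity.
library("emmeans")

set.seed(1234)
n <- 100
data <- data.frame(
  expertise = sample(c("NOV", "EXP"), n, replace = TRUE),
  cond = sample(c("ALO", "TOG"), n, replace = TRUE),
  course = sample(c("YES", "NO"), n, replace = TRUE),
  mood = sample(seq(7), n, replace = TRUE),
  score = rnorm(n)
)
model <- lm(score ~ expertise * (mood + cond + course), data = data)

contrasts_by_cond <- pairs(emmeans(model, ~ expertise | cond))
rbind(contrasts_by_cond)  # two contrasts, one family, adjusted p-values
```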

You can also use the contrast package to specify the comparisons you want to make as contrasts. (Be careful loading contrast and emmeans at the same time. They both define a contrast function.)

Let's reproduce the emmeans output for the comparison between experts and novices when the condition is "together". We set (condition, mood, course) = ("TOG", average mood, either "YES" or "NO").

library("contrast")

contrast(
  model,
  list(expertise = "EXP", cond = "TOG", mood = mean(data$mood), course = c("YES", "NO")),
  list(expertise = "NOV", cond = "TOG", mood = mean(data$mood), course = c("YES", "NO")),
  type = "average"
)
#>     contrast
#> lm model parameter contrast
#> 
#>     Contrast     S.E.     Lower     Upper     t df Pr(>|t|)
#> 1 -0.1082324 0.283703 -0.671691 0.4552262 -0.38 92   0.7037

If we choose a different mood setting, the contrast between the two expertise levels changes because of the expertise-mood interaction.

contrast(
  model,
  list(expertise = "EXP", cond = "TOG", mood = 7, course = c("YES", "NO")),
  list(expertise = "NOV", cond = "TOG", mood = 7, course = c("YES", "NO")),
  type = "average"
)
#> lm model parameter contrast
#> 
#>     Contrast      S.E.     Lower     Upper     t df Pr(>|t|)
#> 1 -0.5019994 0.4301315 -1.356278 0.3522789 -1.17 92   0.2462

And if you are not interested in a particular mood, you may prefer to visualize the expected score differences between experts and novices for all combinations of course and condition as a function of mood. This can be done quickly with the ggeffects package.

library("ggeffects")

plot(
  ggemmeans(model, terms = c("mood [1:7]", "course", "expertise", "cond"))
)

In short, multiple interactions make post-hoc comparisons more complex and more fun.


Here is the R code used to simulate the data for the illustrations above.

set.seed(1234)

n <- 100

data <- data.frame(
  expertise = sample(c("NOV", "EXP"), n, replace = TRUE),
  cond = sample(c("ALO", "TOG"), n, replace = TRUE),
  course = sample(c("YES", "NO"), n, replace = TRUE),
  mood = sample(seq(7), n, replace = TRUE),
  score = rnorm(n)
)

model <- lm(
  score ~ expertise * (mood + cond + course),
  data = data
)