R Programming – Correcting for Multiple Pairwise Comparisons with GAM Objects in R

generalized-additive-modelmgcvmultiple-comparisonspost-hocr

My question concerns pairwise comparisons of factor levels in a gam object. I have a dataframe, df, containing reaction time data (in ms) to stimuli varying in shape:

subjectID     RT     shape
001           501    square
002           722    circle
003           302    square
...

gam (from the mgcv package) offers a nice way to model this data while handling random-effects variables such as subjectID (i.e. controlling for 'random' variability between subjects):

m1 = gam(data = df, formula = lrt ~ Rp * slfreq + s(subjectID, bs = "re")

I can use summary(m1) to view pairwise contrasts of individual conditions (and corresponding p values) within the factor shape, however this method only gives uncorrected p values.

My question is this: is there a method for applying corrections to these p values (for example, a Tukey correction) when dealing with gam objects in R?

Best Answer

The glht() function for generalized linear hypotheses from the multcomp package can be used to carry out various kinds of contrasts using a range of different p-value adjustments. The contrasts you are looking for are also called "Tukey" contrasts for all pairwise comparisons. The p-value adjustments include single-step, Shaffer, Westfall, and all p.adjust methods, see ?summary.glht.

As @GavinSimpson pointed out in the comments: For gam() objects from mgcv this does not work out of the box but requires some manual intervention. For lmer() from lme4 everything works conveniently. I illustrate below how both packages can be used with multcomp to obtain equivalent results. For illustration I use the sleepstudy data from lme4 but collapse the numeric regressor Days to a three-level factor (merely for illustration purposes):

library("lme4")
data("sleepstudy", package = "lme4")
sleepstudy$Days <- cut(sleepstudy$Days, breaks = c(-Inf, 2.5, 5.5, Inf),
  labels = c("low", "med", "high"))
m1 <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)
summary(m1)
## ...
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  262.170      9.802  26.747
## Daysmed       31.217      6.365   4.905
## Dayshigh      67.433      5.954  11.326
## ...

Then glht() can be used to set up all pairwise (aka Tukey) contrasts for the Days factor. The summary() method then applies the p-value adjustment (single-step, by default).

library("multcomp")
g1 <- glht(m1, linfct = mcp(Days = "Tukey"))
summary(g1)
##   Simultaneous Tests for General Linear Hypotheses
## 
## Multiple Comparisons of Means: Tukey Contrasts
## 
## Fit: lmer(formula = Reaction ~ Days + (1 | Subject), data = sleepstudy)
## 
## Linear Hypotheses:
##                 Estimate Std. Error z value Pr(>|z|)    
## med - low == 0    31.217      6.365   4.905 2.28e-06 ***
## high - low == 0   67.433      5.954  11.326  < 1e-06 ***
## high - med == 0   36.216      5.954   6.083  < 1e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)

The same model can be fitted with gam() as described in the question.

library("mgcv")
m2 <- gam(Reaction ~ Days + s(Subject, bs = "re"), data = sleepstudy)
summary(m2)
## ...
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  262.170      9.802  26.747  < 2e-16 ***
## Daysmed       31.217      6.365   4.905 2.27e-06 ***
## Dayshigh      67.433      5.954  11.326  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## ...

However, the mcp(Days = "Tukey") method for describing the Tukey contrasts does not cooperate with gam() output and hence fails:

g2 <- glht(m2, linfct = mcp(Days = "Tukey"))
## Error in linfct[[nm]] %*% C : 
##   requires numeric/complex matrix/vector arguments

However, it is not difficult (albeit a bit technical and tedious) to set up the contrast matrix by hand.

contr <- matrix(0, nrow = 3, ncol = length(coef(m2)))
colnames(contr) <- names(coef(m2))
rownames(contr) <- c("med - low", "high - low", "high - med")
contr[, 2:3] <- rbind(c(1, 0), c(0, 1), c(-1, 1))

The first columns of the contrast matrix show what is needed here: As the low coefficient is constrained to zero in the model, med - low is simply med and analogously for high - low. The last row then shows the contrast for high - med:

contr[, 1:5]
##            (Intercept) Daysmed Dayshigh s(Subject).1 s(Subject).2
## med - low            0       1        0            0            0
## high - low           0       0        1            0            0
## high - med           0      -1        1            0            0

And with this contrast matrix we can conduct the pairwise comparison with glht():

g2 <- glht(m2, linfct = contr)
summary(g2)
##   Simultaneous Tests for General Linear Hypotheses
## 
## Fit: gam(formula = Reaction ~ Days + s(Subject, bs = "re"), data = sleepstudy)
## 
## Linear Hypotheses:
##                 Estimate Std. Error z value Pr(>|z|)    
## med - low == 0    31.217      6.365   4.905 2.35e-06 ***
## high - low == 0   67.433      5.954  11.326  < 1e-06 ***
## high - med == 0   36.216      5.954   6.083  < 1e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)

Another quite convenient way to indicate the contrasts to be tested is through character strings. This can set up linear functions based on the effect names from names(coef(m2)). And for factors with fewer levels (and hence fewer Tukey contrasts) this works quite nicely - but if the comparisons become more complex it's possibly easier to constract the contrast matrix as above.

g3 <- glht(m2, linfct = c("Daysmed = 0", "Dayshigh = 0", "Dayshigh - Daysmed = 0"))
summary(g3)
##   Simultaneous Tests for General Linear Hypotheses
## 
## Fit: gam(formula = Reaction ~ Days + s(Subject, bs = "re"), data = sleepstudy)
## 
## Linear Hypotheses:
##                         Estimate Std. Error z value Pr(>|z|)    
## Daysmed == 0              31.217      6.365   4.905 2.53e-06 ***
## Dayshigh == 0             67.433      5.954  11.326  < 1e-06 ***
## Dayshigh - Daysmed == 0   36.216      5.954   6.083  < 1e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)