I have just completed a multilevel, longitudinal logistic regression testing, at four different time points, whether participants in an experimental group are more likely to have committed any drug-related crime in the previous 4 weeks than participants receiving placebo.
The syntax for the glmer looks like this
sumDrug <- summary(drugMod <- glmer(drugCrime ~ group*week + (1|id), data = oti, family = binomial, control = glmerControl(optimizer = "bobyqa"), nAGQ = 10))
Where drugCrime
is the binary outcome variable "have committed one or more drug-related crimes in previous 4 weeks?" (Y/N response), group
is a factor indicating group allocation, and week
is a factor indicating time of measurement (at weeks 0, 4, 8, and 12).
If I treatment code the two factors (see here) the output of the regression (with Odds Ratios and CIs) looks like this:
var Estimate Std..Error z.value Pr...z.. OR lo hi
1 (Intercept) -1.234 0.562 -2.197 0.028 0.29 0.04 2.28
2 group2 -0.108 0.779 -0.139 0.890 0.90 0.11 7.02
3 week2 -1.450 0.632 -2.295 0.022 0.23 0.03 1.84
4 week3 -0.965 0.652 -1.480 0.139 0.38 0.05 2.98
5 week4 -0.696 0.652 -1.067 0.286 0.50 0.06 3.90
6 group2:week2 -1.366 1.011 -1.351 0.177 0.26 0.03 2.00
7 group2:week3 -0.822 1.001 -0.821 0.411 0.44 0.06 3.44
8 group2:week4 -2.580 1.136 -2.271 0.023 0.08 0.01 0.59
And the output of the Anova(drugMod)
in the car
package returns
Analysis of Deviance Table (Type III Wald chisquare tests)
Response: drugCrime
Chisq Df Pr(>Chisq)
(Intercept) 4.8248 1 0.02805 *
group 0.0193 1 0.88955
week 5.5199 3 0.13745
group:week 5.4146 3 0.14383
If, on the other hand, I simple code the two factors (also see here) the output of the regression looks like this:
var Estimate Std..Error z.value Pr...z.. OR lo hi
1 (Intercept) -2.661 0.499 -5.328 0.000 0.07 0.01
2 group1 -1.300 0.742 -1.752 0.080 0.27 0.03
3 week2 -2.133 0.548 -3.890 0.000 0.12 0.02
4 week3 -1.376 0.532 -2.584 0.010 0.25 0.03
5 week4 -1.986 0.600 -3.311 0.001 0.14 0.02
6 group1:week2 -1.366 1.011 -1.351 0.177 0.26 0.03
7 group1:week3 -0.822 1.001 -0.821 0.411 0.44 0.06
8 group1:week4 -2.580 1.136 -2.271 0.023 0.08 0.01
this time the output of the Anova(drugMod)
is
Analysis of Deviance Table (Type III Wald chisquare tests)
Response: drugCrime
Chisq Df Pr(>Chisq)
(Intercept) 28.3890 1 9.923e-08 ***
group 3.0684 1 0.0798276 .
week 17.8728 3 0.0004672 ***
group:week 5.4149 3 0.1438151
I understand why the regression outputs are different with the two different contrast coding schemes, but why am I getting two different Anova()
outputs for these two different coding schemes, and which is the 'correct' one to report? In a reply to this post @Ben Bolker quotes the Anova()
help, saying "Be very careful in formulating the model for type-III tests, or the hypotheses tested will not make sense." But I am not sure what 'makes sense' means in this context.
Best Answer
You have to be a little careful with
car:Anova(..., type=3)
since it only performs Type III tests if the sum-to-zero constrast coding (contr.sum
) is used and if there are no "missing cells" or other estimability issues (to borrow a term from linear model theory).The contrasts that
car:Anova(..., type=3)
produce is what I would call marginal contrasts and also what you get fromnlme::anova.lme(..., type="marginal")
. In "nice" cases these contrasts correspond to Type III contrasts ifcontr.sum
is used.My advice is therefore to:
Look at your data to check if there are any missing cells (use
table(group, week)
). Estimability issues often show up asNA
in summary coefficient tables if you uselm
butglmer
may not display them. From the looks of your model summary you are probably ok.Set the contrast-coding to
contr.sum
, fit the model, obtain the Type III tests:options(contrasts = c("contr.sum", "contr.poly")) drugMod <- glmer(drugCrime ~ group*week + (1|id), ....) car::Anova(drugMod, type=3)
It is possible to obtain the Type III tests using contrasts of Least-Square Means (or in this case their GLM generalisation). Russell Lenth has added extensive guides to his emmeans package and this vignette describes how to obtain the type III tests in this way. I would compare these with the results from
car::Anova
. Note that this approach will also not yield Type III if there are missing cells or estimability issues, but emmeans warns you if it detects such problems (and I think this is also implemented forglmer
fits).This very recommendable vignette has a very insightful discussion of Type III contrasts. It termed the approach in
car:Anova(..., type=3)
Not-Safe-Type-Three and it mentions that the instruction to usecontr.sum
is in thecar
-book (see footnote on page 15). It also has a description of which hypotheses are actually being tested if you usecontr.treatment
orcontr.SAS
and why these are probably not hypotheses that you are actually interested in (this addresses the "will not make sense" quote you mention).Edit in order to answer: "what is 'missing cells' and how to detect them?"
As an example of a real data set with missing cells, consider the soup data from the ordinal package. Note the 0-entry in the following tabulation:
If we fit a model with the interaction between
PRODID
andDAY
, one coefficient is not estimable which shows up asNA
:In general any source of non-estimability is a problem but R goes a long way to eliminate such problems in the way design matrices are constructed. I can think of two relevant sources of non-estimability: missing cells and perfectly correlated covariates and most people already automatically avoid the latter.
If there were no missing cells the Type III test of
DAY
would test the difference between days at an averagePRODID
(i.e., a simple, flat, unweighted average). This is exactly the Yates contrast that Therneau discusses, but in the presence of missing cells this straight-forward interpretation no longer holds. You might want to stay clear of Type III tests in such cases due to lack of interpretability...