You have to be a little careful with car::Anova(..., type=3) since it only performs Type III tests if the sum-to-zero contrast coding (contr.sum) is used and if there are no "missing cells" or other estimability issues (to borrow a term from linear model theory). The contrasts that car::Anova(..., type=3) produces are what I would call marginal contrasts and also what you get from nlme::anova.lme(..., type="marginal"). In "nice" cases these contrasts correspond to Type III contrasts if contr.sum is used.
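To see this equivalence in a "nice" case, here is a small check you can run. This is a sketch of my own using the built-in Orthodont data from nlme; turning age into a factor is my addition, to get a proper two-way factorial layout:

library(nlme)
library(car)
options(contrasts = c("contr.sum", "contr.poly"))
Orthodont$Age <- factor(Orthodont$age)   # recode age as a factor (my choice)
m <- lme(distance ~ Sex * Age, random = ~ 1 | Subject, data = Orthodont)
anova(m, type = "marginal")  # marginal tests from nlme (F statistics)
Anova(m, type = 3)           # car's tests of the same marginal hypotheses
                             # (Wald chi-square, so the numbers are scaled
                             # differently, but the contrasts agree)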
My advice is therefore to:

1. Look at your data to check whether there are any missing cells (use table(group, week)). Estimability issues often show up as NA in the summary coefficient table if you use lm, but glmer may not display them. From the looks of your model summary you are probably OK.

2. Set the contrast coding to contr.sum, fit the model, and obtain the Type III tests:

   options(contrasts = c("contr.sum", "contr.poly"))
   drugMod <- glmer(drugCrime ~ group*week + (1|id), ....)
   car::Anova(drugMod, type=3)
3. Obtain the Type III tests using contrasts of least-squares means (or, in this case, their GLM generalisation). Russell Lenth has added extensive guides to his emmeans package, and this vignette describes how to obtain the Type III tests in this way; see the sketch after this list. I would compare these with the results from car::Anova. Note that this approach will also not yield Type III tests if there are missing cells or estimability issues, but emmeans warns you if it detects such problems (and I think this is also implemented for glmer fits).
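As a sketch of step 3 (assuming group and week are factors and drugMod is the fit from step 2), the joint_tests() function in emmeans produces such joint tests:

library(emmeans)
joint_tests(drugMod)              # Type III-style joint tests built from
                                  # contrasts of the estimated marginal means
emmeans(drugMod, ~ group * week)  # the cell means themselves; non-estimable
                                  # combinations are flagged as nonEst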
This very recommendable vignette has an insightful discussion of Type III contrasts. It terms the approach in car::Anova(..., type=3) Not-Safe-Type-Three, and it mentions that the instruction to use contr.sum is in the car book (see the footnote on page 15). It also describes which hypotheses are actually being tested if you use contr.treatment or contr.SAS, and why these are probably not hypotheses you are actually interested in (this addresses the "will not make sense" quote you mention).
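To see what happens if you skip the contr.sum step, here is a small demonstration of my own with the built-in, balanced warpbreaks data:

library(car)
m1 <- lm(breaks ~ wool * tension, data = warpbreaks)  # default contr.treatment
m2 <- lm(breaks ~ wool * tension, data = warpbreaks,
         contrasts = list(wool = "contr.sum", tension = "contr.sum"))
Anova(m1, type = 3)  # "main effect" tests are simple effects at the reference levels
Anova(m2, type = 3)  # the Type III tests of the marginal (unweighted) hypotheses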
Edit, in order to answer: "what is 'missing cells' and how to detect them?"

As an example of a real data set with missing cells, consider the soup data from the ordinal package. Note the 0 entry in the following tabulation:

data("soup", package="ordinal")
soup$RESPONSE <- as.numeric(as.character(soup$SURENESS))
with(soup, table(DAY, PRODID))
   PRODID
DAY   1   2   3   4   5   6
  1 369  94 184  93  88  93
  2 370 275   0  92  97  92
If we fit a model with the interaction between PRODID and DAY, one coefficient is not estimable, which shows up as NA:
fm <- lm(RESPONSE ~ DAY * PRODID, data=soup)
summary(fm)
Call:
lm(formula = RESPONSE ~ DAY * PRODID, data = soup)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.2043 -1.4486  0.8043  1.3818  2.5514 

Coefficients: (1 not defined because of singularities)
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)   3.94580    0.09421  41.883  < 2e-16 ***
DAY2         -0.49715    0.13314  -3.734 0.000194 ***
PRODID2       0.67122    0.20908   3.210 0.001349 ** 
PRODID3       1.11398    0.16332   6.821 1.23e-11 ***
PRODID4       0.59183    0.20998   2.819 0.004876 ** 
PRODID5       0.92920    0.21469   4.328 1.58e-05 ***
PRODID6       1.25850    0.20998   5.993 2.47e-09 ***
DAY2:PRODID2  0.49831    0.25392   1.962 0.049861 *  
DAY2:PRODID3       NA         NA      NA       NA    
DAY2:PRODID4  0.51386    0.29756   1.727 0.084347 .  
DAY2:PRODID5  0.62215    0.29784   2.089 0.036854 *  
DAY2:PRODID6  0.48850    0.29756   1.642 0.100823    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.81 on 1836 degrees of freedom
Multiple R-squared:  0.1022,    Adjusted R-squared:  0.09735 
F-statistic: 20.91 on 10 and 1836 DF,  p-value: < 2.2e-16
In general any source of non-estimability is a problem, but R goes a long way toward eliminating such problems in the way design matrices are constructed. I can think of two relevant sources of non-estimability: missing cells and perfectly correlated covariates, and most people already automatically avoid the latter.
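The second source is easy to demonstrate with a toy example of my own:

set.seed(1)
x1 <- rnorm(10)
x2 <- 2 * x1            # x2 is an exact linear function of x1
y  <- x1 + rnorm(10)
coef(lm(y ~ x1 + x2))   # the coefficient for x2 is reported as NA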
If there were no missing cells, the Type III test of DAY would test the difference between days at an average PRODID (i.e., a simple, flat, unweighted average). This is exactly the Yates contrast that Therneau discusses, but in the presence of missing cells this straightforward interpretation no longer holds. You might want to steer clear of Type III tests in such cases due to the lack of interpretability.
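You can see the loss of interpretability directly by running emmeans on the soup fit above (a sketch; emmeans uses its estimability machinery to flag the problem):

library(emmeans)
emmeans(fm, ~ DAY)   # the DAY 2 average runs over the missing DAY 2 / PRODID 3
                     # cell, so emmeans reports it as non-estimable (nonEst)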
Best Answer
You can have a look at the emmeans package, which streamlines these calculations. But if you want to see how you could do it on your own, you could try something along the lines of the sketch below. However, note that these probabilities are conditional on the random effects and will not match the population-averaged probabilities at the same combinations of Group and Time; for more on this topic, check here.
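A minimal sketch of that do-it-yourself route, under assumptions of mine: a binomial fit with a logit link and a single random intercept per id, and a hypothetical prediction grid (the fit object, variable names and values are placeholders):

library(lme4)
# 'fit' is a hypothetical binomial fit with a logit link, e.g.
# fit <- glmer(y ~ Group * Time + (1 | id), family = binomial, data = dat)
newdat <- expand.grid(Group = c("A", "B"), Time = 1:4)  # hypothetical grid
# probabilities conditional on the random effects (random intercept set to
# zero via re.form = NA, i.e. predictions for a "typical" subject):
p_cond <- predict(fit, newdata = newdat, re.form = NA, type = "response")
# a population-averaged approximation: average the conditional probabilities
# over draws from the estimated random-intercept distribution
sd_id <- sqrt(as.numeric(VarCorr(fit)$id))
eta   <- predict(fit, newdata = newdat, re.form = NA)   # linear predictor
p_pop <- sapply(eta, function(e) mean(plogis(e + rnorm(5000, 0, sd_id))))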