I am wondering how to analyze differences between subspecies. I want to see if body mass differs between 3 subspecies of rodent and if sex and reproductive status are also different. We then want to see if body mass changes between season or is affected by group size.
We decided to use a GLMM model (using the glmmTMB
package in R) since multiple body mass values were taken from the same individuals. We added Individual Identity (AnimalID
in the dataset) as the random variable. Body mass was normally distributed in 2 of the subspecies but not the 3rd. Can we still do a Gaussian distribution? The sample size for the non-normal distribution is 427. The other two have about 100 body mass values.
Our predictors are: subspecies (factor with 3 levels: spp 1, 2, 3), sex (factor 2 levels: M/F), reproductive status (factor 2 levels: Repro/Non-repro), time of year (season: wet/dry), and group size (continuous variable). We will then repeat the models independently for each subspecies.
Our models using R:
allspecies <- glmmTMB(Mass ~ Species + Sex + Status + Species*Sex +
Species*Status +
Sex*Status + Species*Sex*Status +
(1|AnimalID),family=gaussian() , data = adults)
species1 <- glmmTMB(Mass ~ Sex + Status + Sex*Status + GroupSize +
Season + GroupSize*Season +
(1|AnimalID),family=gaussian() , data = species1)
#this last model then is run using the other 2 species datasets
When I want to get summary statistics to check which predictors affect body mass, do I run summary(mod)
or car::Anova(mod)
in R? We are having a disagreement on which to do.
Best Answer
Once you include interactions in your model, no single
summary()
function is likely to tell you "which predictors affect body mass." Results reported bysummary()
typically just display whether a coefficient estimate is significantly different from a value of 0. For a predictor involved in interactions, its individual coefficient will typically be the value when all of its interacting predictors are at 0 or reference levels (for continuous resp. categorical predictors). Thus the reported individual coefficient for one predictor might differ or not from 0 depending on how its interacting predictors are coded! See this page for further explanation.In a perfectly balanced design,
Anova()
might do what you wish with respect to "statistical significance." Otherwise, you have to consider the different types of ANOVA and which one best meets your needs. If residuals are far from normally distributed, you might not be meeting the technical requirements of those tests although the large number of cases might make that less crucial here. Furthermore, ANOVA by itself doesn't tell you the magnitudes of the associated differences. With a large study you might have a "statistically significant" difference that is of little practical importanceWith interactions, most interesting results (except for the "statistical significance" of the highest-level interaction itself) thus depend on some type of post-hoc test* to evaluate particular combinations of predictors of interest. Wald tests on all coefficients involving a predictor (its own, its nonlinear terms, and its interactions) together can evaluate the statistical significance of that predictor. For magnitudes of effects you can evaluate differences between different scenarios, for example via the
emmeans
package.Thus I'd recommend going back to a single model and plan on performing critical specific tests to evaluate how much your "predictors affect body mass."
*My use of "post-hoc test" here maybe isn't quite correct. That term sometimes is restricted to a test that wasn't pre-planned. I'm using it in the way implied by a comment, as a test on coefficients that isn't directly reported by a
summary
orANOVA
function.