ANOVA – Specifics of Post-Hoc Testing After Significant Three-Way Interaction in Factorial ANOVA

Tags: anova, familywise-error, lsmeans, multiple-comparisons, post-hoc

TL;DR

  • Struggling with a factorial three-way ANOVA, in particular with post-hoc tests and multiplicity control.
  • Four continuous dependent variables.
  • Three categorical between-subject independent variables:
    • A (2 levels)
    • B (2 levels)
    • C (4 levels)
  • Unbalanced groups: Median group size is 17, IQR is from 11 to 54, range is from 2 to 126.
  • Study design: Exploratory observational.
  • Research question: "Are there differences in any of the DVs across the levels of A overall, or within certain subgroups of B and C?"
  • Analysis plan: Four 2x2x4 Type II full-factorial three-way ANOVAs (one for each DV).
  • Post-hoc plan: Pairwise comparison of A across all levels of the highest order significant interaction involving A.
  • I'm using R for this analysis (trying to use emmeans for post-hoc tests).

My question: Is my plan any good and how exactly do I go about the post-hoc testing?

Research design and data

I want to examine the effects of three between-subject factors on four different dependent variables. Two of the factors have 2 levels (let's call them A and B), and the third factor has 4 levels (let's call it C). I am mostly interested in the effect of A on each of the DVs, but I am also curious whether that effect might differ across levels of B and C. I don't want to test any specific hypotheses. The study design is observational, which is why the groups are unbalanced (mean group size is 35, median is 17, IQR is from 11 to 54, range is from 2 to 126). My research design is exploratory, and I guess my research question could be formulated as "Are there differences in any of the DVs across the levels of A overall, or within certain subgroups of B and C?".
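A quick way to get a cell-size summary like the one quoted above (and to spot empty cells) is to tabulate the 16 A×B×C cells. This is a minimal sketch on simulated data: the data frame `dat`, the level names and N = 560 (chosen only so that the mean cell size is 35, as in the question) are hypothetical.

```r
# Hypothetical data: N = 560 so that the mean cell size is 35;
# replace 'dat' with the real data frame containing factors A, B and C.
set.seed(1)
N <- 560
dat <- data.frame(
  A = factor(sample(c("a1", "a2"), N, replace = TRUE)),
  B = factor(sample(c("b1", "b2"), N, replace = TRUE, prob = c(0.7, 0.3))),
  C = factor(sample(c("c1", "c2", "c3", "c4"), N, replace = TRUE,
                    prob = c(0.4, 0.3, 0.2, 0.1)))
)

# One count per A x B x C cell (2 * 2 * 4 = 16 cells)
cell_n <- as.vector(table(dat$A, dat$B, dat$C))

# Mean plus min, quartiles and max of the cell sizes
c(mean = mean(cell_n), quantile(cell_n))

# An empty cell would leave part of the three-way interaction non-estimable
any(cell_n == 0)
```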

Analysis plan

I am planning to run four 2x2x4 full-factorial ANOVAs (one for each DV). Since A is my main variable of interest, I'd like to follow up only on significant interactions involving A. More precisely, I'd like to conduct pairwise comparisons of A across all levels of the highest-order significant interaction involving A. I'm using RStudio for this analysis and attempting to use the emmeans package for post-hoc tests.
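The analysis plan could be sketched in R roughly as follows, assuming the car package's `Anova()` is used for the Type II tests (the question does not say which function produces them). The data are simulated, and the DV names `Y1` to `Y4` and the factor levels are hypothetical.

```r
# Sketch only: simulated data, hypothetical names; assumes 'car' is installed.
library(car)

set.seed(2)
cells <- expand.grid(A = c("a1", "a2"), B = c("b1", "b2"),
                     C = c("c1", "c2", "c3", "c4"))
n <- sample(5:40, nrow(cells), replace = TRUE)      # unequal cell sizes
dat <- cells[rep(seq_len(nrow(cells)), times = n), ]

dvs <- paste0("Y", 1:4)
for (dv in dvs) dat[[dv]] <- rnorm(nrow(dat))

# One full 2x2x4 factorial model per DV ...
fits <- lapply(setNames(dvs, dvs), function(dv)
  lm(reformulate("A * B * C", response = dv), data = dat))

# ... and one Type II ANOVA table per model
aov_tabs <- lapply(fits, Anova, type = 2)
aov_tabs$Y1
```

Looping over a vector of DV names keeps the four models identical in structure, so the four tables are directly comparable.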

My questions

  1. Is the idea to run four ANOVAs appropriate given my research question, data structure and sample size?

  2. Is a Type II ANOVA the right choice?

  3. How exactly do I go about the post-hoc comparisons?

    • 3.1 If I follow up on a significant three-way interaction with pairwise comparisons of A across all levels of BxC, how do I control the family-wise error rate? I've been using emmeans and noticed that when I conduct pairwise comparisons, the p-values are only adjusted within each level of the factors I condition on. This means that for my analysis, since I would be conducting only one comparison (because A has only two levels) within each of the 2, 4 or 8 different levels (depending on which interaction of A with B and C is significant), no multiplicity correction would be performed.
    • 3.2 In emmeans, I also noticed that the degrees of freedom for each post-hoc comparison are the same as the residual DF of the ANOVA. I understand this is by design because the comparisons are based on the overall model, but I don't really understand why. What would be more appropriate for my analysis: using the overall model DF or calculating DF for each comparison?
    • 3.3 Given a significant three-way interaction, would it be better to first examine the effects of AxB over each of the 4 levels of C, instead of jumping straight into the simple main effects of A across each of the levels of BxC?
  4. Does any of this even make sense?

Sorry for the lengthy post. I appreciate any input. Thanks a lot!

Best Answer

My comments on some of the questions...

  1. Yes, I think that's an OK strategy. The alternative would be you run a multivariate analysis with the 4 responses all at once. But implicit in the multivariate approach is that "which response" translates into something much like another factor having 4 levels, and it interacts with every other factor; so given your idea of breaking down and doing separate comparisons if there is an interaction, it already means you really intend to consider the four responses separately anyway.

  2. It's a good choice, in that you are considering comparisons of models that make sense (they follow a hierarchy where a model with an interaction includes all main effects and interactions contained therein).
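Points 1 and 2 can both be illustrated with a small sketch on simulated data (the variable names, the data, and the use of the car and emmeans packages are assumptions). The first part fits the multivariate alternative, in which emmeans treats "which response" as a 4-level pseudo-factor. The second part reproduces the Type II test of A for one DV as a comparison of the nested models `Y1 ~ B * C` and `Y1 ~ A + B * C` (A adjusted for every term that does not contain A), with the error mean square taken from the full three-way model.

```r
# Assumes the car and emmeans packages are installed; data are simulated.
library(car)
library(emmeans)

set.seed(3)
cells <- expand.grid(A = c("a1", "a2"), B = c("b1", "b2"),
                     C = c("c1", "c2", "c3", "c4"))
n <- sample(5:40, nrow(cells), replace = TRUE)      # unequal cell sizes
dat <- cells[rep(seq_len(nrow(cells)), times = n), ]
for (dv in paste0("Y", 1:4)) dat[[dv]] <- rnorm(nrow(dat))

## Point 1: the multivariate alternative (all four responses at once)
mfit <- lm(cbind(Y1, Y2, Y3, Y4) ~ A * B * C, data = dat)
Anova(mfit, type = 2)                # Type II MANOVA tests (Pillai trace by default)

# emmeans represents "which response" as a 4-level pseudo-factor, rep.meas
emm_resp <- emmeans(mfit, ~ A | rep.meas)
emm_resp

## Point 2: the Type II test of A for one response, as a nested-model comparison
full      <- lm(Y1 ~ A * B * C, data = dat)
without_A <- lm(Y1 ~ B * C, data = dat)       # all terms that do not contain A
with_A    <- lm(Y1 ~ A + B * C, data = dat)   # the same terms plus A

ss_A <- deviance(without_A) - deviance(with_A)        # 1 df because A has 2 levels
F_A  <- ss_A / (deviance(full) / df.residual(full))   # error MS from the full model
p_A  <- pf(F_A, 1, df.residual(full), lower.tail = FALSE)

c(SS = ss_A, F = F_A, p = p_A)
Anova(full, type = 2)["A", ]                          # should agree with the line above
```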

3.1. That's a judgment call. Obviously, as the emmeans developer who made that the default, I am comfortable with it. I think people can go crazy with multiplicity adjustments. For example, since you're doing the same kind of analysis for each response, should we also adjust for multiplicity by treating all tests involving all responses as one family? I think you have to draw the line somewhere.

And, BTW, if you have several sets of pairwise comparisons, you can use a Tukey adjustment for each set, but the Tukey adjustment can't be used with the combined family, because it is no longer a single set of pairwise comparisons.
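Both choices for 3.1 can be sketched on simulated data (hypothetical names; assumes emmeans is installed). With the `by` grouping, each B×C cell is its own family containing a single a1 - a2 comparison. Passing `by = NULL` to `summary()` pools all eight comparisons into one family; Holm is used here as one possible method because, as noted above, a Tukey adjustment does not apply to that combined family. The last lines repeat the Holm adjustment with base R's `p.adjust()` as a cross-check.

```r
library(emmeans)

set.seed(4)
cells <- expand.grid(A = c("a1", "a2"), B = c("b1", "b2"),
                     C = c("c1", "c2", "c3", "c4"))
n <- sample(5:40, nrow(cells), replace = TRUE)      # unequal cell sizes
dat <- cells[rep(seq_len(nrow(cells)), times = n), ]
dat$y <- rnorm(nrow(dat))

fit <- lm(y ~ A * B * C, data = dat)

# a1 - a2 within each of the 8 B x C cells. Each cell is its own family
# with a single comparison, so there is nothing to adjust for.
ctr <- pairs(emmeans(fit, ~ A | B * C))
summary(ctr)

# Alternative: treat all 8 comparisons as one family (Holm shown as an example)
summary(ctr, by = NULL, adjust = "holm")

# Cross-check: the same Holm adjustment with base R's p.adjust()
p_raw <- summary(ctr, adjust = "none")$p.value
p.adjust(p_raw, method = "holm")
```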

3.2. If you used lm() to fit the model, or another model that has just one residual term, then the d.f. for any test of model effects is the same. That's because the d.f. quantifies the relative uncertainty in estimating that one error SD.
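To see why every comparison gets the same d.f., the sketch below (simulated data, hypothetical names, emmeans assumed installed) reproduces one a1 - a2 comparison by hand. In the full 2×2×4 model the estimated cell means are just the observed cell means, but the standard error uses the pooled residual SD estimated from all 16 cells, so its d.f. is the model's residual d.f. (N - 16) rather than anything computed from the two cells being compared.

```r
library(emmeans)

set.seed(4)
cells <- expand.grid(A = c("a1", "a2"), B = c("b1", "b2"),
                     C = c("c1", "c2", "c3", "c4"))
n <- sample(5:40, nrow(cells), replace = TRUE)      # unequal cell sizes
dat <- cells[rep(seq_len(nrow(cells)), times = n), ]
dat$y <- rnorm(nrow(dat))

fit <- lm(y ~ A * B * C, data = dat)

# emmeans' a1 - a2 comparison in the (b1, c1) cell
s <- summary(pairs(emmeans(fit, ~ A | B * C)))
i <- which(s$B == "b1" & s$C == "c1")
c(estimate = s$estimate[i], SE = s$SE[i], df = s$df[i])

# The same comparison by hand
in_cell <- dat$B == "b1" & dat$C == "c1"
y1 <- dat$y[in_cell & dat$A == "a1"]
y2 <- dat$y[in_cell & dat$A == "a2"]
est      <- mean(y1) - mean(y2)                                # raw cell means
se       <- sigma(fit) * sqrt(1 / length(y1) + 1 / length(y2))  # pooled SD, all cells
resid_df <- df.residual(fit)                 # N - 16, same for every comparison
c(estimate = est, SE = se, df = resid_df)
```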

3.3. Only if those interactions are of subject-matter interest. I would suggest using something like emmip() to display the pattern of model estimates. It really helps to visualize what is being tested.
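Both suggestions for 3.3 can be sketched on simulated data (hypothetical names; assumes emmeans, plus ggplot2 for the plot). `emmip(fit, B ~ A | C)` puts A on the x-axis with one trace per level of B and one panel per level of C. If the A×B pattern within each level of C is of interest, `contrast(..., interaction = "pairwise")` gives one A×B interaction contrast per level of C, i.e. how much the a1 - a2 difference changes between b1 and b2.

```r
library(emmeans)

set.seed(5)
cells <- expand.grid(A = c("a1", "a2"), B = c("b1", "b2"),
                     C = c("c1", "c2", "c3", "c4"))
n <- sample(5:40, nrow(cells), replace = TRUE)      # unequal cell sizes
dat <- cells[rep(seq_len(nrow(cells)), times = n), ]
dat$y <- rnorm(nrow(dat))

fit <- lm(y ~ A * B * C, data = dat)

# Interaction plot of the model estimates (emmeans draws it with ggplot2)
if (requireNamespace("ggplot2", quietly = TRUE)) {
  print(emmip(fit, B ~ A | C))
}

# The same estimates returned as a data frame instead of a plot
ip <- emmip(fit, B ~ A | C, plotit = FALSE)

# A x B interaction contrast within each level of C:
# (a1 - a2 at b1) - (a1 - a2 at b2), one row per level of C
ic <- contrast(emmeans(fit, ~ A * B | C), interaction = "pairwise")
ic
```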

  4. Yes, I think your questions are well thought out and clear.