Solved – Doing post-hoc tests after a non-significant interaction in a mixed ANOVA

anova, interaction, post-hoc, statistical significance

I conducted a mixed-design ANOVA with two within-subjects factors, FactorA (2 levels) and FactorB (2 levels), and one between-subjects factor, Group (2 levels). My main hypothesis concerns the FactorA*FactorB*Group interaction.

This is the ANOVA table:

Type III Repeated Measures MANOVA Tests: Pillai test statistic
                                   Df test stat approx F num Df den Df    Pr(>F)    
(Intercept)                         1   0.99424   4830.5      1     28 < 2.2e-16 ***
Group                               1   0.01375      0.4      1     28   0.53715    
FactorA                             1   0.46649     24.5      1     28 3.197e-05 ***
Group:FactorA                       1   0.00685      0.2      1     28   0.66367    
FactorB                             1   0.14451      4.7      1     28   0.03825 *  
Group:FactorB                       1   0.15108      5.0      1     28   0.03378 *  
FactorA:FactorB                     1   0.09930      3.1      1     28   0.08985 .  
Group:FactorA:FactorB               1   0.02737      0.8      1     28   0.38232    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
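The table looks like output from the car package's Anova() (multivariate approach). For readers who want to see the design itself in code, here is a minimal base-R sketch that fits the same 2 x 2 x 2 mixed design as a univariate repeated-measures ANOVA with aov() and an Error() term. The data are simulated and hypothetical (30 subjects, 15 per group, chosen to match the 28 denominator degrees of freedom above), and the column names value, Subject, Group, FactorA and FactorB are assumed to match the question's my_data. Because each within-subjects factor has only two levels, the univariate F for each effect should agree with the Pillai-based F when run on the same balanced data.

```r
set.seed(1)  # hypothetical data, for illustration only
my_data <- expand.grid(Subject = factor(1:30),
                       FactorA = factor(c("Level1", "Level2")),
                       FactorB = factor(c("Level1", "Level2")))
# Subjects 1-15 are Control, 16-30 Experimental (between-subjects factor)
my_data$Group <- factor(ifelse(as.integer(my_data$Subject) <= 15,
                               "Control", "Experimental"))
my_data$value <- rnorm(nrow(my_data), mean = 0.5, sd = 0.05)

# One observation per subject per FactorA x FactorB cell; Error() gives each
# within-subjects effect its own Subject-by-effect error stratum
fit <- aov(value ~ Group * FactorA * FactorB + Error(Subject / (FactorA * FactorB)),
           data = my_data)
s <- summary(fit)
print(s)

# The three-way interaction is tested in the Subject:FactorA:FactorB stratum
three_way <- s[[grep("Subject:FactorA:FactorB$", names(s))]][[1]]
rownames(three_way) <- trimws(rownames(three_way))  # row names may be padded
print(three_way["Group:FactorA:FactorB", ])
```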

As you can see, there is a significant Group*FactorB interaction, but the interaction I am interested in (Group*FactorA*FactorB) is not significant.

However, when I run a post-hoc analysis on the Group*FactorA*FactorB cells:

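library(nlme)      # lme()
library(multcomp)  # glht(), mcp(), adjusted()

# One factor with a level per Group x FactorA x FactorB cell, e.g. "Control.Level1.Level1"
my_data$Interaction <- interaction(my_data$Group, my_data$FactorA, my_data$FactorB)
my_data$Interaction <- factor(my_data$Interaction)  # drop unused levels

# Random intercepts for subjects and for FactorA/FactorB nested within subjects
post_hoc_model <- lme(value ~ Interaction, random = ~1 | Subject/FactorA/FactorB, data = my_data)
planned_post_hoc <- summary(glht(post_hoc_model, linfct = mcp(Interaction = c(
  "Control.Level1.Level1-Control.Level2.Level1==0",
  "Experimental.Level1.Level1-Experimental.Level2.Level1==0",
  "Control.Level1.Level2-Control.Level2.Level2==0",
  "Experimental.Level1.Level2-Experimental.Level2.Level2==0"))),
  test = adjusted("BY"))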

I obtain these results:

Linear Hypotheses:
                                                            Estimate Std. Error z value Pr(>|z|)  
Control.Level1.Level1-Control.Level2.Level1==0              0.053125   0.019228   2.763   0.0477 *
Experimental.Level1.Level1-Experimental.Level2.Level1==0   -0.018750   0.027192  -0.690   1.0000  
Control.Level1.Level2-Control.Level2.Level2==0              0.003125   0.019228   0.163   1.0000  
Experimental.Level1.Level2-Experimental.Level2.Level2==0   -0.025000   0.027192  -0.919   1.0000  

Specifically, the first comparison is significant after the False Discovery Rate correction, which is consistent with my hypothesis and with a simple inspection of the means.
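The BY-adjusted p-values in the output can be reproduced in base R, which makes the size of the correction explicit: with four comparisons, the Benjamini-Yekutieli procedure multiplies the smallest raw p-value by 4 x (1 + 1/2 + 1/3 + 1/4), about 8.33. This sketch assumes that multcomp's adjusted("BY") applies p.adjust() to two-sided normal (z) p-values, which is consistent with the z values reported; the estimates and standard errors are copied from the table.

```r
# Estimates and standard errors from the glht output, in the same order
est <- c(0.053125, -0.018750, 0.003125, -0.025000)
se  <- c(0.019228,  0.027192, 0.019228,  0.027192)

p_raw <- 2 * pnorm(-abs(est / se))          # unadjusted two-sided z-test p-values
p_by  <- p.adjust(p_raw, method = "BY")     # Benjamini-Yekutieli FDR adjustment

print(round(cbind(z = est / se, p_raw, p_by), 4))
```

The first comparison's raw p-value is roughly 0.006; it survives the correction only because the other three raw p-values are large.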


I suspect the interaction was not significant because of my small sample size.

I know that one should not run post-hoc tests on an interaction that is not significant, so I'm wondering how open to a referee's criticism this approach is. I thought of stating that the post-hoc comparisons should be interpreted with caution because the interaction was not significant. Moreover, I have found this practice in many published studies.

Do you think this way of proceeding is totally wrong?

If needed, how can I defend my analysis against a referee's criticism (perhaps by citing published statistical papers)?

Best Answer

As a reviewer there would be several things here that would concern me.

Assuming we were looking at the set of possible two-way interactions in your post-hocs (the next rational step when decomposing a three-way interaction), a significant effect for one two-way interaction (but not for the others) would not by itself imply a three-way interaction. For example, one two-way interaction may have an effect size significantly greater than 0, while the others have effects in the same direction that are not large enough to be significantly greater than 0. Because all of them go in the same direction, there might not be sufficient evidence that they differ from each other, so you could not reject the null hypothesis that they are the same (i.e., no statistically significant three-way interaction).
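A small numerical sketch of this point, using made-up numbers (not the question's data): two effect estimates with equal standard errors, assumed independent, where one is significant and the other is not, yet the test of their difference is nowhere near significant.

```r
# Hypothetical estimates, e.g. two simple two-way interaction contrasts
est <- c(0.50, 0.30)
se  <- c(0.20, 0.20)

p <- 2 * pnorm(-abs(est / se))       # each effect tested against 0

# Test of the difference between the two effects (assumes independent estimates)
diff    <- est[1] - est[2]
se_diff <- sqrt(sum(se^2))
p_diff  <- 2 * pnorm(-abs(diff / se_diff))

print(round(c(p_effect1 = p[1], p_effect2 = p[2], p_difference = p_diff), 4))
```

Here the first effect has p of about 0.012 and the second about 0.13, but their difference has p of about 0.48: "significant vs. not significant" is not itself a significant difference.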

That being said, I don't see your post-hocs here as testing the differences between two-way interactions (i.e., differences in the differences). You seem to be testing a subset of the possible simple main effects (differences obtained by manipulating a single variable while holding the levels of the other variables fixed). For example, none of your comparisons involves both the Experimental and Control groups.
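To make the distinction concrete, here is a base-R sketch that writes out contrast weights over the eight Group x FactorA x FactorB cells. Cell names follow the question's labels; the +1/-1 coding is an illustrative choice. The question's first comparison puts zero weight on every Experimental cell, whereas the three-way interaction contrast (the difference between the two groups' FactorA x FactorB interactions) puts nonzero weight on all eight cells.

```r
cells <- expand.grid(Group   = c("Control", "Experimental"),
                     FactorA = c("Level1", "Level2"),
                     FactorB = c("Level1", "Level2"),
                     stringsAsFactors = FALSE)

# +1/-1 codes for each factor
g <- ifelse(cells$Group == "Control", 1, -1)
a <- ifelse(cells$FactorA == "Level1", 1, -1)
b <- ifelse(cells$FactorB == "Level1", 1, -1)

# Simple effect of FactorA in Control at FactorB = Level1 (the question's first comparison)
w_simple <- a * (cells$Group == "Control" & cells$FactorB == "Level1")
# FactorA x FactorB interaction within the Control group only
w_two_control <- a * b * (cells$Group == "Control")
# Three-way interaction: Control's two-way interaction minus Experimental's
w_three <- g * a * b

weights <- cbind(simple = w_simple, two_way_control = w_two_control, three_way = w_three)
rownames(weights) <- paste(cells$Group, cells$FactorA, cells$FactorB, sep = ".")
print(weights)
```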

What does your result actually indicate? I think it indicates a statistically significant difference between those two particular conditions (Control, 1, 1 and Control, 2, 1).

Regardless, you should know that the lack of a three-way interaction here is probably not a power issue. If it were simply a power issue, you would expect the F ratio for your three-way interaction to exceed 1. As it is (F = 0.8), there is less variance in the three-way interaction than would be expected on average if the null hypothesis were true.
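The F for the three-way interaction can be recovered from the table. For a one-degree-of-freedom effect, the Pillai-based F equals V / (1 - V) x (den Df / num Df), where V is the Pillai statistic (this reproduces the other rows of the table as well). The sketch below compares it with the mean of an F(1, 28) distribution under the null hypothesis, which is 28 / 26, slightly above 1.

```r
V      <- 0.02737   # Pillai statistic for Group:FactorA:FactorB, from the table
num_df <- 1
den_df <- 28

F_obs <- (V / (1 - V)) * (den_df / num_df)
p_val <- pf(F_obs, num_df, den_df, lower.tail = FALSE)

# Mean of an F(num_df, den_df) distribution under H0 (defined for den_df > 2)
F_mean_null <- den_df / (den_df - 2)

print(round(c(F = F_obs, p = p_val, mean_F_under_H0 = F_mean_null), 4))
```

The observed F of about 0.79 falls below what the null hypothesis predicts on average, which is the basis for the argument above.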

Finally, assuming the comparisons you did perform were of genuine interest, I would expect to see them done a priori; a "planned post-hoc" makes no sense to me. That being said, I also know some reviewers are very fond of post-hoc corrections. The most important thing is that I would want to see those results interpreted appropriately (and not presented as a three-way interaction).

Edit: Oh, and I should acknowledge that I've seen plenty of people interpret significant results consistent with a desired interaction as strong evidence for that interaction. I've even seen this in top-tier journals. That being said, I strongly recommend against it (then again, I have a particular problem with this misbehavior, cf. https://stats.stackexchange.com/a/4572/196).