Solved – Is it permissible to run post-hoc tests after a non-significant ANOVA

anovapost-hocstatistical significance

We ran a 3-way ANOVA getting a non-significant result, but in doing so we saw that one condition looked like it had a strong effect. We tried a Bonferroni-corrected t-test of just that group, and it turned out significant. Is it valid to consider this? Is there some other correction needed to deal with the fact that we tested that group only after we saw that it "looked" like an effect?

(And does the answer to this sort of question depend on between vs. within factors? We have a mix. I ask this because SPSS doesn't seem to give the option for post hoc tests under the GLM procedure except for between-factor effects. Not the best way to learn, but it's a clue, nonetheless.)

Best Answer

It depends on what you plan to do with the result.

If you want to form a conclusion regarding a null hypothesis and have a specifiable rate of false positives (i.e. you want a Neyman-Pearson hypothesis test) then no, you can't do it. Spuriously 'significant' results turn up all the time and any result that looks significant will quite likely turn out to be statistically significant even when it is not real. Neyman-Pearson analysis allows you to test PREDEFINED hypotheses with PREDETERMINED analyses.

If you want to use your data to help form hypotheses to test with new experiments (i.e. use a Fisherian approach) then yes, go ahead and test! The fact that you cannot reliably test an hypothesis that is formed on the basis of seeing a dataset using that same dataset does not mean that the dataset cannot point to something worthy of a follow-up experiment.

Related Solutions

Solved – Post hoc test in a 2×3 mixed design ANOVA using SPSS

Answer edited to implement encouraging and constructive comment by @Ferdi

I would like to:

provide an answer with a full contained script
mention one can also test more general custom contrasts using the /TEST command
argue this is necessary in some cases (ie the EMMEANS COMPARE combination is not enough)

I assume to have a database with columns: depV, Group, F1, F2. I implement a 2x2x2 mixed design ANOVA where depV is the dependent variable, F1 and F2 are within subject factors and Group is a between subject factor. I further assume the F test has revealed that the interaction Group*F2 is significant. I therefore need to use post hoc t-tests to understand what drives the interaction.

MIXED depV BY Group F1 F2 
  /FIXED=Group F1 F2 Group*F1 Group*F2 F1*F2 Group*F1*F2 |  SSTYPE(3) 
  /METHOD=REML 
  /RANDOM=INTERCEPT | SUBJECT(Subject) COVTYPE(VC) 
  /EMMEANS=TABLES(Group*F2) COMPARE(Group) ADJ(Bonferroni)
  /TEST(0) = 'depV(F2=1)-depV(F2=0) differs between groups' 
    Group*F2 1/4 -1/4 -1/4 1/4 
    Group*F1*F2 1/8 -1/8 1/8 -1/8 -1/8 1/8 -1/8 1/8 
  /TEST(0) = 'depV(Group1, F2=1)-depV(Group2, F2=1)' Group 1 -1
    Group*F1 1/2 1/2 -1/2 -1/2 
    Group*F2 1 0 -1 0  
    Group*F1*F2 1/2 0 1/2 0 -1/2 0 -1/2 0 .

In particular the second t-test corresponds to the one performed by the EMMEANS command. The EMMEANS comparison could reveal for example that depV was bigger in Group 1 on the condition F2=1.

However the interaction could also be driven by something else, which is verified by the first test: the difference depV(F2=1)-depV(F2=0) differs between groups, and this is a contrast you cannot verify with the EMMEANS command (at least I did not find an easy way).

Now, in models with many factors it is a bit tricky to write down the /TEST line, the sequence of 1/2, 1/4 etc, called L matrix. Typically if you get the error message: "the L matrix is not estimable", you are forgetting some elements. One link that explains the receipt is this one: https://stats.idre.ucla.edu/spss/faq/how-can-i-test-contrasts-and-interaction-contrasts-in-a-mixed-model/

Solved – Doing post-hoc after a not significant interaction in mixed ANOVA

As a reviewer there would be several things here that would concern me.

Assuming we were looking at the set of possible two-way interactions in your post-hocs (the next rational step in a decomposition from a three-way interaction), then a significant effect for one two-way interaction (but not for the others) would not necessitate a three way interaction per se. For example, one two-way interaction may have a statistically significant effect size greater than 0 and the others may have effects in the same direction, but not large enough to be greater than 0. Nevertheless, because all are going in the same direction, then there might not be sufficient evidence to suggest that they are sufficiently different from each other to reject the null hypothesis that they are the same (i.e., not a statistically significant three-way interaction).

That being said, I don't see your post-hocs here as testing the differences between two-way interactions (i.e. differences in the differences). You seem to be testing a subset of possible main effects (differences manipulating only a single variable while holding the levels of other variables fixed). For example, none of your comparisons involve both the Experimental and Control groups.

What does your result actually indicate? I think it indicates a statistically significant difference between those two particular conditions (Control, 1, 1 and Control, 2, 1).

Regardless, you should know that your lack of a three-way interaction here is probably not a power issue. If it were simply a power issue, then the F ratio for your three way interaction would exceed 1. As it is, there is less variance in the three-way interaction that would be expected on average if the null hypothesis were true.

Finally, assuming the comparisons you did perform were of interest, then I would expect to see the comparisons done as a priori... a planned post-hoc makes no sense to me. That being said, I also know some reviewers are very post-hoc correction happy. The most important part here is that I would want to see those results interpreted appropriately (and not alluded to as a three way interaction).

Edit: Oh, and I should acknowledge that I've seen plenty of people interpret significant results consistent with a desired interaction as being evidence in strong favor of the interaction. I've even seen this in top tier journals. That being said, I strongly recommend against it (then again, I have a particular problem with this misbehavior, c.f. https://stats.stackexchange.com/a/4572/196).

Best Answer

Related Solutions

Solved – Post hoc test in a 2×3 mixed design ANOVA using SPSS

Solved – Doing post-hoc after a not significant interaction in mixed ANOVA

Related Question