Is it necessary to correct alpha in repeated measures ANOVA before any post-hoc comparisons?

multiple-comparisons, p-value, repeated-measures

I have a cross-over design in which a sample of subjects undergoes 7 different interventions, and within each intervention there are 3 time points (pre-intervention, post-intervention 1, post-intervention 2).

I first ran an overall ANOVA with the factors time × intervention, to check whether there were overall differences between interventions at any of the time points tested.

For the within-intervention analysis, I first ran a repeated measures ANOVA (alpha = .05) with the single factor time (3 levels), followed by Bonferroni-corrected post-hoc multiple comparisons (pairwise t-tests).
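For concreteness, here is a minimal sketch of that two-step analysis in Python using statsmodels and scipy. The data are simulated, and the column names (`subject`, `time`, `score`) and sample size are placeholders, not from the original study:

```python
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM
from scipy import stats

# Simulated data for one intervention: 12 subjects x 3 time points
rng = np.random.default_rng(0)
n_subjects = 12
times = ["pre", "post1", "post2"]
long = pd.DataFrame({
    "subject": np.repeat(np.arange(n_subjects), len(times)),
    "time": np.tile(times, n_subjects),
    "score": rng.normal(10, 2, n_subjects * len(times)),
})

# Step 1: repeated-measures ANOVA with the single within-subject factor "time"
res = AnovaRM(long, depvar="score", subject="subject", within=["time"]).fit()
print(res.anova_table)

# Step 2: Bonferroni-corrected post-hoc pairwise t-tests (3 comparisons);
# rows are in subject order within each time level, so pairing is preserved
pairs = [("pre", "post1"), ("pre", "post2"), ("post1", "post2")]
for a, b in pairs:
    x = long.loc[long["time"] == a, "score"].to_numpy()
    y = long.loc[long["time"] == b, "score"].to_numpy()
    t, p = stats.ttest_rel(x, y)
    print(f"{a} vs {b}: t = {t:.2f}, p_bonf = {min(p * len(pairs), 1.0):.3f}")
```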

However, during review at a peer-reviewed journal, a reviewer claimed that my "time" effect for a certain target intervention (p = 0.028 < 0.05) was not significant because that p-value itself had not been corrected for multiple comparisons (I had applied a correction only to the post-hoc comparisons).

Is the reviewer's point valid? Do I need to correct alpha in the RM ANOVA? I've been trying to find a relevant explanation on "the interwebs", but I can't find anything conclusive. So far, the best I've come across is this extract from a 2001 paper by Bender & Lange:

"Methods to adjust for multiple testing in studies collecting repeated measurements are rare. Despite much recent work on mixed models [38,39] with random subject effects to allow for correlation of data, there are only few multiple comparison procedures for special situations. It is difficult to develop a general adjustment method for multiple comparisons in the case of repeated measurements since these comparisons occur for between-subject factors (e.g., groups), within-subject factors (e.g., time), or both. The specific correlation structure has to be taken into account, involving many difficulties. If only comparisons for between-subject factors are of interest, one possibility is to consider the repeated measurements as multiple endpoints and use one of the methods mentioned in the previous section. However, if the repeated measurements are ordered, this information is lost by using such an approach."

Best Answer

This is a closed testing procedure, so you must correct the p-values to control the Type I error rate within each level of the hypothesis hierarchy. For example, in an ordinary ANOVA, you first test the global null at the .05 level. Then, if and only if the global test is significant, you move on to the pairwise comparisons, which must have a joint alpha level of .05. So if you used Bonferroni and had 3 groups, you would test each of the 3 pairwise comparisons at the .05/3 level. This maintains the overall alpha level, but it is crucial that you only proceed to test a hypothesis once every "more general" hypothesis has been rejected, i.e., every hypothesis that contains the current one as a component of an intersection. In the simple ANOVA case, the comparison $\mu_1 = \mu_2$ can only be considered if you have already rejected $\mu_1 = \mu_2 = \mu_3$, because the latter is $\{\mu_1 = \mu_2\} \cap \{\mu_2 = \mu_3\}$. If you instead test everything regardless, you have to be much more conservative and apply the Bonferroni correction with $k$ equal to the total number of tests performed.
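A minimal sketch of that gatekeeping logic for a one-way ANOVA with three independent groups (simulated data; the group means and sizes are arbitrary choices for illustration):

```python
from itertools import combinations
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
groups = {
    "g1": rng.normal(10.0, 2, 20),
    "g2": rng.normal(11.5, 2, 20),
    "g3": rng.normal(12.5, 2, 20),
}
alpha = 0.05

# Step 1: test the global null mu1 = mu2 = mu3 at the full alpha level
f, p_global = stats.f_oneway(*groups.values())
print(f"global F test: p = {p_global:.4f}")

# Step 2: pairwise comparisons, performed only if the global null was
# rejected, each at alpha/3 (Bonferroni within the closed testing procedure)
if p_global < alpha:
    pairs = list(combinations(groups, 2))
    for a, b in pairs:
        t, p = stats.ttest_ind(groups[a], groups[b])
        verdict = "reject" if p < alpha / len(pairs) else "fail to reject"
        print(f"{a} vs {b}: p = {p:.4f} -> {verdict} at alpha/{len(pairs)}")
```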

It sounds like the reviewer is complaining that you didn't maintain your alpha level across the family of hypotheses "time affects intervention $l$", $l = 1, \ldots, 7$. You can make a Bonferroni adjustment for this (multiply each of these p-values by 7), but since the tests are probably correlated that is likely to be quite conservative. A popular, less conservative alternative is the Benjamini-Hochberg step-up procedure, which controls the false discovery rate rather than the family-wise error rate and remains valid under positive dependence among the tests. Don't worry that little has been written about it specifically for repeated-measures ANOVA; it operates directly on the collection of p-values, so it applies to correlated families of hypotheses like yours.
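Both adjustments are available in `statsmodels.stats.multitest.multipletests`. A sketch comparing them on the family of 7 "time" p-values (only the 0.028 comes from the question; the other six values are hypothetical):

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

# Hypothetical p-values for the "time" effect in each of the 7 interventions;
# 0.028 is the value the reviewer flagged, the rest are invented
p_time = np.array([0.003, 0.028, 0.041, 0.120, 0.350, 0.600, 0.810])

for method in ("bonferroni", "fdr_bh"):
    reject, p_adj, _, _ = multipletests(p_time, alpha=0.05, method=method)
    print(method, np.round(p_adj, 3), reject)
```

The Benjamini-Hochberg column will generally be less conservative than the Bonferroni one, which is the point of preferring it when the 7 tests are correlated.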

Edit: In case I wasn't clear: to maintain your alpha level, only perform the pairwise comparisons for intervention $l$ once you have rejected the hypothesis that time does not affect intervention $l$ (in a way that maintains your alpha level across all such hypotheses).
