Solved – Correcting for multiple comparisons after multiple ANOVAs

anova · bonferroni · multiple-comparisons · statistical-significance

I've had to run 5 different ANOVAs (identical 4-way mixed design ANOVAs including the same factors, but on a different dependent variable [a gait parameter] each time). In this case, what would be the best way to correct my p-values? Let's say I'm using Bonferroni (for example)… am I dividing my alpha by 5 since that's how many ANOVAs I've done?

Other analyses have been done as well. For example, an additional ANOVA was run to compare walking pace in various conditions to the target walking pace (again a new dependent variable, but still including the factors from the previous ANOVAs). So I suppose I should be correcting for 6 tests. Then, of course, to tease apart interactions in the various ANOVAs I have t-tests… do those come into the overall adjustment?

I think I'm getting lost in all the numbers, and would really appreciate some input on how to handle multiple-comparison corrections in this situation. I'm not sure how clearly I've explained this, so I can provide more detail if needed.

Best Answer

Not to worry too much. First, ask whether the results are being interpreted serially, i.e., is this significant, or that, or the next thing, and so on down the list? If, for example, we have 6 tests and they are all significant, we are not saying that any single one of them determines whether the whole series of tests is significant; it is simply not that situation.

So what is Bonferroni? Bland and Altman (BMJ, 1995) explain it thus: "If we test a null hypothesis which is in fact true, using 0.05 as the critical significance level, we have a probability of 0.95 of coming to a not significant—that is, correct—conclusion. If we test two independent true null hypotheses, the probability that neither test will be significant is $0.95 \times 0.95 = 0.90$. If we test 20 such hypotheses the probability that none will be significant is $0.95^{20} = 0.36$. This gives a probability of $1 - 0.36 = 0.64$ of getting at least one significant result—we are more likely to get one than not. The expected number of spurious significant results is $20 \times 0.05 = 1$. In general, if we have $\kappa$ independent significant tests at the $\alpha$ level of null hypotheses which are all true, the probability that we will get no significant differences is $(1-\alpha)^{\kappa}$. If we make $\alpha$ small enough we can make the probability that none of the separate tests is significant equal to 0.95. Then if any of the $\kappa$ tests has a $P$ value less than $\alpha$ we will have a significant difference between the treatments at the 0.05 level. Since $\alpha$ will be very small, it can be shown that $(1-\alpha)^{\kappa} \approx 1-\kappa \alpha$. If we put $\kappa \alpha = 0.05$, so $\alpha = \frac{0.05}{\kappa}$, we will have probability 0.05 that one of the $\kappa$ tests will have a $P$ value less than $\alpha$ if the null hypotheses are true. Thus, if in a clinical trial we compare two treatments within five subsets of patients the treatments will be significantly different at the 0.05 level if there is a $P$ value less than 0.01 within any of the subsets. This is the Bonferroni method. Note that they are not significant at the 0.01 level, but only at the 0.05 level."
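
To make that arithmetic concrete, here is a minimal sketch in Python. The choice of $\kappa = 6$ (your five gait-parameter ANOVAs plus the walking-pace ANOVA) and the raw p-values are hypothetical, purely for illustration; the adjustment itself uses `multipletests` from statsmodels.

```python
# Minimal sketch of the Bland-Altman arithmetic above. kappa = 6 (the five
# gait-parameter ANOVAs plus the walking-pace ANOVA) is assumed from the
# question; the raw p-values below are made up.
from statsmodels.stats.multitest import multipletests

kappa = 6      # number of tests in the family (assumed)
alpha = 0.05   # desired family-wise significance level

# Chance of at least one spurious "significant" result if all nulls are true
print(1 - (1 - alpha) ** kappa)   # ~0.265

# Bonferroni: test each hypothesis at alpha / kappa
print(alpha / kappa)              # ~0.0083

# Equivalently, adjust the raw p-values and compare them to alpha
raw_p = [0.004, 0.021, 0.049, 0.150, 0.003, 0.062]   # hypothetical p-values
reject, p_adj, _, _ = multipletests(raw_p, alpha=alpha, method="bonferroni")
print(list(zip(p_adj.round(3), reject)))
```

As a side note, Holm's step-down procedure (`method="holm"`) controls the same family-wise error rate but is uniformly more powerful than Bonferroni, so it is often a better default.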

On the other hand, if your situation is a MANOVA one, as @DavidLane suggests, then you should use that first: it lumps all the dependent variables into a single test of significance, and that single lumped test is more direct than a Bonferroni correction applied to multiple serial tests. As for your question, "to tease apart interactions in the various ANOVAs I have t-tests... do they come into the overall adjustment?": those follow-up t-tests could be done after the MANOVA, once the ensemble has first been shown to be significant.
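
For completeness, a hedged sketch of the "lump first" idea using statsmodels' MANOVA. All names and data here are invented, and note the caveat that statsmodels' `MANOVA` handles between-subjects factors only; a full 4-way mixed (repeated-measures) design would need dedicated software, so treat this purely as an illustration of testing the five dependent variables jointly before any follow-ups.

```python
# Sketch only: a one-way between-subjects MANOVA over five gait outcomes.
# Column names (y1..y5, group) and the toy data are hypothetical.
import numpy as np
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

rng = np.random.default_rng(0)
n = 30
df = pd.DataFrame({"group": np.repeat(["A", "B"], n // 2)})
for col in ["y1", "y2", "y3", "y4", "y5"]:
    # toy outcomes with a small shift for group B
    df[col] = rng.normal(size=n) + (df["group"] == "B") * 0.5

# One multivariate test over all five dependent variables at once
m = MANOVA.from_formula("y1 + y2 + y3 + y4 + y5 ~ group", data=df)
print(m.mv_test())  # Wilks' lambda, Pillai's trace, Hotelling-Lawley, Roy
```

Only if this ensemble test is significant would you move on to the individual ANOVAs and follow-up t-tests, which keeps the number of serial tests needing a Bonferroni-type adjustment to a minimum.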