As always, your question implicitly asks for an authoritative answer that might very well not exist. Scheffé's method and Tukey's HSD are usually called post-hoc tests, used for unplanned comparisons and conducted after an omnibus test, but that's not a requirement for all such methods.
The main argument for a distinction between planned and unplanned tests is that if you always intended to conduct a limited number of tests (planned contrasts), you don't necessarily need to adjust the error level. If, on the other hand, you are just reporting/testing the differences that look big (post-hoc tests), you might be “capitalizing on chance” and should adjust not only for the tests you conduct/report but for all possible pairwise comparisons/contrasts in your design.
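The “capitalizing on chance” problem can be made concrete with a small simulation. Under the null hypothesis each p-value is Uniform(0, 1), so running m unadjusted tests at alpha = 0.05 gives a familywise error rate of 1 - 0.95**m rather than 0.05. This is a hedged sketch with made-up numbers, not anything from the thread:

```python
import random

def familywise_error_rate(m, alpha=0.05, reps=20_000, seed=1):
    """Fraction of simulated 'experiments' with at least one false positive
    when m independent true-null tests are run without adjustment."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # Under the null, each p-value is Uniform(0, 1)
        if any(rng.random() < alpha for _ in range(m)):
            hits += 1
    return hits / reps

print(familywise_error_rate(1))   # close to 0.05
print(familywise_error_rate(10))  # close to 1 - 0.95**10, i.e. about 0.40
```

With ten unadjusted tests, the chance of at least one spurious "significant" result is roughly 40%, which is the inflation the adjustments below are meant to control.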
One issue with all this is that it makes the evaluation of the evidence and the result of a study contingent on the intentions of the experimenter, a most counter-intuitive and undesirable state of affairs. This is sometimes held as an argument against null-hypothesis significance testing as used within psychology.
The idea that only non-orthogonal comparisons require adjustment is a myth. See section 6.1 of Frane (2015): http://jrp.icaap.org/index.php/jrp/article/view/514/417
In general, computing several alternate statistics and picking the one that gives you the answer you like best is a bad policy and can cause error inflation (as it's a form of multiple comparisons in itself). It's best to have a statistical plan before you look at your data.
Bonferroni is less powerful than Holm. Holm is less powerful than some other procedures that require more assumptions. Sidak is only a tiny bit more powerful than Bonferroni and requires the assumption of non-negative dependence. If you just want to compare each treatment to control, and not compare the different treatments to each other, you can use Dunnett's procedure (which is designed for that purpose).
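The Bonferroni–Holm power comparison is easy to see by computing both sets of adjusted p-values. This is a plain-Python sketch (no library assumed): Holm's step-down adjustment is never larger than Bonferroni's, which is why it is at least as powerful while making the same (i.e. no) dependence assumptions.

```python
def bonferroni(pvals):
    """Bonferroni-adjusted p-values: multiply each by m, cap at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]

def holm(pvals):
    """Holm step-down adjusted p-values: the k-th smallest p-value is
    multiplied by (m - k + 1), with monotonicity enforced."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        running_max = max(running_max, min(1.0, (m - rank) * pvals[i]))
        adjusted[i] = running_max
    return adjusted

pvals = [0.01, 0.02, 0.03, 0.20]
print(bonferroni(pvals))  # roughly [0.04, 0.08, 0.12, 0.8]
print(holm(pvals))        # roughly [0.04, 0.06, 0.06, 0.2]
```

At alpha = 0.05, Bonferroni rejects only the first hypothesis here, while Holm rejects the first three: a concrete case of Holm's extra power.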
Not sure what you mean by "post hoc." Unfortunately, different people use that term in different ways.
Multiplicity applies any time you conduct more than one comparison.
See 5.
If you're not interested in the omnibus result, there's no reason to perform the omnibus test. As you observed, you can just go straight to the individual tests, adjusted for multiplicity (though it may be advisable to use the omnibus error term for those tests, which can provide more power in some cases). Some people perform the omnibus test and then use Fisher's LSD method (i.e. do the individual comparisons without adjustment), but that doesn't generally control the familywise error rate and may thus be hard to justify.
I don't see why the significance of a main effect should inherently affect whether you adjust the other tests.
Response to @Sophocole's reply from Aug 5, 2016 to @Bonferroni's answer from Aug 3, 2016.
I don't know who you talked to at IBM, but SPSS has several ways to control the familywise error rate, including Bonferroni, Tukey, and Dunnett tests (just google "multiple comparisons in SPSS" and you'll see). The same goes for any other reputable statistical package, including SAS and R. And if you're using a simple method like Bonferroni, you can probably do the adjustment in your head.
Regarding doing multiple tests of a single comparison and choosing the one that gives you the answer you like best, it's pretty straightforward to see what the problem with that is. If you try one method that produces error at a rate of 5%, but then you get a second, third, and fourth chance with alternative methods, obviously the error rate is going to be bigger than 5%. That's like playing darts and setting up a second, third, and fourth bull's eye in slightly different positions on the dart board--obviously, you're increasing your chances of getting lucky.
If you're in a very early stage of your research where you're just exploring around and error rates aren't a big concern, then by all means, test your heart out and don't bother with adjustments--you could even just look at the plots and mean differences and not do any formal testing at all if that suits your needs. But if you're trying to publish a claim or sell a treatment based on your results, you likely need statistical rigor. And if you're trying to get a drug approved by the FDA, you can forget about playing loose with error control!
By the way, you may want to read that Nakagawa article again. It seems he is not arguing against "getting rid of multiplicity adjustments altogether." He apparently thinks Bonferroni and Holm are generally too conservative for behavioral ecology research, but he does endorse false discovery rate control.
Best Answer
The main difference is that the Bonferroni correction can be applied to any set of $p$-values (no independence assumption is needed), whereas the Tukey HSD method only applies to pairwise comparisons between means. Both of them control the family-wise error rate. The Wikipedia article on that subject compares them and offers some advice about other methods you have not considered. Note that if you decide to use Bonferroni, there is a uniformly more powerful method with the same assumptions due to Holm. This is also sometimes known as the Holm-Bonferroni procedure.
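The difference in scope can be illustrated side by side. This is a hedged sketch assuming SciPy >= 1.8 (for `scipy.stats.tukey_hsd`), with made-up data for three groups: Tukey's HSD handles all pairwise comparisons of group means via the studentized range, while Bonferroni simply adjusts whatever collection of p-values you hand it (here, the same three pairwise t-tests, for illustration).

```python
from itertools import combinations
from scipy import stats

# Illustrative data for three groups (invented for this example)
groups = [
    [24.5, 23.5, 26.4, 27.1, 29.9],
    [28.4, 34.2, 29.5, 32.2, 30.1],
    [26.1, 28.3, 24.3, 26.2, 27.8],
]

# Tukey HSD: all pairwise comparisons of means, studentized-range based
tukey = stats.tukey_hsd(*groups)
print(tukey.pvalue)  # 3x3 matrix of pairwise p-values

# Bonferroni: run the pairwise t-tests, then multiply each p-value by the
# number of comparisons (3), capping at 1
raw = [stats.ttest_ind(groups[i], groups[j]).pvalue
       for i, j in combinations(range(3), 2)]
bonf = [min(1.0, p * len(raw)) for p in raw]
print(bonf)
```

Note that Bonferroni would work just as well on a mixture of pairwise tests, contrasts, and unrelated hypotheses, which is exactly the flexibility Tukey's HSD does not offer.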