Solved – How and when to use the Bonferroni adjustment

Tags: bonferroni, multiple-comparisons, type-i-and-ii-errors

I have two questions regarding when to use a Bonferroni adjustment:

  • Is it appropriate to use a Bonferroni adjustment in all cases of multiple testing?
  • If one performs a test on a data set, then splits that data set into finer levels (e.g. by gender) and performs the same tests again, how does this affect the number of individual tests that should be counted? That is, if X hypotheses are tested on a dataset containing data from both males and females, and the dataset is then split into separate male and female data and the same hypotheses are tested again, does the number of individual hypotheses remain X or does it increase because of the additional testing?

Thank you for your comments.

Best Answer

The Bonferroni adjustment will always provide strong control of the family-wise error rate. This means that, whatever the nature and number of the tests or the relationships between them, and provided each test's own assumptions are met, the probability of having even one erroneous significant result among all tests is at most $\alpha$, your original error level. In that sense it is always available.
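The reason this guarantee holds regardless of dependence can be sketched with the union bound. The notation here is introduced only for illustration: $m$ is the number of tests, $p_i$ the p-value of test $i$, and $I_0$ the (unknown) set of true null hypotheses. With each test carried out at level $\alpha/m$:

$$
\mathrm{FWER} = P\Big(\bigcup_{i \in I_0} \{p_i \le \alpha/m\}\Big) \le \sum_{i \in I_0} P(p_i \le \alpha/m) \le |I_0| \cdot \frac{\alpha}{m} \le \alpha
$$

No step uses independence between the tests, which is why the bound survives any relationship between them, and also why it can be loose.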

Whether it is appropriate to use it (as opposed to another method or perhaps no adjustment at all) depends on your objectives, the standards of your discipline and the availability of better methods for your specific situation. At the very least, you should probably consider the Holm-Bonferroni method, which is just as general but less conservative.
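To make the comparison concrete, here is a minimal sketch of both adjustments in plain Python. The function names and the p-values are made up for illustration; in practice you would probably use an existing statistics package rather than this code.

```python
def bonferroni(pvalues):
    """Bonferroni-adjusted p-values: multiply each by the number of tests."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def holm(pvalues):
    """Holm-Bonferroni adjusted p-values (step-down procedure)."""
    m = len(pvalues)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, etc.
        candidate = min(1.0, (m - rank) * pvalues[i])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


# Hypothetical p-values from four tests (not from any real data set).
pvalues = [0.01, 0.02, 0.03, 0.005]
alpha = 0.05

bonf_rejected = [p < alpha for p in bonferroni(pvalues)]
holm_rejected = [p < alpha for p in holm(pvalues)]
print("Bonferroni rejects:", sum(bonf_rejected), "of", len(pvalues))
print("Holm rejects:      ", sum(holm_rejected), "of", len(pvalues))
```

With these made-up numbers Bonferroni rejects two of the four hypotheses at the 0.05 level while Holm rejects all four, which illustrates what "less conservative" means here. Holm's adjusted p-value is never larger than the Bonferroni one.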

Regarding your example, since you are performing several tests, you are increasing the family-wise error rate (the probability of rejecting at least one null hypothesis erroneously). If you only perform one test on each half, many adjustments would be possible including Hommel's method or methods controlling the false discovery rate (which is different from the family-wise error rate). If you conduct a test on the whole data set followed by several sub-tests, the tests are no longer independent so some methods are no longer appropriate. As I said before, Bonferroni is in any case always available and guaranteed to work as advertised (but also to be very conservative…).

You could also just ignore the whole issue. Formally, the family-wise error rate is higher but with only two tests it's still not so bad. You could also start with a test on the whole data set, treated as the main outcome, followed by sub-tests for different groups, uncorrected because they are understood as secondary outcomes or ancillary hypotheses.
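For a rough sense of how much higher, the sketch below computes the family-wise error rate when all null hypotheses are true. It assumes the tests are independent (plausible for separate male and female subsets, not for a whole-sample test followed by sub-tests), and the function names are made up for illustration.

```python
def fwer_independent(alpha, m):
    """FWER for m independent tests at level alpha, all nulls true."""
    return 1.0 - (1.0 - alpha) ** m


def fwer_upper_bound(alpha, m):
    """Union bound on the FWER; holds under any dependence between tests."""
    return min(1.0, alpha * m)


for m in (1, 2, 5, 20):
    print(m, round(fwer_independent(0.05, m), 4), fwer_upper_bound(0.05, m))
```

With two tests at the 0.05 level the rate is about 0.0975 under independence and at most 0.10 in general, so the inflation is real but modest. It grows quickly as the number of tests increases.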

If you consider many demographic variables in that way (as opposed to planning to test for gender differences from the get-go, or taking a more systematic modeling approach), the problem becomes more serious, with a significant risk of “data dredging”: one difference comes out significant by chance, allowing you to rescue an inconclusive experiment with a nice story about the demographic variable, when in fact nothing really happened. In that case you should definitely consider some form of adjustment for multiple testing. The logic remains the same with X different hypotheses: testing X hypotheses twice (once on each half of the data set) entails a higher family-wise error rate than testing X hypotheses only once, and you should probably adjust for that.
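The data-dredging risk can be illustrated with a small simulation. This is only a sketch under strong assumptions: every null hypothesis is true, the tests are independent, and p-values are uniform on [0, 1] (as they are under the null for continuous test statistics). The function name, the choice of ten demographic variables, and the seed are all arbitrary.

```python
import random


def simulate_false_alarm_rates(n_variables, alpha=0.05, n_experiments=20000, seed=0):
    """Fraction of null experiments with at least one 'significant' split.

    Returns (uncorrected_rate, bonferroni_rate).
    """
    rng = random.Random(seed)
    uncorrected_hits = 0
    bonferroni_hits = 0
    for _ in range(n_experiments):
        # One p-value per demographic variable; nothing is really going on.
        smallest_p = min(rng.random() for _ in range(n_variables))
        if smallest_p < alpha:
            uncorrected_hits += 1
        if smallest_p < alpha / n_variables:
            bonferroni_hits += 1
    return uncorrected_hits / n_experiments, bonferroni_hits / n_experiments


uncorrected, corrected = simulate_false_alarm_rates(n_variables=10)
print("At least one false positive, uncorrected:", uncorrected)
print("At least one false positive, Bonferroni: ", corrected)
```

Under these assumptions, roughly 40% of experiments with ten candidate variables offer at least one "significant" difference to build a story around, while the Bonferroni threshold keeps that fraction near 5%.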
