Solved – What’s wrong with Bonferroni adjustments

bonferroni, hypothesis-testing, multiple-comparisons

I read the following paper: Perneger (1998) What's wrong with Bonferroni adjustments.

The author's summary is that Bonferroni adjustments have, at best, limited applications in biomedical research and should not be used when assessing evidence about specific hypotheses:

Summary points:

  • Adjusting statistical significance for the number of tests that have been performed on study data—the Bonferroni method—creates more problems than it solves
  • The Bonferroni method is concerned with the general null hypothesis (that all null hypotheses are true simultaneously), which is rarely of interest or use to researchers
  • The main weakness is that the interpretation of a finding depends on the number of other tests performed
  • The likelihood of type II errors is also increased, so that truly important differences are deemed non-significant
  • Simply describing what tests of significance have been performed, and why, is generally the best way of dealing with multiple comparisons
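None of the following is from Perneger's paper; it is just a minimal simulation sketch (the number of tests, sample size, and effect size are arbitrary choices) illustrating two of the summary points above: under the joint null hypothesis the Bonferroni correction keeps the chance of any false positive near 5%, while unadjusted testing flags something far more often; but when real effects exist, the correction rejects fewer of them, i.e. more type II errors.

```python
# Minimal sketch: family-wise error rate vs. per-test power, with and without
# a Bonferroni correction. All settings below are arbitrary illustration choices.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
m, n, alpha, n_sims = 10, 30, 0.05, 1000  # tests, sample size, level, simulations

def run(delta):
    """Return (any-rejection rate unadj., any-rejection rate Bonferroni,
    per-test rejection rate unadj., per-test rejection rate Bonferroni)."""
    any_unadj = any_bonf = rate_unadj = rate_bonf = 0.0
    for _ in range(n_sims):
        # m independent one-sample t-tests; every test has true mean `delta`
        x = rng.normal(delta, 1.0, size=(m, n))
        p = stats.ttest_1samp(x, 0.0, axis=1).pvalue
        any_unadj += (p < alpha).any()
        any_bonf += (p < alpha / m).any()
        rate_unadj += (p < alpha).mean()
        rate_bonf += (p < alpha / m).mean()
    return tuple(round(v / n_sims, 3) for v in (any_unadj, any_bonf, rate_unadj, rate_bonf))

# Global null: Bonferroni holds the family-wise error rate near alpha.
print("global null (delta=0):   ", run(0.0))
# Real effects: Bonferroni's per-test rejection rate drops, i.e. more type II errors.
print("real effects (delta=0.4):", run(0.4))
```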

I have the following data set and I want to do a multiple-testing correction, but I am unable to decide on the best method in this case.

[image: the data set referred to above]

Is it imperative to do this kind of correction for every data set that contains a list of means, and if so, what is the best correction method in this case?
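For reference, mechanically applying a correction to a list of p-values is straightforward. The p-values below are hypothetical placeholders (the actual data set is only shown as an image above); the sketch merely shows how Bonferroni, Holm, and Benjamini-Hochberg would treat the same input, using statsmodels.

```python
# Hypothetical p-values, not the questioner's data.
import numpy as np
from statsmodels.stats.multitest import multipletests

pvals = np.array([0.001, 0.008, 0.020, 0.041, 0.120, 0.300])

for method in ("bonferroni", "holm", "fdr_bh"):
    reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method=method)
    print(f"{method:10s} adjusted p: {np.round(p_adj, 3)}  reject: {reject}")
```

The three methods can disagree on which of the same six p-values count as significant, which is part of what the answer below means by there being no unique solution to the multiplicity problem.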

Best Answer

What is wrong with the Bonferroni correction besides the conservatism mentioned by others is what's wrong with all multiplicity corrections. They do not follow from basic statistical principles and are arbitrary; there is no unique solution to the multiplicity problem in the frequentist world. Secondly, multiplicity adjustments are based on the underlying philosophy that the veracity of one statement depends on which other hypotheses are entertained. This is equivalent to a Bayesian setup where the prior distribution for a parameter of interest keeps getting more conservative as other parameters are considered. This does not seem to be coherent. One could say that this approach comes from researchers having been "burned" by a history of false positive experiments and now they want to make up for their misdeeds.

To expand a bit, consider the following situation. An oncology researcher has made a career of studying the efficacy of chemotherapies of a certain class. All 20 of her previous randomized trials have resulted in statistically insignificant efficacy. Now she is testing a new chemotherapy in the same class. The survival benefit is significant with $P=0.04$. A colleague points out that there was a second endpoint studied (tumor shrinkage) and that a multiplicity adjustment needs to be applied to the survival result, making for an insignificant survival benefit. How is it that the colleague emphasized the second endpoint but couldn't care less about adjusting for the 20 previous failed attempts to find an effective drug? And how would you take into account prior knowledge about the 20 previous studies if you weren't Bayesian? What if there had been no second endpoint? Would the colleague then believe that a survival benefit had been demonstrated, ignoring all previous knowledge?
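To make the arithmetic in that example explicit: with two endpoints, the Bonferroni threshold becomes $0.05/2 = 0.025$, or equivalently the adjusted p-value is $2 \times 0.04 = 0.08$; either way the observed $P=0.04$ is no longer declared significant, while the 20 earlier trials enter the calculation nowhere.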
