Solved – Number of false positives after Bonferroni or FDR

bonferroni, false-discovery-rate, hypothesis-testing, multiple-comparisons, p-value

Let's say I am doing 100,000 tests. If the tests are independent, at an alpha of 0.05, we would expect (on average) 5000 of these tests to be false positives.

The Bonferroni method would divide alpha by the number of tests being conducted. So, for 100,000 tests, I would have a very small (corrected) alpha (0.05/100,000) for each of the tests. FDR control would instead limit the proportion of false positives among the rejections: one would sort all the observed p-values and check whether each p-value is less than (rank/100,000)*0.05.
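
To make the two rules concrete, here is a small numpy sketch of the thresholds just described (the simulated p-values and variable names are mine, purely for illustration; all Nulls are true in this simulation):

```python
import numpy as np

rng = np.random.default_rng(0)

m = 100_000                # number of tests
alpha = 0.05
p = rng.uniform(size=m)    # simulated p-values; uniform because every Null is true here

# Bonferroni: compare each p-value to alpha / m
bonf_reject = p < alpha / m

# Benjamini-Hochberg step-up: sort the p-values, find the largest rank k with
# p_(k) <= (k / m) * alpha, and reject the k smallest p-values
order = np.argsort(p)
thresholds = (np.arange(1, m + 1) / m) * alpha
below = p[order] <= thresholds
k = below.nonzero()[0].max() + 1 if below.any() else 0
bh_reject = np.zeros(m, dtype=bool)
bh_reject[order[:k]] = True

print("uncorrected rejections:", (p < alpha).sum())   # roughly 5000
print("Bonferroni rejections:", bonf_reject.sum())    # usually 0
print("BH rejections:", bh_reject.sum())              # usually 0
```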

In both of the above cases, we get a corrected alpha value that we use for the individual tests. However, what is the expected number of false positives that I would get at the end, after the correction?

(I believe I am making a rookie mistake in understanding the concept here. Would be glad if someone could point out what that is…)

Best Answer

First, a small correction:

"Let's say I am doing 100,000 tests. If the tests are independent, at an alpha of 0.05, we would expect (on average) 5000 of these tests to be false positives."

This is technically correct, but only if you assume that you don't have any false Null hypotheses (i.e. Null hypotheses that you should reject). If, for example, 50'000 of the tested Null hypotheses were actually false, then only 50'000 true Null hypotheses are left to turn up as false positives. In this case, you would expect 2'500 false positives (in addition to, ideally, 50'000 true positives if we assume a power of 100%).
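
As a quick sanity check, a minimal simulation of this scenario (my own sketch; it assumes the p-values of the true Nulls are uniform and ignores the false Nulls, since only true Nulls can produce false positives):

```python
import numpy as np

rng = np.random.default_rng(1)
alpha = 0.05
m_true_null = 50_000                         # true Null hypotheses left in the battery

p_true_null = rng.uniform(size=m_true_null)  # p-values are uniform(0, 1) under the Null
false_positives = (p_true_null < alpha).sum()
print(false_positives)                       # close to 0.05 * 50'000 = 2'500
```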

Then, some nomenclature:

  • The Bonferroni correction controls the per-family error rate (PFER), which is the total number of type I errors you'd expect in the whole test family, a.k.a. the test battery (see Dunn 1961 [1] and Frane 2015 [2] for more details on the Bonferroni correction and its control of the PFER, respectively)
  • The FDR is not an error count like the PFER; it is the expected false discovery proportion (FDP), where the FDP is the proportion of falsely rejected Null hypotheses among all rejected hypotheses. Hence: $FDP = \begin{cases} \frac{N_{1|0}}{R} & \text{if } R > 0 \\ 0 & \text{if } R = 0 \end{cases}$ with $N_{1|0}$ being the number of falsely rejected true Null hypotheses and $R$ the total number of rejected hypotheses (a small numerical sketch of the FDP follows this list). And: $FDR = E(FDP) = E\!\left(\frac{N_{1|0}}{R}\,\middle|\,R>0\right)\cdot Pr(R>0)$
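
Here is the promised sketch of the FDP for a single simulated battery (the mix of true and false Nulls and the beta distribution standing in for "false Nulls with decent power" are my own choices, not part of any standard procedure):

```python
import numpy as np

rng = np.random.default_rng(2)
alpha = 0.05
m0, m1 = 80_000, 20_000                 # true and false Null hypotheses (my choice)

# p-values: uniform under the true Nulls, concentrated near 0 under the false Nulls
p0 = rng.uniform(size=m0)
p1 = rng.beta(0.1, 5.0, size=m1)        # stand-in for false Nulls with decent power

p = np.concatenate([p0, p1])
is_true_null = np.concatenate([np.ones(m0, bool), np.zeros(m1, bool)])

reject = p < alpha                      # uncorrected testing, just to illustrate the FDP
R = reject.sum()
N_1_0 = (reject & is_true_null).sum()   # falsely rejected true Nulls
FDP = N_1_0 / R if R > 0 else 0.0
print(R, N_1_0, round(FDP, 3))
```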

As opposed to the PFER, the FDP also takes into account the fact that a high number of false rejections is less problematic if the total number of rejected hypotheses is high as well. That is, if your set contains a lot of false Null hypotheses (i.e. hypotheses you should reject), then more falsely rejected Null hypotheses can be tolerated.

The problem with the FDP is that you cannot control it directly. This is where the $FDR$ comes into play. Benjamini & Hochberg 1995 [3] showed that their procedure guarantees $FDR \leq \frac{M_0}{m}\cdot\alpha \leq \alpha$, where $M_0$ is the total number of true Null hypotheses and $m$ is the total number of hypotheses (true + false Null hypotheses). Hence, the $FDR$ is controllable at the pre-defined $\alpha$ level.
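
A small simulation to see this bound in practice (again my own setup; it uses the multipletests function from statsmodels with method="fdr_bh", assuming that package is available):

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(3)
alpha = 0.05
m0, m1 = 1_600, 400            # true / false Nulls per battery (my choice), m = 2000
n_batteries = 500

fdps = []
for _ in range(n_batteries):
    p = np.concatenate([rng.uniform(size=m0),          # true Nulls
                        rng.beta(0.1, 5.0, size=m1)])  # false Nulls (stand-in)
    is_true_null = np.arange(m0 + m1) < m0
    reject, *_ = multipletests(p, alpha=alpha, method="fdr_bh")
    R = reject.sum()
    fdps.append((reject & is_true_null).sum() / R if R > 0 else 0.0)

print("estimated FDR:", round(np.mean(fdps), 4))           # stays below the bound
print("BH bound (M0/m)*alpha:", m0 / (m0 + m1) * alpha)    # 0.04
```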

Finally, to your question regarding the expected number of false positives after corrections:

As you mentioned in your question, the Bonferroni and the Benjamini-Hochberg methods correct for the fact that uncorrected multiple testing leads to a lot of false positives. The correction methods make sure that the error rates discussed here (PFER and FDR, respectively) remain at the pre-defined significance level (e.g. 0.05) for the whole family a.k.a. test battery.

When using the Bonferroni correction, you control the PFER, i.e. the total number of type I errors expected to occur in your test battery. This is just the sum of the type I error probabilities of all the hypotheses in the battery after applying the correction (100'000 × 0.05/100'000 = 0.05 in your example). In other words, if you repeatedly conduct a test battery of 100'000 tests with a Bonferroni correction for a PFER of 0.05 (and assuming that all tested Null hypotheses are true), you would on average expect one falsely rejected Null hypothesis in about 5% of these test batteries. For example, if you conduct 100 such test batteries, you'd expect roughly 5 of them to give you a falsely rejected Null hypothesis (for the sake of the argument we still assume that all of your Null hypotheses are correct, i.e. ideally none should be rejected).
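
A quick way to see both numbers (a sketch, assuming as above that all Nulls are true; the battery count is arbitrary and kept moderate so it runs quickly):

```python
import numpy as np

rng = np.random.default_rng(4)
m, alpha = 100_000, 0.05
n_batteries = 1_000

false_rejections = np.array([
    (rng.uniform(size=m) < alpha / m).sum()   # Bonferroni threshold, all Nulls true
    for _ in range(n_batteries)
])

print("mean false rejections per battery:", false_rejections.mean())    # roughly 0.05 (the PFER)
print("share of batteries with >= 1:", (false_rejections > 0).mean())   # roughly 5%
```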

For the Benjamini-Hochberg (BH) correction, it's a bit more complicated. Because the BH method controls the FDR (and not the PFER), it also takes into account the total number of rejected hypotheses. If you have a lot of false Null hypotheses (i.e. hypotheses you should reject) in your set, BH allows for more falsely rejected Null hypotheses (remember: $FDR = E(FDP) = E\!\left(\frac{N_{1|0}}{R}\,\middle|\,R>0\right)\cdot Pr(R>0)$). So you could actually end up with a higher number of type I errors than with the Bonferroni method while the FDR still remains below your pre-defined significance level. However, if we again assume that all of the tested Null hypotheses in your battery are true, the FDP is either 0 (no rejections) or 1 (at least one rejection), so the FDR equals the probability of at least one false rejection (the family-wise error rate). Hence, if you repeatedly conduct such a test battery, you should again on average expect a falsely rejected Null hypothesis in about 5% of them.
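
And the analogous check for BH under the same "all Nulls true" assumption (again just a sketch, reusing the multipletests function from statsmodels and a smaller battery to keep it fast):

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(5)
m, alpha = 10_000, 0.05
n_batteries = 400

any_false_rejection = np.array([
    multipletests(rng.uniform(size=m), alpha=alpha, method="fdr_bh")[0].any()
    for _ in range(n_batteries)
])
print(any_false_rejection.mean())   # roughly 0.05: a false rejection in about 5% of batteries
```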

[1] Dunn, O. J. (1961). Multiple comparisons among means. Journal of the American Statistical Association, 56, 52–64.

[2] Frane, A. V. (2015). Are per-family Type I error rates relevant in social and behavioral science? Journal of Modern Applied Statistical Methods, 14(1), Article 5.

[3] Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B, 57, 289–300.