First, a small correction:
"Let's say I am doing 100,000 tests. If the tests are independent, at an alpha of 0.05, we would expect (on average) 5000 of these tests to be false positives."
This is technically correct, but only if you assume that you don't have any false Null hypotheses (i.e. Null hypotheses that you should reject). If, for example, 50,000 of the tested Null hypotheses were actually false, then only the remaining 50,000 true Null hypotheses are left to turn up as false positives. In this case, you would expect 2,500 false positive tests (in addition to, ideally, 50,000 true positive tests if we assume a power of 100%).
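To make the arithmetic explicit, here is a small Python sketch using the hypothetical counts from the example above:

```python
alpha = 0.05
m = 100_000   # total number of tests

# Scenario 1: every Null hypothesis is true, so all m tests can yield
# false positives.
expected_fp_all_true = alpha * m          # 5,000 on average

# Scenario 2: 50,000 Nulls are actually false, so only the remaining
# 50,000 true Nulls can yield false positives.
m0 = 50_000
expected_fp_half_true = alpha * m0        # 2,500 on average

print(round(expected_fp_all_true), round(expected_fp_half_true))
```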
Then, some nomenclature:
- The Bonferroni correction controls the per-family error rate (PFER), which is the expected total number of type I errors in the whole test family, aka the battery of tests (see Dunn 1961 [1] and Frane 2015 [2] for more details on the Bonferroni correction and its control of the PFER, respectively)
- The FDR is not an error count but the expected false discovery proportion (FDP), where the FDP is the proportion of falsely rejected Null hypotheses among all rejected hypotheses. Hence:
$FDP = \begin{cases}
\frac{N_{1|0}}{R} & \text{if } R > 0 \\
0 & \text{if } R = 0
\end{cases}$
with $R$ being the total number of rejected hypotheses and $N_{1|0}$ the number of falsely rejected (i.e. actually true) Null hypotheses.
And: $FDR = E(FDP) = E\left(\frac{N_{1|0}}{R} \,\middle|\, R>0\right) \cdot Pr(R>0)$
As opposed to the PFER, the FDP also takes into account that a high number of false rejections is less problematic if the total number of rejected hypotheses is high as well. I.e. if your set of hypotheses contains many false Null hypotheses (i.e. hypotheses you should reject), then you can allow for more falsely rejected Null hypotheses.
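As a made-up numeric illustration of this point:

```python
# Five false rejections among many rejections is a small problem;
# five among few rejections is a big one.
false_rejections = 5
print(false_rejections / 100)   # FDP = 0.05 if 100 hypotheses were rejected
print(false_rejections / 10)    # FDP = 0.5 if only 10 were rejected
```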
The problem with the FDP is that it is a random quantity that cannot be controlled directly. This is where the $FDR$ comes into play. Benjamini & Hochberg 1995 [3] showed that $FDR \leq \frac{m_0}{m} \cdot \alpha \leq \alpha$, where $m_0$ is the total number of true Null hypotheses and $m$ is the total number of hypotheses (true + false Null hypotheses). Hence, the $FDR$ is controllable at the pre-defined $\alpha$ level.
Finally, to your question regarding the expected number of false positives after corrections:
As you mentioned in your question, the Bonferroni and the Benjamini-Hochberg methods correct for the fact that uncorrected multiple testing leads to a lot of false positives. The correction methods make sure that the error rates we have discussed here (PFER and FDR, respectively) remain at the pre-defined significance level (e.g. 0.05) for the whole family, aka test battery.
When using the Bonferroni correction, you control the PFER, i.e. the total number of type I errors expected to occur in your test battery. This is simply the sum of the type I error probabilities of all the hypotheses in the test battery after applying the Bonferroni correction.
In other words, if you repeatedly conduct a test battery of 100,000 tests with a Bonferroni correction for a PFER of 0.05 (and assuming that all tested Null hypotheses are true), you would on average expect one falsely rejected Null hypothesis in 5% of these test batteries. For example, if you conduct 100 test batteries using the Bonferroni correction for a PFER of 0.05, you'd expect 5 of these test batteries to give you a falsely rejected Null hypothesis on average (for the sake of the argument, we still assume that all of your Null hypotheses are correct, i.e. ideally none should be rejected).
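These battery-level numbers can be checked directly; this Python sketch assumes 100,000 independent tests with every Null hypothesis true:

```python
alpha_pfer = 0.05
m = 100_000
per_test_alpha = alpha_pfer / m  # Bonferroni-adjusted per-test level

# Expected number of false rejections per battery when all Nulls are true:
expected_false_rejections = m * per_test_alpha  # 0.05 (up to floating point)

# Probability that a single battery produces at least one false rejection
# (assuming independent tests); this lands just below 0.05:
p_any = 1 - (1 - per_test_alpha) ** m

print(round(expected_false_rejections, 4), round(p_any, 4))
```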
For the Benjamini-Hochberg (BH) correction, it's a bit more complicated. Because the BH method controls the FDR (and not the PFER), it also takes into account the total number of rejected hypotheses. If you have a lot of false Null hypotheses (i.e. hypotheses you should reject) in your set of hypotheses, BH allows for more falsely rejected Null hypotheses (remember: $FDR = E(FDP) = E(\frac{N_{1|0}}{R}|R>0) \cdot Pr(R>0)$).
So you could actually end up with a higher rate of type I errors than with the Bonferroni method while the FDR still remains below your pre-defined significance level. However, if we again assume that all of the tested hypotheses in your test battery are true, the control of the FDR also controls the PFER. Hence, if you repeatedly conduct a test battery, you should again on average expect one falsely rejected Null hypothesis in 5% of them.
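Under the assumption that all Null hypotheses are true, this can be checked with a small Monte Carlo sketch in Python (the test count, battery count, and seed are arbitrary choices):

```python
import random

def bh_rejects_anything(pvals, alpha):
    """True if the Benjamini-Hochberg step-up rule rejects at least one hypothesis."""
    m = len(pvals)
    return any(p <= alpha * rank / m
               for rank, p in enumerate(sorted(pvals), start=1))

random.seed(0)
alpha, m, n_batteries = 0.05, 50, 4000

# All Null hypotheses true: every p-value is Uniform(0, 1).
hits = sum(
    bh_rejects_anything([random.random() for _ in range(m)], alpha)
    for _ in range(n_batteries)
)
rate = hits / n_batteries
print(rate)  # should land close to alpha = 0.05
```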
[1] Dunn, O. J. (1961). Multiple comparisons among means. Journal of the American Statistical Association, 56, 52–64.
[2] Frane, A. V. (2015). Are per-family Type I error rates relevant in social and behavioral science? Journal of Modern Applied Statistical Methods, 14(1), Article 5.
[3] Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B, 57, 289–300.
The idea behind these corrections is that when running several tests, say 20, at $\alpha = 0.05$, the expected number of false positives is $20 \times 0.05 = 1$: out of 20 comparisons, one has a strong chance of being a false positive.
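In Python, with 20 independent tests and all Null hypotheses true:

```python
alpha = 0.05
m = 20

# Expected number of false positives across 20 independent true-Null tests:
expected_fp = alpha * m  # 1 on average

# Probability of at least one false positive (the familywise error rate):
fwer = 1 - (1 - alpha) ** m

print(expected_fp, round(fwer, 2))  # the FWER is roughly 0.64
```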
"none" means no correction is applied and each test simply uses $\alpha = 0.05$.
The Bonferroni correction is the most conservative of the three you mentioned: the adjusted significance level is $\alpha/m$, where $m$ is the number of hypotheses.
FDR, or False Discovery Rate, correction in R is an alias for the 'BH' (Benjamini & Hochberg) correction. This method ranks the unadjusted p-values and rejects every hypothesis up to the largest rank $k$ for which $p_{(k)} \leq \alpha \cdot k / m$, where $m$ is the number of hypotheses.
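A plain-Python sketch of this step-up rule (the p-values are made up; in R you would simply call p.adjust):

```python
def bh_reject(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up rule: reject the hypotheses with the k
    smallest p-values, where k is the largest rank with p_(k) <= alpha * k / m."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices, smallest p first
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= alpha * rank / m:
            k = rank
    return sorted(order[:k])

# Hypothetical unadjusted p-values:
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074]

print(bh_reject(p))                                        # BH rejects indices 0 and 1
print([i for i, x in enumerate(p) if x <= 0.05 / len(p)])  # Bonferroni: only index 0
```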
As a rule of thumb, in terms of the chance of a false positive I would order them as follows (lowest to highest):
bonferroni $<$ fdr $<$ none
You can do more research into each and learn about the family wise error rate vs the false discovery rate.
Question and answer on Stack Exchange about the Bonferroni correction vs the Benjamini-Hochberg correction as the number of comparisons increases
Your formula for the p-values appears correct, assuming p1 is the t-value.
I don't see any reason why you would use both the Benjamini-Hochberg correction to control false discovery rate (FDR) and the Bonferroni correction to control the familywise error rate (FWER). You would choose one approach or the other.
Corrections for multiple p-values can be handled in R with the p.adjust function. When using this function, the decision rule remains p < alpha [not p < alpha / n]. That is, R adjusts the p-values for you so that you don't need to adjust the decision rule.
The following code in R calculates the p-value for 7 genes, then uses either BH or Bonferroni correction. The S columns in the data frame indicate whether the p-value is < 0.05.
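The original R code is not reproduced in this excerpt; as a rough stand-in, here is a Python sketch of the same workflow with seven made-up p-values (the gene names and values are hypothetical, not the original data):

```python
# Hypothetical p-values for 7 genes (stand-ins for the t-test results in the
# original R example; the real values depend on the data).
genes = ["g1", "g2", "g3", "g4", "g5", "g6", "g7"]
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074]
m = len(p)

# Bonferroni adjustment: multiply each p-value by m, capped at 1.
p_bonf = [min(1.0, x * m) for x in p]

# BH adjustment: cumulative minimum of p * m / rank, taken from the largest
# p-value downward (this mirrors what R's p.adjust(p, method = "BH") returns).
order = sorted(range(m), key=lambda i: p[i], reverse=True)
p_bh = [0.0] * m
running_min = 1.0
for pos, i in enumerate(order):
    running_min = min(running_min, p[i] * m / (m - pos))
    p_bh[i] = running_min

# "S" columns: is the adjusted p-value below 0.05?
for g, raw, bh, bf in zip(genes, p, p_bh, p_bonf):
    print(f"{g}  p={raw:.3f}  BH={bh:.4f} S={bh < 0.05}  Bonf={bf:.3f} S={bf < 0.05}")
```

With these made-up values, BH flags two genes as significant while Bonferroni flags only one, illustrating how much more conservative Bonferroni is.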
You'll note that Bonferroni is more conservative than BH. I think that Bonferroni is too conservative for most situations. It is helpful to read up on the various FDR and FWER control methods.