Solved – Bonferonni versus FDR

adjustmentbonferronifalse-discovery-ratep-value

My question is rather simple: When should you use the Bonferroni correction over the FDR correction?

I've read about that the Bonferroni correction applies a more conservative approach in comparison with the FDR correction. Additional, the FDR should be more accurate with more pvalues (how much is more?).

I applied both p adjustment methods and can clearly see that the Bonferroni is more conservative (table and figure). But how would I know which method to use?

In my case I have 20 pvalues derived from 20 separate glm's (binomial):

     Original    FDR           Bonferroni 
X1   0.00516     0.0121      0.1033 
X2   0.00000     0.0000      0.0000 
X3   0.00128     0.0051      0.0255 
X4   0.01730     0.0316      0.3461 
X5   0.00545     0.0121      0.1091 
X6   0.75503     0.7550      1.0000 
X7   0.54320     0.6036      1.0000 
X8   0.31668     0.4222      1.0000 
X9   0.68161     0.7175      1.0000 
X10  0.01737     0.0316      0.3474 
X11  0.02543     0.0391      0.5086 
X12  0.02055     0.0343      0.4110 
X13  0.04737    0.06767      0.9474 
X14  0.00063     0.0042      0.0126 
X15  0.00109     0.0051      0.0217 
X16  0.37707     0.4713      1.0000 
X17  0.00046     0.0042      0.0092 
X18  0.00191     0.0064      0.0381 
X19  0.00402     0.0115      0.0805 
X20  0.47011     0.5531      1.0000

As you can see the Bonferroni is as expected conservative as it leaves most of the pvalues above 0.05. When looking at the correlations between the significant X variables and the response variable (for both adjustment methods) an ecological meaningful explanation can be formulated. However, just looking at the pvalues is there a way to say which method is more appropriate?

Best Answer

Some procedures control the familywise type I error rate (e.g. Bonferroni, the uniformly more powerful Bonferroni-Holm adjustment, the more flexible general class of closed testing procedure, for specific situations like multiple comparisons against a common control Dunnett's test etc.) i.e. the probability of making at least one false rejection of a null hypothesis. In general, Bonferroni should probably be avoided in favor of e.g. Bonferroni-Holm (or something else that is closely tailored to the specific problem). False discovery rate (FDR) controlling procedures (e.g. Benjamini-Hochberg) instead control the proportion of wrongly rejected null hypotheses amongst those that are rejected (instead of amongst all).

Thus, the difference between the two types of approaches is about the goal and desired error control. E.g. in settings when there is a huge number of signals that need to be screened based on not very much data to determine whether something should be explored further, FDR control may be the most sensible. However, the results should then not be interpreted, as if a strict familywise type I error control had been applied. In contrast, for deciding what claims a drug company can do about a drug based on a clinical registration trial, strict familywise type I error rate is more usual.

Related Solutions

Solved – Number of false positives after Bonferroni or FDR

First, a small correction:

"Let's say I am doing 100,000 tests. If the tests are independent, at an alpha of 0.05, we would expect (on average) 5000 of these tests to be false positives."

This is technically correct - but only if you assume that you don't have any false Null hypotheses (i.e. Null hypotheses that you should reject). If, for example, 50'000 of the tested Null hypotheses were actually false, then only 50'000 true Null hypotheses are left to turn up as false positives. In this case, you would expect 2500 false positive tests (in addition to ideally 50'000 true positive tests if we assume a power of 100%)

Then, some nomenclature:

The Bonferroni correction controls for the per-family error rate (PFER) which is the total number of type I errors you'd expect in the whole test family aka the battery of test (see Dunn 1961 [1] and Frane 2015 [2] for more details on the Bonferroni correction and its control of the PFER, respectively)
The FDR does not control for any error but instead is a measure for the expected false discovery proportion (FDP), where the FDP is the proportion of falsely rejected hypotheses. Hence: $FDP = \begin{cases} \frac{N_{1|0}}{R} & \text{if } R \geq 0 \\ 0 & \text{if } R = 0 \end{cases}$ with R being the total number of rejected hypotheses. And: $FDR = E(FDP) = E(\frac{N_{1|0}}{R}|R>0)*Pr(R>0)$

As opposed to the PFER, the FDP also takes into account the fact that a high number of false rejections is less problematic if the total number of rejected hypotheses is rather high as well. I.e. if you have a lot of false hypotheses in your set of hypotheses (i.e. hypotheses you should reject), then you can allow for more falsely rejected Null hypotheses.

The problem with the FDP is that you cannot control for it. This is where the $FDR$ comes into play. Benjamini & Hochberg 1995 [3] showed that $FDR \leq \frac{M_0}{m}*\alpha \leq \alpha$, where $M_0$ is the total number of true Null hypotheses and m is the total number of hypothesis (true + false Null hypotheses). Hence, the $FDR$ is controllable at the pre-defined $\alpha$ level.

Finally, to your question regarding the expected number of false positives after corrections:

As you mentioned in your question, the Bonferroni and the Benjamini-Hochberg methods are correcting for the fact that uncorrected multiple testing leads to a lot of false positives. So the correction methods make sure that the error rates we have discussed here (PFER and FDR, respectively) remains at the pre-defined significance level (e.g. 0.05) for the whole family aka test battery.

When using the Bonferroni correction, you control for the PFER, i.e. the total number of type I errors expected to occur in your test battery. This is just the sum of probabilities of Type I error for all the hypotheses in the test battery after applying the Bonferroni correction. In other words, if you repeatedly conduct a test battery of 100'000 tests with Bonferroni correction for a PFER of 0.05 (and assuming that all tested hypotheses are true) you would on average expect one falsely recjected Null hypothesis in 5% of these test batteries. For example, if you conduct 100 test batteries using the Bonferroni correction for a PFER of 0.05, you'd expect 5 of these test batteries to give you a falsely rejected Null hypothesis on average (for the sake of the argument we still assume that all of your Null hypotheses are correct, i.e. ideally should not be rejected)

For the Benjamini-Holm (BH) correction, it's a bit more complicated. Because the BH method corrects for the FDR (and not the PFER), it also takes into account the total number of rejected hypotheses. If you have a lot of false Null hypotheses (i.e. hypotheses you should reject) in your set of hypotheses, the BH allows for more falsely rejected Null hypothese (remember: $FDR = E(FDP) = E(\frac{N_{1|0}}{R}|R>0)*Pr(R>0)$). So you could actually end up with a higher rate of type I errors than with the Bonferroni method while the FDR still remains below your pre-defined significance level. However, if we again assume that all of the tested hypotheses in your test battery are correct, the control of the FDR also controls the PFER. Hence, if you repeatedly conduct a test battery, you should again on average expect one falsely recjected Null hypothesis in 5% of them.

[1] Dunn, O. J. (1961). Multiple comparisons among means. Journal of the American Statistical Association, 56, 52–64. 70

[2] Frane, Andrew V. (2015) "Are Per-Family Type I Error Rates Relevant in Social and Behavioral Science?," Journal of Modern Applied Statistical Methods: Vol. 14: Iss. 1, Article 5.

[3] Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B, 57, 289–300. 69, 72

Solved – Over-represented values in FDR-adjusted p-values

There is nothing wrong with these q-values. Correcting p-values (corrected p-values are often referred to as q-values) is a concept that's newer than the Benjamini-Hochberg (BH) procedure, which in its purest form only outputs significant: yes or no. To understand why your q-values look the way they look we need to look at how the BH step-up procedure works. The steps below are adapted from Wasserman's All of Statistics:

Order your p-values $p_1 < p_2 < \ldots < p_m$.
Define $l_i = \frac{i\alpha}{m}$, where $\alpha$ is your desired false discovery rate. Also define $T = \max\{p_i: p_i < l_i\}$, ie $T$ is the largest p-value for which $p_i < l_i$.
Let $T$ be your BH rejection threshold.
Reject the null-hypotheses $H_{0i}$ for which $p_i \leq T$.

To avoid having to choose an $\alpha,$ the q-value turns this procedure upside-down and asks what is the smallest $\alpha$ at which $H_{0i}$ would be rejected? The threshold $T$ is always going to be one of your $m$ p-values, so the threshold for acceptance as a function of the FDR $\alpha$ is going to look like some stairs:

library(plyr)
ps <- c(0.019, 0.022, 0.023, 0.023, 0.025, 0.025, 0.027, 0.028, 0.029, 0.030, 0.030, 0.030, 0.031, 0.033, 0.034, 0.035, 0.036, 0.037, 0.037, 0.039, 0.051, 0.060, 0.063, 0.065, 0.085, 0.110, 0.170, 0.196, 0.241, 0.316, 0.318, 0.325, 0.694)

# calculate T as function of alpha
BHT <- function(alpha) {
  l <- 1:length(ps)*alpha/length(ps)
  i <- sum(ps < l)
  if (i == 0) return(i)

  ps[i]   # threshold
}

alphas <- seq(0,1,by=.001)
Ts <- aaply(alphas, 1, BHT)
plot(alphas, Ts, type="l")

For each threshold, there is a single smallest $\alpha$ that makes this the new rejection threshold. If we put some lines in the previous plot for your different p-values, you'll see that many of them sit on the same step of the staircase:

plot(alphas, Ts, type="n")
abline(v=ps, col="grey")
lines(alphas, Ts)

These will get rejected at the same threshold and correspond to the same smallest $\alpha$. This $\alpha$ is your q-value, so it's OK and pretty much inevitable to get repeated q-values.

Best Answer

Related Solutions

Solved – Number of false positives after Bonferroni or FDR

Solved – Over-represented values in FDR-adjusted p-values

Related Question