Is there a simple way of explaining why Benjamini and Hochberg's (1995) procedure actually controls the false discovery rate (FDR)? The procedure is elegant and compact, yet the proof that it works under independence (in the appendix of their 1995 paper) is not very accessible.
Solved – An intuitive explanation why the Benjamini-Hochberg FDR procedure works
false-discovery-rate, intuition, teaching
Related Solutions
Benjamini and Hochberg (1995) introduced the false discovery rate. Benjamini and Yekutieli (2001) proved that the procedure remains valid under some forms of dependence. Dependence can arise as follows: consider the continuous variable used in a t-test and another variable correlated with it; for example, testing whether BMI differs between two groups and whether waist circumference differs between the same two groups. Because these variables are correlated, the resulting p-values will be correlated too. Yekutieli and Benjamini (1999) developed another FDR-controlling procedure that can be used under general dependence by resampling the null distribution. Because the comparison is made against the null permutation distribution, the method becomes more conservative as the total number of true positives increases. It turns out that BH 1995 also becomes conservative as the number of true positives increases. To improve on this, Benjamini and Hochberg (2000) introduced the adaptive FDR procedure, which requires estimating a parameter, the null proportion, that is also used in Storey's pFDR estimator. Storey gives comparisons, argues that his method is more powerful, and emphasizes the conservative nature of the 1995 procedure. Storey also has results and simulations under dependence.
All of the above procedures are valid under independence. The question is what kind of departure from independence these procedures can deal with.
My current thinking is that if you don't expect too many true positives, the BY (1999) procedure is attractive because it incorporates distributional features and dependence; however, I'm unaware of an implementation. Storey's method was designed for many true positives and some dependence. BH 1995 offers an alternative to family-wise error-rate control, and it is still conservative.
Benjamini, Y and Y Hochberg. On the Adaptive Control of the False Discovery Rate in Multiple Testing with Independent Statistics. Journal of Educational and Behavioral Statistics, 2000.
Two points
Point the First
If you are using something like R's p.adjust() to calculate $q$-values, then the values of 1 simply indicate tests not rejected at any level of FDR. $q$-values are actually a little problematic to interpret directly, since they involve a subtle mathematical artifice and because they do not communicate the step-wise nature of the FDR adjustment process (and one cannot make FDR rejection decisions based on $q$-values alone). Backing up to a single two-sided hypothesis test can help illustrate why:
Reject $H_{0}$ if $p \le \alpha/2$. So for $\alpha = 0.05$, we would reject $H_{0}$ if $p \le 0.025$. Alternately, we could express this same rejection criterion as reject $H_{0}$ if $2p \le \alpha$. The first expression perhaps emphasizes the meaning of $p$, and the second emphasizes the meaning of $\alpha$.
If we think about the Bonferroni method (FWER, not FDR), we can see that we have two ways to express the rejection criterion given $m$ comparisons:
Reject $H_{0}$ if $p \le \frac{\alpha/2}{m}$, or
Reject $H_{0}$ if $2mp \le \alpha$.
That $2mp$ is an 'adjusted $p$-value', sometimes called a '$q$-value'.
(I suppose there's also a third way: reject $H_{0}$ if $mp \le \alpha/2$.)
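For concreteness, here is a small Python sketch showing that the per-test-level form and the adjusted-value form of the Bonferroni criterion agree. The p-values here are made up for illustration, and they are taken as already two-sided, so the factor of 2 from the text is absorbed into $p$:

```python
# Hypothetical two-sided p-values for m comparisons (illustrative only).
raw_p = [0.001, 0.012, 0.2]
m = len(raw_p)
alpha = 0.05

# Form 1: reject when p <= alpha / m (an adjusted rejection level).
reject_form1 = [p <= alpha / m for p in raw_p]

# Form 2: reject when the adjusted value m * p <= alpha ('q-value').
adjusted = [m * p for p in raw_p]
reject_form2 = [q <= alpha for q in adjusted]

assert reject_form1 == reject_form2  # the two expressions are equivalent
print(reject_form2)  # [True, True, False]
```

The equivalence is just algebra ($p \le \alpha/m \iff mp \le \alpha$), but seeing both forms side by side makes the "adjust the level vs. adjust the $p$-value" distinction explicit.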
But look: $2mp$ is $>1$ when $p>.5/m$, which is quite possible. Unfortunately $p$ (or $q$) is supposed to be a probability, which means its value is bounded between zero and one inclusive. So many folks, and many statistical software authors, will take an expression like $q = mp$ and replace it with $q=\min(1,mp)$. The same applies to FDR (whether using the Benjamini-Hochberg or Benjamini-Yekutieli method): the adjustments are more complicated than Bonferroni's, but they likewise cap the $q$-values at 1.
In a way, I suspect this implies that expressing such adjustments as adjustments of the rejection level, rather than as adjustments to the $p$-values, is a little more coherent, because the artifice of $\min(1,f(p,i))$ does not arise.
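To make the capping concrete, here is a hedged Python sketch of how a BH adjusted $p$-value is commonly computed, mirroring the usual step-up construction behind R's p.adjust(method = "BH"): take cumulative minima from the largest $p$-value down, then apply $\min(1,\cdot)$. The function name and example inputs are illustrative, not from the thread:

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (step-up construction)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices, ascending p
    adjusted = [None] * m
    running_min = 1.0
    # Walk from the largest p-value down, taking cumulative minima of m*p/i,
    # and cap each result at 1 -- the min(1, .) artifice discussed above.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, m * pvals[i] / rank)
        adjusted[i] = min(1.0, running_min)
    return adjusted

print([round(q, 4) for q in bh_adjust([0.01, 0.04, 0.03, 0.90])])
# -> [0.04, 0.0533, 0.0533, 0.9]
```

Note how the cumulative minimum makes the adjusted values monotone in the raw $p$-values even though $m p_{(i)}/i$ alone is not, which is part of why $q$-values hide the step-wise nature of the procedure.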
Point the Second
We can't tell for sure, because you have not provided, for example, your vector of $p$-values, but the likelihood is that your $p$-values are all too high and that you are not achieving significance.
Best Answer
Here is some R code to generate a picture. It shows 15 simulated p-values plotted against their rank, so they form an ascending point pattern. The points below the red/purple lines represent significant tests at the 0.1 or 0.2 level. The FDR is the number of black points below the line divided by the total number of points below the line. I hope this gives some feeling for the shape of the distribution of the ordered p-values. That the cutoffs are straight lines, and not e.g. some parabola-shaped curve, has to do with the shape of the distributions of the order statistics; this has to be calculated explicitly. In fact, the line is just a conservative solution.