An intuitive explanation of why the Benjamini-Hochberg FDR procedure works

Is there a simple way of explaining why Benjamini and Hochberg's (1995) procedure actually controls the false discovery rate (FDR)? The procedure is elegant and compact, yet the proof that it works under independence (in the appendix of their 1995 paper) is not very accessible.

Best Answer

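Here is some R code to generate a picture. It shows 15 simulated p-values plotted against their rank, so they form an ascending point pattern. Black points belong to true null hypotheses and red points to false ones. The red and purple lines are the BH boundaries i/15 * q for q = 0.1 and q = 0.2; the procedure declares significant every p-value up to the last point that lies below the line. The proportion of false discoveries in this one sample is the number of black points among the significant ones divided by the total number of significant points, and the FDR is the expected value of that proportion.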

x0 <- runif(10)        # p-values of 10 true null hypotheses; they are Unif[0,1] distributed
x1 <- rbeta(5, 2, 30)  # p-values of 5 false null hypotheses; these tend to be small
xx <- c(x1, x0)
m  <- length(xx)       # total number of tests (15)
a0 <- sort(xx)
plot(a0, xlab = "rank", ylab = "p-value")        # all ordered p-values in black
a0[a0 %in% x0] <- NA                             # blank out the true nulls
points(a0, col = "red")                          # overplot the false nulls in red
lines(c(1, m), c(1/m, 1) * 0.1, col = "red")     # BH line i/m * 0.1
lines(c(1, m), c(1/m, 1) * 0.2, col = "purple")  # BH line i/m * 0.2
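
The picture can also be turned into numbers. The following is only a sketch and not part of the original answer: it draws p-values the same way, applies the BH step-up rule by hand (reject everything up to the largest rank i with p_(i) <= i/m * q), compares the result with R's built-in p.adjust(..., method = "BH"), and computes the false discovery proportion for this one sample. The helper name bh_reject, the seed and the choice q = 0.2 are assumptions made for illustration.

```r
# Hypothetical helper: BH step-up rule at level q.
# Returns a logical vector, TRUE where the hypothesis is rejected.
bh_reject <- function(p, q) {
  m <- length(p)
  ord <- order(p)                                # indices that sort p ascending
  below <- which(p[ord] <= seq_len(m) / m * q)   # ranks on or below the BH line
  rejected <- logical(m)
  if (length(below) > 0) {
    k <- max(below)                              # last point below the line
    rejected[ord[seq_len(k)]] <- TRUE            # reject everything up to rank k
  }
  rejected
}

set.seed(42)                                     # arbitrary seed, for reproducibility only
x0 <- runif(10)                                  # true nulls
x1 <- rbeta(5, 2, 30)                            # false nulls
xx <- c(x1, x0)
is_null <- c(rep(FALSE, 5), rep(TRUE, 10))       # same order as xx

q <- 0.2
rej <- bh_reject(xx, q)

# The built-in adjusted p-values should give the same decisions.
same_as_builtin <- all(rej == (p.adjust(xx, method = "BH") <= q))

# False discovery proportion in this sample (defined as 0 if nothing is rejected).
fdp <- if (any(rej)) sum(rej & is_null) / sum(rej) else 0
```

Rerunning with different seeds shows how much the false discovery proportion varies from sample to sample; the FDR is only its long-run average.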

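I hope this gives some feeling for the shape of the distribution of ordered p-values. That the boundaries are straight lines, and not e.g. some parabola-shaped curve, has to do with the distributions of the order statistics: for m independent Unif[0,1] p-values, the i-th smallest has expected value i/(m+1), which grows linearly in the rank i. The exact argument has to be calculated explicitly. In fact, the line is a conservative solution: under independence the procedure at level q controls the FDR at (m0/m) * q, where m0 is the number of true null hypotheses, and this is at most q.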
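
A quick way to see the FDR as an average is to repeat the experiment many times. The sketch below is an illustration under the same assumptions as the picture (10 uniform null p-values, 5 Beta(2, 30) alternatives, all independent); the number of repetitions, the seed and the level q = 0.2 are arbitrary choices. It averages the false discovery proportion over the repetitions and puts it next to (m0/m) * q.

```r
set.seed(1)                      # arbitrary seed, for reproducibility only
m0 <- 10                         # number of true null hypotheses
m1 <- 5                          # number of false null hypotheses
m  <- m0 + m1
q  <- 0.2                        # nominal FDR level
n_sim <- 5000                    # number of simulated experiments (assumed value)
is_null <- c(rep(FALSE, m1), rep(TRUE, m0))

fdp <- replicate(n_sim, {
  p <- c(rbeta(m1, 2, 30), runif(m0))        # alternatives first, then nulls
  rej <- p.adjust(p, method = "BH") <= q     # BH decisions at level q
  if (any(rej)) sum(rej & is_null) / sum(rej) else 0
})

fdr_hat    <- mean(fdp)          # Monte Carlo estimate of the FDR
fdr_theory <- m0 / m * q         # value under independence, at most q
print(c(estimate = fdr_hat, theory = fdr_theory, nominal = q))
```

The estimate should land close to (m0/m) * q and below q, which is the sense in which the straight line is conservative.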