FDR correction stronger than Bonferroni

bonferroni · false-discovery-rate · hypothesis testing · p-value · statistical significance

Is it possible, in a multiple-testing scenario, for the Benjamini & Yekutieli (2001) correction to end up being more stringent than Bonferroni for some individual tests? I.e., that for some tests,
$$
p_{\text{Bonferroni}} < p_{\text{BY}}
$$

I seem to have encountered this with real data. I have no conceptual problem with it (the procedures are, after all, quite different), but I would have intuitively said that with any correction a p-value can be at most as large as with Bonferroni.

Best Answer

It turns out that this can and does happen, quite trivially: it is reproducible with two different Python implementations (statsmodels and pingouin) using only two tests, with p-values of 0.01 and 0.1.

from pingouin import multicomp as pingouin_multicomp
from statsmodels.stats.multitest import multipletests as statsmodels_multicomp
pvals = [0.01, 0.1]

With this,

pingouin_multicomp(pvals, method="bonf")[1]       # array([0.02, 0.2 ])
pingouin_multicomp(pvals, method="fdr_by")[1]     # array([0.03, 0.15])
statsmodels_multicomp(pvals, method="fdr_by")[1]  # array([0.03, 0.15])

Unless I have made some deep conceptual mistake, it appears that the smaller p-value gets corrected more strongly by Benjamini–Yekutieli than by Bonferroni.
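Where the 0.03 comes from can be seen by carrying out the BY adjustment by hand. The sketch below assumes the standard step-up formulation (multiply the i-th smallest p-value by m·c(m)/i with c(m) = sum of 1/k for k = 1..m, then enforce monotonicity); it is my own illustration, not taken from either library:

```python
# Hand-rolled Benjamini-Yekutieli adjustment for two p-values
# (illustrative sketch only, not a reimplementation of statsmodels or pingouin).
pvals = [0.01, 0.1]
m = len(pvals)

# Harmonic correction factor c(m) = sum_{k=1}^{m} 1/k; here c(2) = 1.5
c_m = sum(1.0 / k for k in range(1, m + 1))

# Raw adjusted values: p_(i) * m * c(m) / rank_i, ranks starting at 1
adjusted = [p * m * c_m / rank for rank, p in enumerate(sorted(pvals), start=1)]

# Step-up: enforce monotonicity from the largest rank downwards, then cap at 1
for i in range(m - 2, -1, -1):
    adjusted[i] = min(adjusted[i], adjusted[i + 1])
adjusted = [min(p, 1.0) for p in adjusted]

print([round(p, 10) for p in adjusted])  # [0.03, 0.15]
```

For the smallest p-value (rank 1) the multiplier is m·c(m) = 3, which is larger than the Bonferroni multiplier m = 2, consistent with the library output above.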

Interestingly, this does not seem to happen with the original Benjamini & Hochberg (1995) procedure (as suggested by the method outlined in the post linked by Christoph Hanck). If anyone wants to outline the difference, I'll shift the accept; for now, this at least seems to confirm that I'm not crazy.
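A possible outline of the difference (take this as my reading, not a definitive answer): BH multiplies the i-th smallest p-value by m/i, so the smallest one (rank 1) gets exactly the Bonferroni factor m, and since m/i ≤ m for every rank, the BH-adjusted values should never exceed the Bonferroni-adjusted ones; BY adds the harmonic factor c(m) ≥ 1 on top, which is what can push it past Bonferroni. Carrying out the BH adjustment by hand for the same two p-values (an illustrative sketch, not taken from either library):

```python
# Hand-rolled Benjamini-Hochberg adjustment (illustrative sketch only).
pvals = [0.01, 0.1]
m = len(pvals)

# BH: multiply the i-th smallest p-value by m / rank_i (no harmonic factor)
bh = [p * m / rank for rank, p in enumerate(sorted(pvals), start=1)]

# Step-up: enforce monotonicity from the largest rank downwards, then cap at 1
for i in range(m - 2, -1, -1):
    bh[i] = min(bh[i], bh[i + 1])
bh = [min(p, 1.0) for p in bh]

bonferroni = [min(p * m, 1.0) for p in sorted(pvals)]
print(bh)          # [0.02, 0.1]
print(bonferroni)  # [0.02, 0.2]
```

Here the smallest BH-adjusted p-value (0.02) coincides with the Bonferroni one, whereas BY's extra c(2) = 1.5 would lift it to 0.03.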