Solved – Adjusting a subset of P values to correct for multiple testing

adjustment, false-discovery-rate, multiple-comparisons, p-value, statistical-significance

I have a list of 300 $p$ values that were selected from all the $p$ values obtained in a multiple-testing analysis because they satisfy $p<.05$. I do not know the values of the $p$ values above $.05$, nor how many of them there are. The 300 raw $p$ values have not been corrected for multiple testing, and I would like to correct them.

Is it OK to apply the Benjamini-Hochberg correction method, for instance with p.adjust() in R, on my list of $p$ values even though they have been pre-selected for being less than $.05$? If not, is there another method that can be used?

Best Answer

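No, that is not OK. The Benjamini-Hochberg procedure uses the rank of each p-value within the full set of tests, together with the total number of tests, to make the correction. If it sees only the p-values below 0.05, it treats 300 as the total number of tests, and the adjusted p-values come out much too small (too optimistic).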
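
To see where the optimism comes from, it may help to write out the Benjamini-Hochberg adjusted p-value. This is a sketch using the usual textbook form, with $p_{(1)} \le \dots \le p_{(n)}$ the sorted p-values and $n$ the total number of tests:

$$\tilde p_{(i)} = \min\Bigl(1,\; \min_{j \ge i} \frac{n}{j}\, p_{(j)}\Bigr)$$

Every missing p-value is larger than 0.05, so the 300 observed values keep ranks 1 to 300 whether or not the missing ones are included. Only $n$ changes. If $k$ p-values are missing, using $n = 300$ instead of $n = 300 + k$ multiplies every term $\frac{n}{j}\, p_{(j)}$ by $300/(300+k)$. With a purely hypothetical $k = 2700$, the adjusted values would be about ten times too small.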

Options to still use Benjamini-Hochberg:

  1. Replace all the missing p-values with 1. This would be conservative.
  2. Use multiple imputation, repeatedly imputing the missing p-values with values drawn uniformly between 0.05 and 1. This is also acceptable: if the null hypothesis is true for all of the missing tests (a conservative assumption), then conditional on a p-value being larger than 0.05, this is its actual distribution.
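
Both options can be sketched in R with `p.adjust()`. This is a minimal illustration, not a definitive implementation: `p_obs` is simulated placeholder data standing in for the 300 observed p-values, and `n_missing = 2000` and the number of imputations `n_imp = 200` are assumptions chosen only for the example. Taking the median across imputations is one possible summary, not the only one.

```r
# Placeholder inputs (assumptions, not real results):
set.seed(1)
p_obs     <- runif(300, min = 0, max = 0.05)  # stands in for the 300 observed p-values
n_missing <- 2000                             # assumed number (or upper bound) of missing p-values

# Naive adjustment on the pre-selected subset only (too optimistic).
adj_naive <- p.adjust(p_obs, method = "BH")

# Option 1: pad the missing p-values with 1, adjust, keep the first 300 results.
p_padded <- c(p_obs, rep(1, n_missing))
adj_pad  <- p.adjust(p_padded, method = "BH")[seq_along(p_obs)]

# The same result without padding: pass the total number of tests via `n`.
adj_n <- p.adjust(p_obs, method = "BH", n = length(p_obs) + n_missing)

# Option 2: impute the missing p-values from Uniform(0.05, 1) many times
# and summarise the adjusted values across imputations (here: the median).
n_imp <- 200
adj_imp <- replicate(n_imp, {
  p_fill <- runif(n_missing, min = 0.05, max = 1)
  p.adjust(c(p_obs, p_fill), method = "BH")[seq_along(p_obs)]
})
adj_imp_median <- apply(adj_imp, 1, median)  # one value per observed p-value
```

Padding with 1 and setting `n` give identical adjusted values for the observed p-values, so the `n` argument is a convenient shortcut when only the count of missing tests is known. The imputed version can never exceed the padded one, which is why padding is the more conservative of the two.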

An alternative is a less powerful method such as Bonferroni or Bonferroni-Holm. With these methods, as with the Benjamini-Hochberg options above, you still need to know the number of missing p-values, or at least an upper bound on it. Note that the $n$ in the correction formulas is the total number of p-values (your 300 plus the number of p-values that are missing).
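
For Bonferroni and Holm, the total count can be passed to `p.adjust()` through its `n` argument. The sketch below again uses simulated placeholder p-values and an assumed total of 2300 tests; substitute your own values or a defensible upper bound.

```r
# Placeholder inputs (assumptions, not real results):
set.seed(1)
p_obs   <- runif(300, min = 0, max = 0.05)  # stands in for the 300 observed p-values
n_total <- length(p_obs) + 2000             # assumed total number of tests (or an upper bound)

# Adjust the observed p-values as if all n_total tests were present.
adj_bonf <- p.adjust(p_obs, method = "bonferroni", n = n_total)
adj_holm <- p.adjust(p_obs, method = "holm",       n = n_total)

# Bonferroni by hand, for comparison: min(1, n_total * p).
adj_bonf_manual <- pmin(1, n_total * p_obs)
```

Because the missing p-values are all larger than the observed ones, the Holm step-down ranks of the 300 observed values are unaffected by the missing tests, so setting `n` is sufficient here.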