Benjamini-Hochberg dependency assumptions justified

chi-squared-test, false-discovery-rate, kruskal-wallis test, likelihood-ratio, multiple-comparisons

I have a data set where I test for significant differences between three populations with respect to some 50 different variables. I do this using Kruskal-Wallis tests, on one hand, and by likelihood ratio tests of nested GLM model fits (with and without population as an independent variable), on the other.

As a result, I have a list of Kruskal-Wallis $p$-values on one hand, and what I think are chi-square $p$-values from the LRT comparisons on the other.

I need to do some form of multiple testing correction since there are >50 different tests, and Benjamini-Hochberg FDR seems like it is the most sensible choice.

However, the variables are probably not independent, with several "clans" of them being correlated. The question is then: how can I tell whether the set of statistics underlying my $p$-values meets the positive dependence requirement under which the Benjamini-Hochberg procedure still controls the FDR?

The Benjamini-Yekutieli paper from 2001 states that the positive regression dependence on a subset (PRDS) condition holds for multivariate normal and studentized distributions. What about the chi-square statistics from my likelihood ratio tests for the model comparison? What about the $p$-values I have for the Kruskal-Wallis tests?

I can use the Benjamini-Yekutieli worst-case FDR correction, which makes no assumption about the dependency, but I think it may be too conservative in this case and miss some relevant signals.
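To make the trade-off concrete, here is a minimal sketch of the two adjustments in plain Python. The function names `bh_adjust` and `by_adjust` and the example $p$-values are made up for illustration and are not from my data. The worst-case correction multiplies the Benjamini-Hochberg adjusted $p$-values by $c(m) = \sum_{k=1}^{m} 1/k$, which is roughly 4.5 for $m = 50$ tests.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (step-up procedure)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def by_adjust(pvals):
    """Worst-case (arbitrary dependence) adjustment: BH inflated by c(m)."""
    m = len(pvals)
    c_m = sum(1.0 / k for k in range(1, m + 1))
    return [min(1.0, p * c_m) for p in bh_adjust(pvals)]


# Hypothetical p-values, for illustration only.
example = [0.001, 0.004, 0.012, 0.03, 0.2, 0.6]
for p, bh, by in zip(example, bh_adjust(example), by_adjust(example)):
    print(f"p={p:.3f}  BH={bh:.4f}  BY={by:.4f}")
```

Every worst-case adjusted value is at least as large as the corresponding Benjamini-Hochberg value, which is where the loss of power comes from.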

Best Answer

Under dependence, the validity of the BH procedure rests on the test statistics being positively dependent (PRDS). If you read the 2001 paper, you will see that multivariate normality is not necessary; the authors give weaker sufficient conditions:

Rosenbaum’s (1984) conditional (positive) association, is enough to imply PRDS: $X$ is conditionally associated, if for any partition $(X_1, X_2)$ of $X$, and any function $h(X_1)$, $X_2$ given $h(X_1)$ is positively associated.

If this seems like a reasonable assumption to make about your data, then state it explicitly as an assumption, and try to come up with scenarios where it is and isn't met to clarify it for yourself.
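One way to explore such scenarios is by simulation. The sketch below is only a stand-in under stated assumptions: it uses equicorrelated normal test statistics with one-sided $p$-values (a setting where PRDS is known to hold), not Kruskal-Wallis or likelihood ratio statistics, and every name and parameter value in it (`simulate_fdr`, `rho`, `effect`, and so on) is hypothetical. It estimates the FDR of BH at level $q$ when a shared factor induces positive correlation among the tests. To probe your own case, you would replace the data-generating step with something closer to your correlated variables and tests.

```python
import math
import random


def bh_reject(pvals, q):
    """Indices rejected by the BH step-up procedure at level q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= q * rank / m:
            k = rank  # largest rank whose p-value passes its threshold
    return set(order[:k])


def simulate_fdr(m=50, m_signal=10, rho=0.5, effect=3.0, q=0.1,
                 n_rep=2000, seed=1):
    """Monte Carlo estimate of the FDR of BH under equicorrelated normals."""
    rng = random.Random(seed)
    total_fdp = 0.0
    for _ in range(n_rep):
        shared = rng.gauss(0.0, 1.0)  # common factor -> positive correlation
        pvals = []
        for j in range(m):
            z = math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
            if j < m_signal:
                z += effect  # the first m_signal hypotheses are false nulls
            pvals.append(0.5 * math.erfc(z / math.sqrt(2.0)))  # upper-tail p-value
        rejected = bh_reject(pvals, q)
        false_rejections = sum(1 for j in rejected if j >= m_signal)
        total_fdp += false_rejections / max(len(rejected), 1)
    return total_fdp / n_rep


print(f"estimated FDR at q=0.1: {simulate_fdr():.3f}")
```

If the estimate stays at or below $q$ across the dependence structures you consider plausible, that supports treating the assumption as reasonable for your data, though it does not prove it.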