The validity of the BH procedure depends on the hypothesis tests being positively dependent. If you read the Benjamini and Yekutieli (2001) paper, you will see that multivariate normality is not necessary; they give a weaker sufficient condition:
Rosenbaum's (1984) conditional (positive) association is enough to imply PRDS: $X$ is conditionally associated if, for any partition $(X_1, X_2)$ of $X$ and any function $h(X_1)$, $X_2$ given $h(X_1)$ is positively associated.
If this seems like a reasonable assumption to make about your data, then just declare it as an assumption, and try to come up with scenarios where it is and isn't met to clarify it to yourself.
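For instance, here is a minimal simulation sketch (my own illustration, not from the paper) of a scenario where the assumption is met: one-sided $z$-tests whose statistics share a common positive factor are a standard PRDS example, and BH keeps the empirical FDR at or below the nominal level. All parameter values and names (`rho`, `effect`, `bh_reject`) are arbitrary choices for the demo.

```python
# Sketch: BH under positive dependence (equicorrelated normal statistics).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
m, m0, rho, alpha, n_sims = 100, 90, 0.5, 0.05, 2000
effect = 3.0  # mean shift for the 10 non-null tests (arbitrary)

def bh_reject(pvals, alpha):
    """Benjamini-Hochberg step-up: reject the k smallest p-values, where
    k is the largest i such that p_(i) <= (i / m) * alpha."""
    m = len(pvals)
    order = np.argsort(pvals)
    below = pvals[order] <= alpha * np.arange(1, m + 1) / m
    k = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
    reject = np.zeros(m, dtype=bool)
    reject[order[:k]] = True
    return reject

fdp = []
for _ in range(n_sims):
    shared = rng.standard_normal()  # one common factor -> positive dependence
    z = np.sqrt(rho) * shared + np.sqrt(1 - rho) * rng.standard_normal(m)
    z[m0:] += effect                # the last m - m0 tests are real effects
    rej = bh_reject(norm.sf(z), alpha)  # one-sided p-values
    fdp.append(rej[:m0].sum() / max(rej.sum(), 1))

print(f"empirical FDR under positive dependence: {np.mean(fdp):.3f}")
# should stay at or below alpha * m0 / m = 0.045
```

Flipping the sign of the correlation for some coordinates is the kind of scenario where the positive-association assumption breaks down.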
By coincidence, I read this same paper just a couple of weeks ago. Colquhoun mentions multiple comparisons (including Benjamini-Hochberg) in section 4 when posing the problem, but I find that he does not make the issue clear enough -- so I am not surprised to see your confusion.
The important point to realize is that Colquhoun is talking about the situation without any multiple comparison adjustments. One can understand Colquhoun's paper as adopting a reader's perspective: he essentially asks what false discovery rate (FDR) he can expect when reading the scientific literature, i.e. what the expected FDR is when no multiple comparison adjustments have been made.
Multiple comparisons can be taken into account when running multiple statistical tests in one study, e.g. in one paper. But nobody ever adjusts for multiple comparisons across papers.
If you actually control FDR, e.g. by following the Benjamini-Hochberg (BH) procedure, then it will be controlled. The problem is that running the BH procedure separately in each study does not guarantee overall FDR control.
Can I safely assume that in the long run, if I do such analysis on a regular basis, the FDR is not $30\%$, but below $5\%$, because I used Benjamini-Hochberg?
No. If you use the BH procedure in every paper, but independently in each paper, then across papers you can essentially interpret your BH-adjusted $p$-values as ordinary $p$-values, and what Colquhoun says still applies.
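To see this concretely, here is a hedged simulation sketch (mine, not from Colquhoun's paper) under Colquhoun-style assumptions: $10\%$ of studied effects are real, power is $80\%$, and each study runs a single test. With $m = 1$ the BH procedure reduces to the plain $p \le 0.05$ rule, so the literature-wide FDR lands near $36\%$, not $5\%$.

```python
# Sketch: many one-test studies, each "using BH" on its own p-value.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
n_studies, prevalence, alpha = 200_000, 0.10, 0.05
effect = 2.80  # gives ~80% power for a two-sided z-test at alpha = 0.05

is_real = rng.random(n_studies) < prevalence
z = rng.standard_normal(n_studies) + effect * is_real
pvals = 2 * norm.sf(np.abs(z))

# BH within a one-test "study" is just p <= alpha
discoveries = pvals <= alpha
false_discoveries = discoveries & ~is_real
print(f"literature-wide FDR: {false_discoveries.sum() / discoveries.sum():.2f}")
# prints ~0.36 under these assumptions, despite every study "controlling FDR"
```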
General remarks
The answer to Colquhoun's question about the expected FDR is difficult to give because it depends on various assumptions. If e.g. all the null hypotheses are true, then the FDR will be $100\%$ (i.e. all "significant" findings will be statistical flukes). And if all nulls are in reality false, then the FDR will be zero. So the FDR depends on the proportion of true nulls, and this is something that has to be externally estimated or guessed in order to estimate the FDR. Colquhoun gives some arguments in favor of the $30\%$ number, but this estimate is highly sensitive to the assumptions.
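To sketch his calculation (my rendering of the standard arithmetic; the prevalence $1 - \pi_0 = 0.1$ and power $1 - \beta = 0.8$ are the numbers Colquhoun works with): the expected fraction of false discoveries among all $p \le \alpha$ results is

$$\mathrm{FDR} = \frac{\pi_0\,\alpha}{\pi_0\,\alpha + (1-\pi_0)(1-\beta)} = \frac{0.9 \times 0.05}{0.9 \times 0.05 + 0.1 \times 0.8} = \frac{0.045}{0.125} = 0.36,$$

which is where the "at least $30\%$" figure comes from; a lower prevalence of real effects or lower power pushes it higher still.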
I think the paper is mostly reasonable, but I dislike that it makes some claims sound way too bold. E.g. the first sentence of the abstract is:
If you use $p=0.05$ to suggest that you have made a discovery, you will be wrong at least $30\%$ of the time.
This is formulated too strongly and can actually be misleading.
Best Answer
FDR values are commonly much greater than the corresponding $p$-values, and recall that an FDR value applies to a list of features, not to a single feature the way a $p$-value does. So if the lowest FDR value is, say, $0.15$ for a list of 15 features (genes), then at the very least you have to publish the list and state that its FDR is $0.15$. If reviewers expect to see lower FDRs, there is not much of an alternative: only lower $p$-values can drive the FDR lower. If you can generate a list of 10-15 or 20-30 features whose FDR is $0.05$, then you should have no problem publishing it in the peer-reviewed literature. However, FDR values of $0.1$, $0.15$, $0.2$ and greater are frowned upon -- i.e. are not considered "good." That does not mean you cannot publish a list of features whose FDR is greater than $0.05$ -- it is a matter of the personal-experience-based FDR threshold assumed by the reviewer, laboratory, or journal.
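To illustrate the "list, not a single feature" point, here is a minimal sketch with made-up $p$-values, using `multipletests` from statsmodels: the FDR you report is the BH-adjusted value of the largest $p$-value you include in the published list, so loosening the threshold buys a longer list at the cost of a higher stated FDR.

```python
# Sketch with made-up p-values: how the FDR threshold sets the list size.
import numpy as np
from statsmodels.stats.multitest import multipletests

pvals = np.array([0.0002, 0.001, 0.004, 0.008, 0.02,
                  0.04, 0.06, 0.10, 0.30, 0.70])

# method='fdr_bh' returns Benjamini-Hochberg adjusted p-values (q-values)
_, qvals, _, _ = multipletests(pvals, method="fdr_bh")

for threshold in (0.05, 0.15):
    selected = qvals <= threshold
    print(f"FDR <= {threshold}: list of {selected.sum()} features")
# here the 0.05 threshold yields 5 features, while 0.15 yields 8
```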