First, you need to understand that these two multiple testing procedures do not control the same thing. Using your example: you have two groups and 18,000 observed variables, and you run 18,000 tests in order to identify the variables that differ between the two groups.
The Bonferroni correction controls the family-wise error rate (FWER): assuming all 18,000 variables have identical distributions in the two groups, this is the probability that you falsely claim "here I have some significant differences", i.e. that at least one test comes out significant even though there is nothing to find. Usually, you decide that if this probability is < 5%, your claim is credible.
The Benjamini-Hochberg correction controls the false discovery rate (FDR), that is, the expected proportion of false positives among the variables for which you claim the existence of a difference. For example, if 20 tests come out positive with the FDR controlled at 5%, then on average only about 1 of these 20 tests will be a false positive.
Now, what happens when the number of comparisons increases? Well, it depends on how many of the marginal null hypotheses are actually true. But basically, with both procedures, if you have a few truly associated variables, say 5 or 10, you have a better chance of detecting them among 100 variables than among 1,000,000 variables. That should be intuitive enough, and there is no way to avoid it.
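To make the difference concrete, here is a small simulation in Python (using numpy, scipy and statsmodels). The sample sizes, the effect size and the choice of 10 truly different variables are made-up illustration values, not something taken from your data; the point is only to show the two procedures applied to the same 18,000 p-values.

```python
import numpy as np
from scipy import stats
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(0)
n_vars, n_true, n_per_group = 18_000, 10, 20    # hypothetical setup

# Two groups: the first n_true variables have a real mean shift, the rest are null.
shift = np.zeros(n_vars)
shift[:n_true] = 1.5                            # made-up effect size
group1 = rng.normal(0.0, 1.0, size=(n_per_group, n_vars))
group2 = rng.normal(shift, 1.0, size=(n_per_group, n_vars))

# One two-sample t-test per variable -> 18,000 p-values.
_, pvals = stats.ttest_ind(group1, group2, axis=0)

# Bonferroni: controls the family-wise error rate at 5%.
reject_bonf, _, _, _ = multipletests(pvals, alpha=0.05, method="bonferroni")
# Benjamini-Hochberg: controls the false discovery rate at 5%.
reject_bh, _, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")

print("Bonferroni rejections:", reject_bonf.sum())
print("BH rejections:        ", reject_bh.sum())
print("False positives among BH rejections:", reject_bh[n_true:].sum())
```

Typically BH rejects at least as many variables as Bonferroni, at the price of allowing a small expected proportion of false positives among them; with more variables (and the same handful of real effects) both procedures lose power.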
I happened to read this same paper just a couple of weeks ago. Colquhoun mentions multiple comparisons (including Benjamini-Hochberg) in section 4 when posing the problem, but I find that he does not make the issue clear enough, so I am not surprised to see your confusion.
The important point to realize is that Colquhoun is talking about the situation without any multiple comparison adjustments. His paper can be read as adopting a reader's perspective: he essentially asks what false discovery rate (FDR) he can expect when reading the scientific literature, which amounts to asking what the expected FDR is when no multiple comparison adjustments were done.
Multiple comparisons can be taken into account when running multiple statistical tests in one study, e.g. in one paper. But nobody ever adjusts for multiple comparisons across papers.
If you actually control the FDR, e.g. by following the Benjamini-Hochberg (BH) procedure, then it will be controlled. The problem is that running the BH procedure separately in each study does not guarantee overall FDR control.
Can I safely assume that in the long run, if I do such analysis on a regular basis, the FDR is not $30\%$, but below $5\%$, because I used Benjamini-Hochberg?
No. If you use the BH procedure in every paper, but independently in each of your papers, then you can essentially interpret your BH-adjusted $p$-values as ordinary $p$-values, and what Colquhoun says still applies.
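To see roughly why, here is a small simulation of the reader's-perspective scenario (Python/numpy). Each "paper" reports a single test, so a within-paper BH adjustment cannot change anything, and the FDR across the pooled literature is driven entirely by the proportion of real effects and the power. The 10% prevalence of real effects and the 80% power are illustrative assumptions in the spirit of the numbers Colquhoun discusses, not quantities estimated from data.

```python
import numpy as np

rng = np.random.default_rng(1)
n_papers = 1_000_000     # each "paper" reports one test declared significant at p < 0.05
prior_real = 0.10        # assumed share of tested hypotheses that are real effects
power = 0.80             # assumed power against those real effects
alpha = 0.05

is_real = rng.random(n_papers) < prior_real
# A true null is (falsely) rejected with probability alpha;
# a real effect is (correctly) detected with probability `power`.
significant = np.where(is_real,
                       rng.random(n_papers) < power,
                       rng.random(n_papers) < alpha)

false_discoveries = significant & ~is_real
print("FDR across the literature:",
      false_discoveries.sum() / significant.sum())   # about 0.36 with these inputs
```

With these (debatable) inputs roughly a third of the "discoveries" a reader encounters are false, even though every individual test used the conventional 5% level.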
General remarks
The answer to Colquhoun's question about the expected FDR is difficult to give because it depends on various assumptions. If, for example, all the null hypotheses are true, then the FDR will be $100\%$ (i.e. all "significant" findings will be statistical flukes). And if all nulls are in reality false, then the FDR will be zero. So the FDR depends on the proportion of true nulls, and this is something that has to be externally estimated or guessed in order to estimate the FDR. Colquhoun gives some arguments in favor of the $30\%$ number, but this estimate is highly sensitive to the assumptions.
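To spell out where such numbers come from: if a proportion $\pi_0$ of tested hypotheses are true nulls, tests are run at level $\alpha$, and the power against the real effects is $1-\beta$, then the expected FDR without any adjustment is
$$\mathrm{FDR} = \frac{\pi_0\,\alpha}{\pi_0\,\alpha + (1-\pi_0)(1-\beta)}.$$
Plugging in, purely for illustration, $\pi_0 = 0.9$, $\alpha = 0.05$ and $1-\beta = 0.8$ (values in the spirit of Colquhoun's arguments) gives $0.045/0.125 = 36\%$, which is the kind of calculation behind figures like "at least $30\%$" (and is also what the little simulation above converges to); with a higher $\pi_0$ or lower power, the number grows quickly.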
I think the paper is mostly reasonable, but I dislike that it makes some claims sound way too bold. For example, the first sentence of the abstract is:
If you use $p=0.05$ to suggest that you have made a discovery, you will be wrong at least $30\%$ of the time.
This is formulated too strongly and can actually be misleading.
Best Answer
No, that is not OK. The FDR correction looks at the entire distribution of all your p-values to make the adjustment. If it does not see the right-hand side of the distribution (the larger p-values), it will give much too optimistic results.
Options to still use Benjamini-Hochberg: if you know how many p-values are missing, or at least an upper bound on that number, and you can assume that the missing p-values are all larger than the 300 you observed, you can apply the BH thresholds to the observed p-values while using the total number of tests as the $n$ in the formulas.
An alternative is a less powerful method such as Bonferroni or Bonferroni-Holm. In both cases, as in the method described above, you will still need to know the number of missing p-values, or at least an upper bound for it. Note that the $n$ in the correction formulas would be the total number of p-values (your 300 plus the number of missing ones).
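Here is a minimal sketch (Python/numpy) of how all three corrections can be applied with the total $n$ when only the smallest p-values are available. The assumption is that every missing p-value is larger than the ones you observed; the total of 10,000 tests and the simulated input are made up for illustration.

```python
import numpy as np

def adjust_with_total_n(pvals_observed, n_total, alpha=0.05):
    """Bonferroni / Holm / BH rejection decisions when only the smallest
    p-values are observed.  Assumes every missing p-value is larger than the
    observed ones and that n_total is the true total (or an upper bound)."""
    p = np.sort(np.asarray(pvals_observed))
    m = len(p)
    ranks = np.arange(1, m + 1)

    # Bonferroni: reject p-values below alpha / n_total.
    bonferroni = p <= alpha / n_total
    # Holm (step-down): compare the i-th smallest p-value to alpha / (n_total - i + 1),
    # stop at the first failure.
    fails = p > alpha / (n_total - ranks + 1)
    first_fail = np.argmax(fails) if fails.any() else m
    holm = np.zeros(m, dtype=bool)
    holm[:first_fail] = True
    # Benjamini-Hochberg (step-up): reject everything up to the largest i
    # with p_(i) <= (i / n_total) * alpha.
    below = p <= ranks / n_total * alpha
    bh = np.zeros(m, dtype=bool)
    if below.any():
        bh[: np.max(np.nonzero(below)[0]) + 1] = True
    return bonferroni, holm, bh

# Hypothetical example: the 300 smallest p-values out of 10,000 tests in total.
rng = np.random.default_rng(2)
observed = rng.uniform(0, 0.01, size=300)
bonf, holm, bh = adjust_with_total_n(observed, n_total=10_000)
print(bonf.sum(), holm.sum(), bh.sum())
```

Because the missing p-values are assumed to be larger than the observed ones, leaving them out cannot add rejections; using the full $n$ in the thresholds just makes sure the corrections are not too lenient.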