I believe the papers, articles, posts, etc. that you diligently gathered contain enough information and analysis as to where and why the two approaches differ. But being different does not mean being incompatible.
The problem with the "hybrid" is that it is a hybrid and not a synthesis, and this is why it is treated by many as a hybris, if you'll excuse the word-play.
Not being a synthesis, it does not attempt to reconcile the differences of the two approaches and either create one unified, internally consistent approach, or keep both approaches in the scientific arsenal as complementary alternatives, so as to deal more effectively with the very complex world we try to analyze through Statistics (thankfully, the latter is what appears to be happening with the other great civil war of the field, the frequentist-Bayesian one).
The dissatisfaction with it, I believe, comes from the fact that it has indeed created misunderstandings in applying the statistical tools and interpreting the statistical results, mainly by scientists who are not statisticians, misunderstandings that can have very serious and damaging effects (thinking about the field of medicine helps give the issue its appropriate dramatic tone). This misapplication is, I believe, widely accepted as a fact, and in that sense the "anti-hybrid" point of view can be considered widespread (at least due to the consequences it had, if not for its methodological issues).
I see the evolution of the matter so far as a historical accident (though I don't have a $p$-value or a rejection region for my hypothesis), due to the unfortunate battle between the founders. Fisher and Neyman/Pearson fought bitterly and publicly for decades over their approaches. This created the impression that this is a dichotomous matter: one approach must be "right", and the other must be "wrong".
The hybrid emerged, I believe, out of the realization that no such easy answer existed, and that there were real-world phenomena to which one approach is better suited than the other (see this post for such an example, at least in my view, where the Fisherian approach seems more suitable). But instead of keeping the two "separate and ready to act", they were rather superficially patched together.
I offer a source which summarizes this "complementary alternative" approach:
Spanos, A. (1999). Probability theory and statistical inference: econometric modeling with observational data. Cambridge University Press, ch. 14, especially Section 14.5, where, after presenting the two approaches formally and distinctly, the author is in a position to point to their differences clearly and to argue that they can be seen as complementary alternatives.
By coincidence, I read this same paper just a couple of weeks ago. Colquhoun mentions multiple comparisons (including Benjamini-Hochberg) in Section 4 when posing the problem, but I find that he does not make the issue clear enough -- so I am not surprised to see your confusion.
The important point to realize is that Colquhoun is talking about the situation without any multiple comparison adjustments. One can read Colquhoun's paper as adopting a reader's perspective: he essentially asks what false discovery rate (FDR) he can expect when reading the scientific literature, which means the expected FDR when no multiple comparison adjustments were done.
Multiple comparisons can be taken into account when running multiple statistical tests in one study, e.g. in one paper. But nobody ever adjusts for multiple comparisons across papers.
If you actually control FDR, e.g. by following the Benjamini-Hochberg (BH) procedure, then it will be controlled. The problem is that running the BH procedure separately in each study does not guarantee overall FDR control.
Can I safely assume that in the long run, if I do such analysis on a regular basis, the FDR is not $30\%$, but below $5\%$, because I used Benjamini-Hochberg?
No. If you use the BH procedure in every paper, but independently in each of your papers, then you can essentially interpret your BH-adjusted $p$-values as normal $p$-values, and what Colquhoun says still applies.
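To make concrete what "running the BH procedure separately in each study" means, here is a minimal sketch of the BH step-up procedure (my own illustration; the example $p$-values are made up). Running this once per paper controls the FDR within that paper, but not across papers.

```python
import numpy as np

def benjamini_hochberg(p_values, alpha=0.05):
    """Return a boolean mask of rejections under the BH step-up procedure."""
    p = np.asarray(p_values)
    m = len(p)
    order = np.argsort(p)                     # indices that sort p ascending
    thresholds = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresholds            # p_(k) <= (k/m) * alpha
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])      # largest k satisfying the condition
        reject[order[: k + 1]] = True         # reject all hypotheses up to p_(k)
    return reject

# Hypothetical p-values from a single study
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(pvals, alpha=0.05))
```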
General remarks
The answer to Colquhoun's question about the expected FDR is difficult to give because it depends on various assumptions. If e.g. all the null hypotheses are true, then the FDR will be $100\%$ (i.e. all "significant" findings would be statistical flukes). And if all nulls are in reality false, then the FDR will be zero. So the FDR depends on the proportion of true nulls, and this is something that has to be externally estimated or guessed in order to estimate the FDR. Colquhoun gives some arguments in favor of the $30\%$ number, but this estimate is highly sensitive to the assumptions.
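To see how sensitive the number is, here is a back-of-the-envelope calculation in the spirit of Colquhoun's argument; the prevalence of real effects and the power are assumed inputs, not estimates of anything.

```python
def expected_fdr(prevalence, power, alpha=0.05):
    """Expected FDR among 'significant' results, given the fraction of
    tested hypotheses that correspond to real effects (prevalence)."""
    false_pos = alpha * (1 - prevalence)   # true nulls wrongly rejected
    true_pos = power * prevalence          # real effects correctly detected
    return false_pos / (false_pos + true_pos)

# With 10% real effects and 80% power the FDR is about 36%;
# with 50% real effects it drops to about 6%.
print(expected_fdr(prevalence=0.1, power=0.8))   # ~0.36
print(expected_fdr(prevalence=0.5, power=0.8))   # ~0.06
```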
I think the paper is mostly reasonable, but I dislike that it makes some claims sound way too bold. E.g. the first sentence of the abstract is:
If you use $p=0.05$ to suggest that you have made a discovery, you will be wrong at least $30\%$ of the time.
This is formulated too strongly and can actually be misleading.
Best Answer
Your false discovery rate depends not only on the p-value threshold, but also on the truth. In fact, if your null hypothesis is in reality wrong, it is impossible for you to make a false discovery.
Maybe it's helpful to think of it like this: the p-value threshold is the probability of making false discoveries when there are no true discoveries to be made (or, to put it differently, when the null hypothesis is true).
Basically,
Type 1 Error Rate = "Probability of rejecting the null if it's true" = p-value threshold
and
Type 1 Error Rate = False Discovery Rate IF the null hypothesis is true
is correct, but note the conditioning on the null being true. The false discovery rate does not carry this conditioning and therefore depends on the unknown truth of how many of your null hypotheses are actually true or false.
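A small simulation (my own illustration, not part of the original answer) makes this concrete: the type 1 error rate is fixed by the p-value threshold, but the realized false discovery rate swings with the fraction of true nulls.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulated_fdr(frac_true_nulls, n_tests=100_000, effect=3.0):
    """Simulate one-sided z-tests: a fraction of nulls is true (effect 0),
    the rest have a real effect. Returns the realized false discovery rate."""
    is_null = rng.random(n_tests) < frac_true_nulls
    z = rng.normal(loc=np.where(is_null, 0.0, effect))
    reject = z > 1.645                     # one-sided test at threshold 0.05
    return (reject & is_null).sum() / max(reject.sum(), 1)

# Same 0.05 threshold in both cases, very different FDR:
print(simulated_fdr(frac_true_nulls=0.9))   # most nulls true  -> FDR far above 5%
print(simulated_fdr(frac_true_nulls=0.1))   # most nulls false -> FDR well below 5%
```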
It's also worthwhile to consider that when you control the false discovery rate using a procedure like Benjamini-Hochberg, you are never able to estimate the actual false discovery rate; instead, you control it by estimating an upper bound. To do more, you would need to be able to detect that the null hypothesis is true using statistics, when in fact you can only detect violations of a certain magnitude (depending on the power of your test).