The Bayesian test for your question is based on the integrated (rather than maximised) likelihood. So for Poisson we have:
$$\begin{array}{c|c}
H_{1}:\lambda_{1}=\lambda_{2} & H_{2}:\lambda_{1}\neq\lambda_{2}
\end{array}
$$
Now neither hypothesis says what the parameters are, so the actual values are nuisance parameters to be integrated out with respect to their prior probabilities.
$$P(H_{1}|D,I)=P(H_{1}|I)\frac{P(D|H_{1},I)}{P(D|I)}$$
The model likelihood is given by:
$$P(D|H_{1},I)=\int_{0}^{\infty} P(D,\lambda|H_{1},I)d\lambda=\int_{0}^{\infty} P(\lambda|H_{1},I)P(D|\lambda,H_{1},I)\,d\lambda$$
$$=\int_{0}^{\infty} P(\lambda|H_{1},I)\frac{\lambda^{x_1+x_2}\exp(-2\lambda)}{\Gamma(x_1+1)\Gamma(x_2+1)}\,d\lambda$$
where $P(\lambda|H_{1},I)$ is the prior for lambda. A convenient mathematical choice is the gamma prior, which gives:
$$P(D|H_{1},I)=\int_{0}^{\infty} \frac{\beta^{\alpha}}{\Gamma(\alpha)}\lambda^{\alpha-1}\exp(-\beta \lambda)\frac{\lambda^{x_1+x_2}\exp(-2\lambda)}{\Gamma(x_1+1)\Gamma(x_2+1)}\,d\lambda$$
$$=\frac{\beta^{\alpha}\Gamma(x_1+x_2+\alpha)}{(2+\beta)^{x_1+x_2+\alpha}\Gamma(\alpha)\Gamma(x_1+1)\Gamma(x_2+1)}$$
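As a sanity check, the closed-form expression above can be verified by numerical integration. A minimal sketch, where the data $(x_1,x_2)$ and hyper-parameters $(\alpha,\beta)$ are made-up values for illustration:

```python
# Numerical sanity check of the closed-form integrated likelihood for H1.
# The data (x1, x2) and hyper-parameters (alpha, beta) are made-up values.
import numpy as np
from scipy.integrate import quad
from scipy.special import gammaln

x1, x2 = 3, 7
alpha, beta = 2.0, 1.0

def integrand(lam):
    # gamma(alpha, beta) prior times the product of the two Poisson likelihoods
    log_prior = alpha * np.log(beta) - gammaln(alpha) + (alpha - 1) * np.log(lam) - beta * lam
    log_lik = (x1 + x2) * np.log(lam) - 2 * lam - gammaln(x1 + 1) - gammaln(x2 + 1)
    return np.exp(log_prior + log_lik)

numeric, _ = quad(integrand, 0, np.inf)

# Closed form: beta^alpha Gamma(x1+x2+alpha) /
#   ((2+beta)^(x1+x2+alpha) Gamma(alpha) Gamma(x1+1) Gamma(x2+1))
closed = np.exp(alpha * np.log(beta) + gammaln(x1 + x2 + alpha)
                - (x1 + x2 + alpha) * np.log(2 + beta)
                - gammaln(alpha) - gammaln(x1 + 1) - gammaln(x2 + 1))
```

The two values agree to numerical precision, confirming the gamma-integral step.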
And for the alternative hypothesis we have:
$$P(D|H_{2},I)=\frac{\beta_{1}^{\alpha_{1}}\beta_{2}^{\alpha_{2}}\Gamma(x_1+\alpha_{1})\Gamma(x_2+\alpha_{2})}{(1+\beta_{1})^{x_1+\alpha_{1}}(1+\beta_{2})^{x_2+\alpha_{2}}\Gamma(\alpha_{1})\Gamma(\alpha_{2})\Gamma(x_1+1)\Gamma(x_2+1)}$$
Now if we assume that all hyper-parameters are equal (not an unreasonable assumption, given that you are testing for equality), then we have an integrated likelihood ratio of:
$$\frac{P(D|H_{1},I)}{P(D|H_{2},I)}=
\frac{(1+\beta)^{x_1+x_2+2\alpha}\Gamma(x_1+x_2+\alpha)\Gamma(\alpha)}
{(2+\beta)^{x_1+x_2+\alpha}\beta^{\alpha}\Gamma(x_1+\alpha)\Gamma(x_2+\alpha)}
$$
From this you can see that the prior information is still very important. We can't set $\alpha$ or $\beta$ equal to zero (the Jeffreys prior), or else $H_{1}$ will always be favoured regardless of the data. One way to get values for them is to specify prior estimates for $E[\lambda]$ and $E[\log(\lambda)]$ and solve for the parameters; these estimates cannot be based on $x_1$ or $x_2$, but they can be based on any other relevant information. You can also plug in a few different (reasonable) values for the hyper-parameters and see what difference it makes to the conclusion.

The numerical value of this statistic tells you how strongly the data and your prior information about the rates support the hypothesis of equal rates. This also explains why the likelihood ratio test is not always reliable: it essentially ignores prior information, which is usually equivalent to specifying the Jeffreys prior. Note that you could instead specify upper and lower limits for the rate parameters (usually not too hard to do with some common-sense thinking about the real-world problem). Then you would use a prior of the form:
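The sensitivity check suggested above can be sketched as follows, computing the integrated likelihood ratio in log space for numerical stability. The data $(x_1,x_2)$ and the grid of $(\alpha,\beta)$ values are assumptions for illustration:

```python
# A log-space sketch of the integrated likelihood ratio above, evaluated at a
# few hyper-parameter settings to see how sensitive the conclusion is.
# The data (x1, x2) and the (alpha, beta) values are assumptions.
import numpy as np
from scipy.special import gammaln

def log_bf_poisson(x1, x2, alpha, beta):
    """log of P(D|H1,I) / P(D|H2,I) with equal gamma(alpha, beta) hyper-priors."""
    return ((x1 + x2 + 2 * alpha) * np.log(1 + beta)
            + gammaln(x1 + x2 + alpha) + gammaln(alpha)
            - (x1 + x2 + alpha) * np.log(2 + beta)
            - alpha * np.log(beta)
            - gammaln(x1 + alpha) - gammaln(x2 + alpha))

x1, x2 = 3, 7
for alpha, beta in [(0.5, 0.5), (1.0, 1.0), (2.0, 1.0)]:
    print(f"alpha={alpha}, beta={beta}: BF = {np.exp(log_bf_poisson(x1, x2, alpha, beta)):.3f}")
```

If the ratio points the same way across all reasonable $(\alpha,\beta)$ settings, the conclusion is robust to the prior choice; note also that the ratio increases as $x_1$ and $x_2$ move closer together, as it should.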
$$p(\lambda|I)=\frac{I(L<\lambda<U)}{\log\left(\frac{U}{L}\right)\lambda}$$
And you would be left with an equation similar to the one above, but in terms of incomplete rather than complete gamma functions.
For the binomial case things are much simpler, because the non-informative prior (uniform) is proper. The procedure is similar to that above, and the integrated likelihood for $H_{1}:p_{1}=p_{2}$ is given by:
$$P(D|H_{1},I)={n_1 \choose x_1}{n_2 \choose x_2}\int_{0}^{1}p^{x_1+x_2}(1-p)^{n_1+n_2-x_1-x_2}\,dp$$
$$={n_1 \choose x_1}{n_2 \choose x_2}B(x_1+x_2+1,n_1+n_2-x_1-x_2+1)$$
And similarly for $H_{2}:p_{1}\neq p_{2}$
$$P(D|H_{2},I)={n_1 \choose x_1}{n_2 \choose x_2}\int_{0}^{1}p_{1}^{x_1}p_{2}^{x_2}(1-p_{1})^{n_1-x_1}(1-p_{2})^{n_{2}-x_{2}}\,dp_{1}\,dp_{2}$$
$$={n_1 \choose x_1}{n_2 \choose x_2}B(x_1+1,n_1-x_1+1)B(x_2+1,n_2-x_2+1)$$
And so taking ratios gives:
$$\frac{P(D|H_{1},I)}{P(D|H_{2},I)}=
\frac{B(x_1+x_2+1,n_1+n_2-x_1-x_2+1)}
{B(x_1+1,n_1-x_1+1)B(x_2+1,n_2-x_2+1)}
$$
$$=\frac{{x_1+x_2 \choose x_1}{n_1+n_2-x_1-x_2 \choose n_1-x_1}(n_1+1)(n_2+1)}{{n_1+n_2 \choose n_1}(n_1+n_2+1)}$$
The ratio of choose functions can be calculated using the hypergeometric($r$, $n$, $R$, $N$) distribution with $N=n_1+n_2$, $R=x_1+x_2$, $n=n_1$, $r=x_1$.
And this tells you how much the data support the hypothesis of equal probabilities, given that you don't have much prior information about what that common value might be.
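The two equivalent forms of the ratio above (the Beta-function form and the binomial-coefficient form) can be cross-checked numerically. A minimal sketch with made-up data $(x_1, n_1, x_2, n_2)$:

```python
# Cross-check of the two equivalent forms of the binomial ratio above:
# the Beta-function form and the binomial-coefficient form (the latter is
# a hypergeometric pmf times (n1+1)(n2+1)/(n1+n2+1)).
# The data (x1, n1, x2, n2) are made-up values.
import numpy as np
from scipy.special import betaln, gammaln

def log_choose(n, k):
    # log of the binomial coefficient C(n, k)
    return gammaln(n + 1) - gammaln(k + 1) - gammaln(n - k + 1)

x1, n1 = 4, 10
x2, n2 = 7, 12

# Beta-function form of the ratio
log_bf_beta = (betaln(x1 + x2 + 1, n1 + n2 - x1 - x2 + 1)
               - betaln(x1 + 1, n1 - x1 + 1)
               - betaln(x2 + 1, n2 - x2 + 1))

# Binomial-coefficient form of the same ratio
log_bf_choose = (log_choose(x1 + x2, x1)
                 + log_choose(n1 + n2 - x1 - x2, n1 - x1)
                 - log_choose(n1 + n2, n1)
                 + np.log(n1 + 1) + np.log(n2 + 1) - np.log(n1 + n2 + 1))
```

The two log ratios agree, confirming the algebra connecting the Beta functions to the hypergeometric form.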
I believe the papers, articles, posts, etc. that you diligently gathered contain enough information and analysis as to where and why the two approaches differ. But being different does not mean being incompatible.
The problem with the "hybrid" is that it is a hybrid and not a synthesis, and this is why it is treated by many as hubris, if you will excuse the word-play.
Not being a synthesis, it does not attempt to reconcile the differences between the two approaches: either to create one unified and internally consistent approach, or to keep both approaches in the scientific arsenal as complementary alternatives, in order to deal more effectively with the very complex world we try to analyze through statistics (thankfully, this last outcome is what appears to be happening with the other great civil war of the field, the frequentist-Bayesian one).
The dissatisfaction with it comes, I believe, from the fact that it has indeed created misunderstandings in applying statistical tools and interpreting statistical results, mainly among scientists who are not statisticians; misunderstandings that can have very serious and damaging effects (thinking about the field of medicine helps give the issue its appropriate dramatic tone). This misapplication is, I believe, widely accepted as a fact, and in that sense the "anti-hybrid" point of view can be considered widespread (at least because of the consequences it has had, if not for its methodological issues).
I see the evolution of the matter so far as a historical accident (though I don't have a $p$-value or a rejection region for my hypothesis), due to the unfortunate battle between the founders: Fisher and Neyman/Pearson fought bitterly and publicly for decades over their approaches. This created the impression that this is a dichotomous matter: one approach must be "right", and the other must be "wrong".
The hybrid emerged, I believe, out of the realization that no such easy answer existed, and that there were real-world phenomena to which one approach is better suited than the other (see this post for such an example, in my view at least, where the Fisherian approach seems more suitable). But instead of keeping the two "separate and ready to act", they were rather superficially patched together.
I offer a source which summarizes this "complementary alternative" approach:
Spanos, A. (1999). Probability Theory and Statistical Inference: Econometric Modeling with Observational Data. Cambridge University Press, ch. 14, especially Section 14.5, where, after presenting the two approaches formally and distinctly, the author is in a position to point to their differences clearly, and also to argue that they can be seen as complementary alternatives.
The p-value of a hypothesis test, or the corresponding confidence interval, depends on how two issues are treated:
1. Treatment of nuisance parameter
To preserve the size at the exact level, the type 1 error needs to be less than or equal to alpha for all possible values of the nuisance parameter. The null hypothesis that the rates are equal does not constrain their common value, so that value is a nuisance parameter.
Conditional tests like the exact conditional Poisson test or Fisher's exact test remove the nuisance parameter by conditioning on a summary statistic.
Unconditional exact tests need to guarantee the size by taking the max (or sup) of the rejection probability over all possible values of the nuisance parameter. The Berger-Boos test limits the nuisance-parameter space over which the max is taken, but adds a correction factor to keep the test exact, i.e. to preserve the size alpha.

The Poisson E-test is not an exact test in this sense: it uses the "exact" distribution, but it plugs in the estimated value of the nuisance parameter.
2. Location of two-sided rejection region
A two-sided test has a rejection region in both the lower and the upper tail. The requirement that the size of the test is at most alpha constrains the probability of falling in one of the tails, but it does not pin down the probability in each tail separately.
"central" or equal tail methods limit the probability of the each tail to be less than or equal to half the size, alpha / 2.
"minlike" uses the likelihood value (based on likelihood ratio test) to find the non-rejection region. The corresponding profile confidence interval will not have equal tails in skewed distributions like Poisson or Binomial.
One point that Michael P. Fay emphasizes is that hypothesis tests and confidence intervals are often not consistent with each other.
For example, the exact Poisson test in R uses the "minlike" hypothesis test and exact p-value, but reports exact "central" (equal-tail) confidence intervals.
For a one-sided test, the location is fixed and the distinction between "minlike" and "central" becomes irrelevant: because there is only one tail, an exact test needs to preserve the size for that tail at level alpha.