You are not getting the correct result because you have two problems: first, you need to be clear about what type of parametrization you are using for the gamma distribution. Specifically, if the parametrization is shape $\alpha$ and rate $\beta$, then the gamma density is $$f_X(x) = \frac{\beta^\alpha x^{\alpha - 1} e^{-\beta x}}{\Gamma(\alpha)}.$$ If the parametrization is shape $\alpha$ and scale $\beta$, then $$f_X(x) = \frac{x^{\alpha - 1} e^{-x/\beta}}{\beta^\alpha \Gamma(\alpha)}.$$ You should not confuse the two, because in each case the conjugacy looks different.
The second problem is that your hierarchical model specification is reversed in your Mathematica code. The model is
$$X \sim \operatorname{Gamma}(\alpha, \beta) \\ Y \mid X \sim \operatorname{Poisson}(X).$$ So $X$ is the unobserved prior and $Y$ is the distribution of the observed data. Then the marginal distribution is $Y \sim \operatorname{NegativeBinomial}(?, ?)$ and the posterior is $X \mid Y \sim \operatorname{Gamma}(?, ?)$. So in Mathematica, the syntax for a shape-scale parametrization should be:
f = FullSimplify[PDF[PoissonDistribution[x], y]
PDF[GammaDistribution[a, b], x] /
PDF[NegativeBinomialDistribution[a, 1/(1+b)], y],
{x > 0, y >= 0, a > 0, b > 0}]
g = FullSimplify[PDF[GammaDistribution[a + y, b/(1+b)], x],
{x > 0, y >= 0, a > 0, b > 0}]
And then f/g
should evaluate to 1
symbolically. Note the two errors: first, you've mixed up x
and y
in various places in your code, and second, because Mathematica consistently uses the shape-scale parametrization, the posterior scale hyperparameter is not $1 + \beta$ but rather $\beta/(1 + \beta)$. In other words, you are using inconsistent parametrizations on each side of your equation; on the left, a shape-rate gamma, and on the right, a shape-scale gamma.
The same code in a shape-rate parametrization would look like this:
f = FullSimplify[PDF[PoissonDistribution[x], y]
PDF[GammaDistribution[a, 1/b], x] /
PDF[NegativeBinomialDistribution[a, b/(1+b)], y],
{x > 0, y > 0, a > 0, b > 0}]
g = FullSimplify[PDF[GammaDistribution[a + y, 1/(1+b)], x],
{x > 0, y > 0, a > 0, b > 0}]
in which case, the posterior rate hyperparameter is $1 + \beta$ as you claim on the LHS of your equation, but then the negative binomial parametrization needs to be adjusted. Specifically, when we say $Y \sim \operatorname{NegativeBinomial}(r, p)$, we could mean
$$\Pr[Y = y] = \binom{y + r - 1}{r - 1} p^r (1-p)^y, \quad y \in \{0, 1, 2, \ldots\},$$ or we could mean
$$\Pr[Y = y] = \binom{y + r - 1}{r - 1} (1-p)^r p^y, \quad y \in \{0, 1, 2, \ldots\}.$$ Mathematica uses the first one. If however you use the second one, then the formula you wrote becomes consistent in parametrizations and $\beta$ is a rate parameter.
This is why it is important to:
Always specify the parametrization by stating explicitly the density and mass functions for the model when there is any possibility of ambiguity.
Check the documentation for Mathematica to see what parametrization is used for the built-in functions.
Let $U$ be uniformly distributed on $[0,1]$ and let $\bar X:= Q(U)$.
Composing by $Q\circ F$ on both sides, it holds that :
$$ Q(F(\bar X)) = Q(F(Q(U))\;\;\text{ almost surely}$$
To make it slightly easier to read, I rewrite it as follows :
$$ Q(F(\bar X)) = Q(F\circ Q(U)) \;\;\text{ almost surely} \tag1$$
Now, by definition of $Q$, it holds that for all $p\in\mathbb R$, $F\circ Q(p)\ge p$. Applying $Q$ (which is non-decreasing) on both sides gives $Q(F\circ Q(p))\ge Q(p) $ for all $p$, from which we deduce that :
$$ Q(F\circ Q(U)) \ge Q(U) = \bar X\;\;\text{ almost surely} \tag A$$
Now, again by definition of $Q$, we have that for all $p\in\mathbb R$, $Q(F(p)) \le p $, which implies that : $$Q(F(\bar X)) \le \bar X\;\;\text{ almost surely} \tag B$$
From $(A)$ and $(B)$ we deduce that $$Q(F(\bar X)) =\bar X\;\;\text{ almost surely} $$
To conclude, note that $\bar X$ and $X$ have the same distribution (I let you do the proof), so for any set $S$, it holds that $\mathbb P(X\in S) = \mathbb P(\bar X\in S)$. In particular if we consider the set $S:=\{x\in \mathbb R \ |\ Q(F(x)) =x\}$, we have that
$$ \mathbb P(\{Q(F(X) = X\}) =\mathbb P(X\in S) = \mathbb P(\bar X\in S) =\mathbb P(\{Q(F(\bar X) =\bar X\}) = 1 $$
And we are done.
Best Answer
As the Wikipedia page states, "Unfortunately, the distribution does not, in general, have an inverse. One may define, for $y \in [0,1]$, the generalized inverse distribution function $F^{-1}(y)= \inf \{x: F(x) \geq y\}$."
Since not all distribution functions $F$ are strictly increasing, not all $F$ have (proper) inverses. When $F$ does has a proper inverse, then the two definitions coincide. When $F$ does not have an inverse (example?) then the definition given still retains many useful properties used in probabilistic arguments.
$F^{-1}$ is simply the notation used in this context to be suggestive of an inverse. In some books, the notation $F^{\leftarrow}$ is used to distinguish it or make clear that a (proper) inverse may not exist and the generalized version is being used. However, this alternate notation is not all that common.
Let's briefly prove some of the facts on the Wikipedia webpage.
Fact 1: $F^{-1}(y)$ is nondecreasing.
Proof: Let $y_1 < y_2$. Then $F(x) \geq y_2$ implies $F(x) \geq y_1$, and so $\{x: F(x) \geq y_1\} \supset \{x:F(x) \geq y_2\}$. Hence, clearly, $\inf \{x: F(x) \geq y_1\} \leq \inf \{x : F(x) \geq y_2\}$.
Fact 2: $F^{-1}(F(x)) \leq x$.
Proof: $F^{-1}(F(x)) = \inf\{z: F(z) \geq F(x)\}$ and $x \in \{z: F(z) \geq F(x)\}$, so the result follows.
Fact 3: $F( F^{-1}(y) ) \geq y$.
Proof: Let $x_n \in \{x : F(x) \geq y\}$ such that $x_n \to x_0$. Then $\liminf_n F(x_n) \geq y$, but since $F$ is monotone nondecreasing and continuous from the right, $\liminf_n F(x_n) \leq F(x_0)$. Hence $F(x_0) \geq y$, that is, $x_0 \in \{x: F(x) \geq y\}$. So, $\{x: F(x) \geq y\}$ is closed and hence contains its infimum. Thus, $F(F^{-1}(y)) \geq y$.
Fact 4: $F^{-1}(y) \leq x$ iff $y \leq F(x)$.
Proof: $F^{-1}(y) \leq x$ implies $x \in \{z: F(z) \geq y\}$ and so $y \leq F(x)$. On the other hand, if $y \leq F(x)$ then $x \in \{z : F(z) \geq y\}$ and so $F^{-1}(y) \leq x$ since $F^{-1}(y)$ is the infimum.