Conjugate prior Gamma distribution on Poisson intensity

probability distributions

I don't know what I am missing in following in my understanding. Whether it is my mathematica code that is incorrect or my mathematical skill is short.

Gamma distribution is the conjugate prior when the likelihood function is Poisson distribution. By this, I understood the following:

$$\overbrace{f(x|y)}^{\rm Gamma(\alpha+x, \beta + 1)}=\frac{\overbrace{f(y|x)}^{\rm Poisson(\lambda)} \cdot \overbrace{f(x)}^{\rm Gamma(\alpha, \beta)}}{\underbrace{f(y)}_{NB(\alpha, \frac{1}{1+\beta}\:)}} $$

If my understanding above is correct then why the LHS is not equal to RHS in following Mathematica code ?

Remove["Global`*"]
    f = FullSimplify[(PDF[PoissonDistribution[y], 
          x])*(PDF[GammaDistribution[a, b], y])/(PDF[
           NegativeBinomialDistribution[a, 1/(1 + b)], x])];
    g = PDF[GammaDistribution[a + x, (b + 1)], y];
    a = 1;
    b = 2;
    x = 5;
    Plot[{f, g}, {y, 1, 10}, PlotLegends -> "Expressions"]

Correction on Wikipedia page conjugate prior

$$\overbrace{f(\lambda|x)}^{\text{Gamma}(k+x,\frac{\theta}{\theta + 1})}=\frac{\overbrace{f(x|\lambda)}^{\text{Poisson}(\lambda)} \cdot \overbrace{f(\lambda)}^{\text{Gamma}(k, \theta)}}{\underbrace{f(y)}_{\text{NB}(k, \frac{1}{1+\theta}\:)}} $$

and

$$\overbrace{f(\lambda|x)}^{\text{Gamma}(\alpha+x,\beta+1)}=\frac{\overbrace{f(x|\lambda)}^{\text{Poisson}(\lambda)} \cdot \overbrace{f(\lambda)}^{\text{Gamma}(\alpha, \beta)}}{\underbrace{f(y)}_{\text{NB}(\alpha, \frac{\beta}{\beta +1}\:)}} $$

i.e. the two Negative Binomial distributions are mistakenly swapped on wiki page.

Best Answer

You are not getting the correct result because you have two problems: first, you need to be clear about what type of parametrization you are using for the gamma distribution. Specifically, if the parametrization is shape $\alpha$ and rate $\beta$, then the gamma density is $$f_X(x) = \frac{\beta^\alpha x^{\alpha - 1} e^{-\beta x}}{\Gamma(\alpha)}.$$ If the parametrization is shape $\alpha$ and scale $\beta$, then $$f_X(x) = \frac{x^{\alpha - 1} e^{-x/\beta}}{\beta^\alpha \Gamma(\alpha)}.$$ You should not confuse the two, because in each case the conjugacy looks different.

The second problem is that your hierarchical model specification is reversed in your Mathematica code. The model is

$$X \sim \operatorname{Gamma}(\alpha, \beta) \\ Y \mid X \sim \operatorname{Poisson}(X).$$ So $X$ is the unobserved prior and $Y$ is the distribution of the observed data. Then the marginal distribution is $Y \sim \operatorname{NegativeBinomial}(?, ?)$ and the posterior is $X \mid Y \sim \operatorname{Gamma}(?, ?)$. So in Mathematica, the syntax for a shape-scale parametrization should be:

f = FullSimplify[PDF[PoissonDistribution[x], y]
        PDF[GammaDistribution[a, b], x] /
        PDF[NegativeBinomialDistribution[a, 1/(1+b)], y],
        {x > 0, y >= 0, a > 0, b > 0}]
g = FullSimplify[PDF[GammaDistribution[a + y, b/(1+b)], x],
        {x > 0, y >= 0, a > 0, b > 0}]

And then f/g should evaluate to 1 symbolically. Note the two errors: first, you've mixed up x and y in various places in your code, and second, because Mathematica consistently uses the shape-scale parametrization, the posterior scale hyperparameter is not $1 + \beta$ but rather $\beta/(1 + \beta)$. In other words, you are using inconsistent parametrizations on each side of your equation; on the left, a shape-rate gamma, and on the right, a shape-scale gamma.

The same code in a shape-rate parametrization would look like this:

f = FullSimplify[PDF[PoissonDistribution[x], y]
        PDF[GammaDistribution[a, 1/b], x] /
        PDF[NegativeBinomialDistribution[a, b/(1+b)], y],
        {x > 0, y > 0, a > 0, b > 0}]
g = FullSimplify[PDF[GammaDistribution[a + y, 1/(1+b)], x],
        {x > 0, y > 0, a > 0, b > 0}]

in which case, the posterior rate hyperparameter is $1 + \beta$ as you claim on the LHS of your equation, but then the negative binomial parametrization needs to be adjusted. Specifically, when we say $Y \sim \operatorname{NegativeBinomial}(r, p)$, we could mean $$\Pr[Y = y] = \binom{y + r - 1}{r - 1} p^r (1-p)^y, \quad y \in \{0, 1, 2, \ldots\},$$ or we could mean $$\Pr[Y = y] = \binom{y + r - 1}{r - 1} (1-p)^r p^y, \quad y \in \{0, 1, 2, \ldots\}.$$ Mathematica uses the first one. If however you use the second one, then the formula you wrote becomes consistent in parametrizations and $\beta$ is a rate parameter.

This is why it is important to:

  1. Always specify the parametrization by stating explicitly the density and mass functions for the model when there is any possibility of ambiguity.

  2. Check the documentation for Mathematica to see what parametrization is used for the built-in functions.