Marginal likelihood and predictive distribution for exponential likelihood with gamma prior

bayesian, exponential distribution, gamma distribution, maximum likelihood, probability distributions

Let the model distribution (likelihood) be exponential, i.e.
$$
p(x \mid \lambda)
:= \text{Exp}(\lambda)
:= \lambda e^{-\lambda x}
$$

and the prior distribution be gamma (shape-rate-parametrization), i.e.
$$
p(\lambda \mid \alpha, \beta)
:= \text{Gamma}(\alpha,\beta)
:= \frac{\beta^{\alpha}}{\Gamma(\alpha)} \lambda^{\alpha-1}\exp(-\beta\lambda)
$$

We were then asked to find the posterior distribution
$$
p(\lambda \mid X,\alpha,\beta) = p(X \mid \lambda)p(\lambda \mid\alpha,\beta) = \text{Exp}(\lambda) \text{Gamma}(\alpha,\beta)
$$

Now assuming $X = (x_k)_{k = 1}^{n}$ and that the observations are i.i.d., I calculated $$\text{Exp}(\lambda) = \prod_{k = 1}^{n} \lambda e^{-\lambda x_k} = \lambda^n e^{-\lambda \overline{X}},\quad \text{where } \overline{X} := \sum_{k = 1}^{n} x_k$$ and thus
\begin{align}
p(\lambda \mid X,\alpha,\beta)
& = \lambda^n \exp\left(-\lambda \overline{X}\right) \frac{\beta^{\alpha}}{\Gamma(\alpha)} \lambda^{\alpha - 1} \exp(-\beta \lambda) \\
& = \lambda^{\alpha + n - 1} \frac{\beta^{\alpha}}{\Gamma(\alpha)} \exp\left(-\lambda \left(\beta + \overline{X}\right)\right)
\end{align}

and noticed $p(\lambda \mid X, \alpha, \beta) = \text{Gamma}(\alpha + n, \beta + \overline{X})$.

So far, everything is fine.
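
As a quick numerical sanity check (a sketch with synthetic data; numpy/scipy assumed, and the hyperparameter values are arbitrary), the claimed conjugate update can be compared against the normalized product of likelihood and prior:

```python
import numpy as np
from scipy import stats, integrate

rng = np.random.default_rng(0)
alpha, beta = 2.0, 3.0                       # arbitrary prior hyperparameters
x = rng.exponential(scale=1 / 1.5, size=20)  # synthetic Exp(1.5) sample
n, S = len(x), x.sum()                       # S is the sum, i.e. X-bar as defined above

# Unnormalized posterior: likelihood times prior, as in the derivation
def unnorm_post(lam):
    return lam**n * np.exp(-lam * S) * stats.gamma.pdf(lam, a=alpha, scale=1 / beta)

Z, _ = integrate.quad(unnorm_post, 0, np.inf)      # normalizing constant
lam = np.linspace(0.5, 3.0, 7)
claimed = stats.gamma.pdf(lam, a=alpha + n, scale=1 / (beta + S))
print(np.allclose(unnorm_post(lam) / Z, claimed))  # True
```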


Now we were asked to find the marginal likelihood
$$
p(X \mid \alpha,\beta)
= \int p(X \mid \lambda) p(\lambda \mid \alpha,\beta) \ d\lambda,
$$

which I calculated to be (I used the substitution $u = \lambda(\beta + \bar{X})$)
\begin{align}
\int_{0}^{\infty} \lambda^{\alpha + n - 1} \frac{\beta^{\alpha}}{\Gamma(\alpha)} \exp\left(-\lambda \left(\beta + \overline{X}\right)\right) \ d\lambda
& = \frac{\beta^{\alpha}}{\Gamma(\alpha)} \int_{0}^{\infty} \lambda^{\alpha + n - 1} \exp\left(-\lambda \left(\beta + \overline{X}\right)\right) \ d\lambda \\
& = \frac{\beta^{\alpha}}{\Gamma(\alpha) (\beta + \overline{X})^{\alpha + n}} \int_{0}^{\infty} u^{\alpha + n - 1} \exp\left(-u\right) \ du \\
& = \frac{\beta^{\alpha} \cdot \Gamma(\alpha + n)}{\Gamma(\alpha) (\beta + \overline{X})^{\alpha + n}}.
\end{align}
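
As a cross-check (a sketch with arbitrary sample values; numpy/scipy assumed), this closed form can be compared against direct numerical integration of likelihood times prior:

```python
import numpy as np
from scipy import stats, integrate
from scipy.special import gammaln

alpha, beta = 2.0, 3.0
x = np.array([0.4, 1.2, 0.7, 2.1])   # arbitrary sample
n, S = len(x), x.sum()

# Integrand: Exp-likelihood of the sample times the Gamma prior
def integrand(lam):
    return lam**n * np.exp(-lam * S) * stats.gamma.pdf(lam, a=alpha, scale=1 / beta)

numeric, _ = integrate.quad(integrand, 0, np.inf)

# Closed form beta^alpha * Gamma(alpha+n) / (Gamma(alpha) * (beta+S)^(alpha+n)),
# evaluated in log space for numerical stability
closed = np.exp(alpha * np.log(beta) + gammaln(alpha + n)
                - gammaln(alpha) - (alpha + n) * np.log(beta + S))
print(np.isclose(numeric, closed))   # True
```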

Is this correct? Furthermore, can I find $\widehat{\alpha}$, $\widehat{\beta}$ such that $p(X|\alpha,\beta) = \text{Gamma}(\widehat{\alpha}, \widehat{\beta})$?


A similar problem arises with the predictive distribution
$$
p(x \mid X,\alpha,\beta)
= \int p(x \mid \lambda) p(\lambda \mid X,\alpha,\beta) d\lambda,
$$

which I calculated to be (using a similar substitution to the one above)
$$
\frac{\beta^{\alpha} \cdot \Gamma(\alpha + n)}{\Gamma(\alpha) (\beta + x + \overline{X})^{\alpha + n}}.
$$

Is this correct? What are my $\widehat{\alpha}$ and $\widehat{\beta}$ now?

Best Answer

Don't make this so hard for yourself. Simply compute the kernels rather than explicitly integrating. I will change your notation, because $\bar X$ is typically used for the sample mean $$\bar X = \frac{1}{n} \sum_{i=1}^n X_i$$ rather than the sample total. Then $$p(\lambda \mid X, \alpha, \beta) \propto \lambda^n e^{-\lambda n \bar X} \lambda^{\alpha - 1} e^{-\beta \lambda} = \lambda^{n + \alpha - 1} e^{-(n\bar X + \beta)\lambda},$$ which is the kernel of a gamma density with posterior shape hyperparameter $\alpha^* = n + \alpha$ and rate hyperparameter $\beta^* = n \bar X + \beta$. This agrees with your computation (keeping in mind that my $\bar X$ differs from yours by a factor of $1/n$). In performing the computation, we discarded all factors that are not functions of $\lambda$. You inadvertently did the same: your expression for $p(\lambda \mid X, \alpha, \beta)$ is not a proper density, because the constant factors it carries are not the normalizing factors required for a gamma density with shape $\alpha^*$ and rate $\beta^*$.
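
For completeness, with the normalizing factor restored, the posterior density reads
$$
p(\lambda \mid X, \alpha, \beta)
= \frac{(n\bar X + \beta)^{n+\alpha}}{\Gamma(n+\alpha)} \, \lambda^{n+\alpha-1} e^{-(n\bar X + \beta)\lambda},
$$
i.e. exactly a $\text{Gamma}(\alpha^*, \beta^*)$ density.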

Next, since the original gamma prior has normalizing factor $\beta^\alpha/\Gamma(\alpha)$, i.e. $$p(\lambda \mid \alpha, \beta) = \frac{\beta^\alpha}{\Gamma(\alpha)} K(\lambda \mid \alpha, \beta)$$ where $K$ is the kernel, and the posterior is gamma with hyperparameters $\alpha^*$, $\beta^*$, it immediately follows that the marginal likelihood is the ratio of the prior normalizing factor to the posterior normalizing factor; i.e., $$p(X \mid \alpha, \beta) = \frac{\beta^\alpha/\Gamma(\alpha)}{(\beta^*)^{\alpha^*}/\Gamma(\alpha^*)} = \frac{\beta^\alpha \Gamma(n+\alpha)}{(n\bar X + \beta)^{n + \alpha} \Gamma(\alpha)},$$ by Bayes' rule: $$p(\lambda \mid X, \alpha, \beta) = \frac{p(X \mid \lambda)\,p(\lambda \mid \alpha,\beta)}{p(X \mid \alpha, \beta)}.$$ No integration is required. Note that $p(X \mid \alpha, \beta)$ is multivariate with respect to the sample $X = (X_1, \ldots, X_n)$, and thus is not itself gamma distributed.
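
This ratio-of-normalizing-factors identity also translates directly into code. A minimal sketch (function name and test values are mine), working in log space to avoid overflow for large $n$:

```python
import numpy as np
from scipy.special import gammaln

def log_marginal_likelihood(x, alpha, beta):
    """log p(X | alpha, beta) for Exp(lambda) data under a Gamma(alpha, beta)
    prior, via the ratio of prior to posterior normalizing factors."""
    n, total = len(x), np.sum(x)             # total = n * xbar
    a_post, b_post = alpha + n, beta + total
    return (alpha * np.log(beta) - gammaln(alpha)
            - a_post * np.log(b_post) + gammaln(a_post))

print(log_marginal_likelihood([0.4, 1.2, 0.7, 2.1], 2.0, 3.0))
```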

For the posterior predictive distribution, we apply the same principle. By Bayes' rule, $$p(x \mid X , \alpha, \beta) = \frac{p(x \mid \lambda) p(\lambda \mid X, \alpha, \beta)}{p(\lambda \mid X, x, \alpha, \beta)},$$ where the denominator is the posterior given the sample $X$ and the new observation $x$, and the numerator contains the likelihood of $x$; thus the RHS is again a ratio of normalizing factors, this time with the posterior's normalizing factor in the numerator and that of the posterior updated with the new observation in the denominator: $$p(x \mid X, \alpha, \beta) = \frac{(\beta^*)^{\alpha^*}/\Gamma(\alpha^*)}{(\beta')^{\alpha'}/\Gamma(\alpha')},$$ where $$\alpha' = \alpha^* + 1 = n+\alpha + 1 \quad \text{and} \quad \beta' = \beta^* + x = n\bar X + \beta + x.$$ Note that the posterior predictive is a univariate density in the new observation $x$, hence it is instructive to consider its kernel: $$p(x \mid X, \alpha, \beta) \propto \frac{1}{(\beta')^{\alpha'}} = (n \bar X + \beta + x)^{-\alpha'},$$ which is proportional to a Pareto Type II (Lomax) density with location parameter $0$, scale parameter $\beta^* = n \bar X + \beta$, and shape parameter $\alpha^* = n + \alpha$.
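
A numerical check of this identification (a sketch; the hyperparameters and data are arbitrary, and `scipy.stats.lomax` parametrizes the density with shape `c` and `scale`):

```python
import numpy as np
from scipy import stats
from scipy.special import gammaln

alpha, beta = 2.0, 3.0
x_obs = np.array([0.4, 1.2, 0.7, 2.1])
a_post = alpha + len(x_obs)          # alpha* = n + alpha
b_post = beta + x_obs.sum()          # beta*  = n*xbar + beta

# Posterior predictive as a ratio of normalizing factors,
# with alpha' = alpha* + 1 and beta' = beta* + x
def predictive(x):
    return np.exp(a_post * np.log(b_post) - gammaln(a_post)
                  + gammaln(a_post + 1) - (a_post + 1) * np.log(b_post + x))

x = np.linspace(0.0, 10.0, 6)
print(np.allclose(predictive(x), stats.lomax.pdf(x, c=a_post, scale=b_post)))  # True
```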