Hyperprior density for hierarchical Gamma-Poisson model

Tags: gamma distribution, hierarchical-bayesian, hyperparameter, poisson distribution

In a hierarchical model of data $y$ where
$$y \sim \textrm{Poisson}(\lambda)$$
$$\lambda \sim \textrm{Gamma}(\alpha, \beta)$$
it appears to be typical in practice to choose values $(\alpha, \beta)$ such that the mean and variance of the gamma distribution roughly match the mean and variance of the data $y$ (e.g., Clayton and Kaldor, 1987, "Empirical Bayes Estimates of Age-Standardized Relative Risks for Disease Mapping," Biometrics). Clearly this is just an ad hoc solution, though: it overstates the researcher's confidence in the parameters $(\alpha, \beta)$, and small fluctuations in the realized data could have large consequences for the gamma density even if the underlying data-generating process remains the same.
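For concreteness, here is a minimal sketch of the moment-matching recipe described above. It assumes $\beta$ is a scale parameter (so the Gamma mean is $\alpha\beta$ and the variance is $\alpha\beta^2$); the function name and the counts are hypothetical and only there for illustration.

```python
from statistics import mean, variance

def gamma_moment_match(y):
    """Return (alpha, beta) whose Gamma mean and variance equal those of y.

    Assumption: beta is a scale parameter, so
    mean = alpha * beta and variance = alpha * beta**2.
    """
    m = mean(y)
    v = variance(y)  # sample variance (n - 1 denominator)
    if m <= 0 or v <= 0:
        raise ValueError("need a positive sample mean and variance")
    beta = v / m
    alpha = m * m / v
    return alpha, beta

counts = [3, 7, 4, 9, 2, 6, 5, 8]  # hypothetical data
alpha, beta = gamma_moment_match(counts)
print(alpha, beta)
```

Because $\alpha$ and $\beta$ are plugged in as fixed numbers, none of the sampling variability in the mean and variance of $y$ carries through to the Gamma density, which is the overconfidence I am worried about.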

Furthermore, in Bayesian Data Analysis (2nd ed.), Gelman calls this method "sloppy". In the book and in this paper (starting p. 3232), he instead suggests choosing some hyperprior density $p(\alpha, \beta)$, in a fashion similar to the rat tumors example (starting p. 130).

Although it's clear that any $p(\alpha, \beta)$ is admissible so long as it produces a finite posterior density, I have not found any examples of hyperprior densities that researchers have used for this problem in the past. I would greatly appreciate it if someone could point me to books or articles which have employed a hyperprior density to estimate a Poisson-Gamma model. Ideally, I am interested in $p(\alpha, \beta)$ that is relatively flat and would be dominated by the data as in the rat tumor example, or a discussion comparing several alternative specifications and the trade-offs associated with each.

Best Answer

This doesn't really answer the question, since I'm not pointing you to books or articles that have employed a hyperprior. Instead, I describe, and link to, material on priors for the Gamma parameters.

First, note that the Poisson-Gamma model leads, when $\lambda$ is integrated out, to a Negative Binomial distribution with parameters $\alpha$ and $\beta/(1+\beta)$. The second parameter is in the range $(0,1)$. If you wish to be uninformative, a Jeffreys prior on $p = \beta/(1+\beta)$ might be appropriate. You could put the prior directly on $p$ or work through the change of variables to get:

$p(\beta) \propto \beta^{-1/2}(1+\beta)^{-1}$
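To spell out the change of variables: the density above is what you get if the prior on $p$ has the Beta(1/2, 1/2) form $p^{-1/2}(1-p)^{-1/2}$, since $p = \beta/(1+\beta)$ gives $1-p = 1/(1+\beta)$ and $dp/d\beta = (1+\beta)^{-2}$. That form is an assumption here; a prior built from the Negative Binomial Fisher information in $p$ would carry different exponents. Under this assumption the prior is proper, with normalized density $1/(\pi\sqrt{\beta}(1+\beta))$ and CDF $(2/\pi)\arctan\sqrt{\beta}$. A small sketch (function names are hypothetical):

```python
import math
import random

def log_prior_beta(beta):
    """Log of p(beta) = 1 / (pi * sqrt(beta) * (1 + beta)) for beta > 0."""
    if beta <= 0:
        return -math.inf
    return -math.log(math.pi) - 0.5 * math.log(beta) - math.log1p(beta)

def prior_cdf_beta(beta):
    """Closed-form CDF of the prior: (2 / pi) * atan(sqrt(beta))."""
    return (2.0 / math.pi) * math.atan(math.sqrt(beta))

def sample_beta(rng):
    """Inverse-CDF draw: solve (2 / pi) * atan(sqrt(beta)) = u for beta."""
    u = rng.random()
    return math.tan(0.5 * math.pi * u) ** 2

rng = random.Random(0)
draws = [sample_beta(rng) for _ in range(20_000)]
frac_below_one = sum(b < 1.0 for b in draws) / len(draws)
# The prior median is beta = 1, so both numbers should be close to one half.
print(frac_below_one, prior_cdf_beta(1.0))
```

The heavy right tail (the density decays like $\beta^{-3/2}$) means the prior has no finite mean, which is worth keeping in mind when summarizing draws.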

Alternatively, you could note that $\beta$ is the scale parameter for the Gamma distribution, and, generically, the Jeffreys prior for a scale parameter $\beta$ is $1/\beta$. One might find it odd that the Jeffreys prior for $\beta$ differs between the two models, but the models themselves are not equivalent: one is for the distribution of $y | \alpha, \beta$ and the other is for the distribution of $\lambda | \alpha, \beta$. An argument in favor of the former is that, assuming no clustering, the data really are distributed Negative Binomial $(\alpha, p)$, so putting the priors directly on $\alpha$ and $p$ is the thing to do. On the other hand, if, for example, you have clusters in the data where the observations in each cluster share the same $\lambda$, you really need to model the $\lambda$s somehow, and so treating $\beta$ as the scale parameter of a Gamma distribution would seem more appropriate. (These are just my thoughts on a possibly contentious topic.)
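If you want to convince yourself that integrating out $\lambda$ really does give a Negative Binomial with $p = \beta/(1+\beta)$, a quick numerical check is easy to write. This sketch assumes $\beta$ is the Gamma scale and that $p$ is the probability attached to the counts, i.e. the pmf is $\frac{\Gamma(\alpha+y)}{\Gamma(\alpha)\,y!}(1-p)^{\alpha}p^{y}$; the parameter values and function names are arbitrary illustrations.

```python
import math

def nb_pmf(y, alpha, beta):
    """Negative Binomial pmf with p = beta / (1 + beta), beta a Gamma scale."""
    log_p = math.log(beta) - math.log1p(beta)  # log of p
    log_q = -math.log1p(beta)                  # log of 1 - p
    log_pmf = (math.lgamma(alpha + y) - math.lgamma(alpha) - math.lgamma(y + 1)
               + alpha * log_q + y * log_p)
    return math.exp(log_pmf)

def mixture_pmf(y, alpha, beta, upper=60.0, steps=20_000):
    """Midpoint-rule integral of Poisson(y | lam) * Gamma(lam | alpha, scale=beta)."""
    h = upper / steps
    const = -math.lgamma(y + 1) - math.lgamma(alpha) - alpha * math.log(beta)
    total = 0.0
    for i in range(steps):
        lam = (i + 0.5) * h
        log_integrand = (const + (y + alpha - 1) * math.log(lam)
                         - lam * (1.0 + 1.0 / beta))
        total += math.exp(log_integrand)
    return total * h

alpha, beta = 2.5, 1.5  # arbitrary illustrative values
max_gap = max(abs(nb_pmf(y, alpha, beta) - mixture_pmf(y, alpha, beta))
              for y in range(6))
print(max_gap)  # should be tiny if the two expressions agree
```

The integration limit and step count are tuned to these particular values; a much larger $\beta$ or a shape below 1 would need a wider range or a better quadrature rule.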

The first parameter can also be addressed via Jeffreys priors. If we use the common technique of developing Jeffreys priors for each parameter independently, then forming the joint (non-Jeffreys) prior as the product of the two single-parameter priors, we get a prior for the shape parameter $\alpha$ of a Gamma distribution:

$p(\alpha) \propto \sqrt{\text{PG}(1,\alpha)}$

where the polygamma function $\text{PG}(1,\alpha) = \sum_{i=0}^{\infty}(i+\alpha)^{-2}$. Awkward, but truncatable. You could combine this with either of the Jeffreys priors above to get an uninformative joint prior distribution. Combining it with the $1/\beta$ prior for the Gamma scale parameter results in a reference prior for the Gamma parameters.
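As a rough sketch of the "truncatable" remark: sum the first few terms of the series and replace the remaining tail with the standard asymptotic expansion of the trigamma function. The function names and the choice of 10 head terms are mine, not from any reference.

```python
import math

def trigamma(x, n_terms=10):
    """PG(1, x): first n_terms of sum (i + x)^-2, plus an asymptotic tail."""
    if x <= 0:
        raise ValueError("x must be positive")
    head = sum(1.0 / (i + x) ** 2 for i in range(n_terms))
    z = x + n_terms
    # Asymptotic expansion of the remaining sum, i.e. of PG(1, z) for large z.
    tail = (1.0 / z + 1.0 / (2 * z ** 2) + 1.0 / (6 * z ** 3)
            - 1.0 / (30 * z ** 5) + 1.0 / (42 * z ** 7))
    return head + tail

def log_prior_alpha(alpha):
    """Unnormalized log of p(alpha) proportional to sqrt(PG(1, alpha))."""
    return 0.5 * math.log(trigamma(alpha))

def log_reference_prior(alpha, beta):
    """Unnormalized log of sqrt(PG(1, alpha)) / beta."""
    return log_prior_alpha(alpha) - math.log(beta)

print(trigamma(1.0), math.pi ** 2 / 6)  # PG(1, 1) equals pi^2 / 6
```

If SciPy is available, `scipy.special.polygamma(1, alpha)` computes the same quantity and is the safer choice in real code.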

If we wish to go the Full Jeffreys route, forming the true Jeffreys prior for the Gamma parameters, we'd get:

$p(\alpha, \beta) \propto \sqrt{\alpha \text{PG}(1,\alpha)-1}/\beta$
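For anyone wanting to see where this comes from, here is a short derivation, assuming the Gamma is parameterized by shape $\alpha$ and scale $\beta$, so that $\log f(\lambda | \alpha, \beta) = (\alpha - 1)\log\lambda - \lambda/\beta - \log\Gamma(\alpha) - \alpha\log\beta$. Taking second derivatives, changing sign, and using $E[\lambda] = \alpha\beta$ gives the Fisher information matrix and its determinant:

$$I(\alpha, \beta) = \begin{pmatrix} \text{PG}(1,\alpha) & 1/\beta \\ 1/\beta & \alpha/\beta^2 \end{pmatrix}, \qquad \det I(\alpha, \beta) = \frac{\alpha\,\text{PG}(1,\alpha) - 1}{\beta^2}$$

The joint Jeffreys prior is the square root of the determinant, which is the expression above. Note that $\alpha\,\text{PG}(1,\alpha) > 1$ for all $\alpha > 0$, so the square root is always real.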

However, Jeffreys priors for multidimensional parameters often have poor properties as well as poor convergence characteristics (see link to lecture). I don't know whether this is the case for the Gamma, but testing would provide some useful information.
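One cheap way to do that sort of testing is to evaluate the unnormalized log posterior of $(\alpha, \beta)$ on a grid under different priors and see how much the answer moves. The sketch below is only illustrative: the counts are made up, the marginal likelihood is the Negative Binomial with $p = \beta/(1+\beta)$ ($\beta$ taken as the Gamma scale), and both priors are taken flat in $\alpha$ over the grid, which is my simplifying assumption rather than a recommendation.

```python
import math

def nb_loglik(counts, alpha, beta):
    """Negative Binomial log-likelihood with p = beta / (1 + beta)."""
    log_p = math.log(beta) - math.log1p(beta)
    log_q = -math.log1p(beta)
    return sum(math.lgamma(alpha + y) - math.lgamma(alpha) - math.lgamma(y + 1)
               + alpha * log_q + y * log_p for y in counts)

def log_prior_half(alpha, beta):
    """beta^(-1/2) * (1 + beta)^(-1), flat in alpha (assumption)."""
    return -0.5 * math.log(beta) - math.log1p(beta)

def log_prior_scale(alpha, beta):
    """1 / beta, flat in alpha (assumption)."""
    return -math.log(beta)

def grid_mode(counts, log_prior, alphas, betas):
    """Return the (alpha, beta) grid point with the highest log posterior."""
    best = max((nb_loglik(counts, a, b) + log_prior(a, b), a, b)
               for a in alphas for b in betas)
    return best[1], best[2]

counts = [3, 7, 4, 9, 2, 6, 5, 8, 0, 12]      # hypothetical data
alphas = [0.5 + 0.25 * i for i in range(39)]  # 0.5 to 10.0
betas = [0.1 * j for j in range(1, 51)]       # 0.1 to 5.0
print(grid_mode(counts, log_prior_half, alphas, betas))
print(grid_mode(counts, log_prior_scale, alphas, betas))
```

If the mode lands on the edge of the grid, or shifts a lot between priors, that is a sign the data are not dominating the hyperprior and the grid (or the prior) needs another look.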

For more on priors for the Gamma, look at pages 13-14 of A Catalog of Non-Informative Priors by Yang and Berger. Lots of other distributions are in there, too. For an overview of Jeffreys and reference priors, here are some lecture notes.
