This doesn't really answer the question, since I'm not pointing you to books or articles that have used a hyperprior; instead, I'm describing (and linking to) material on priors for Gamma parameters.
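First, note that the Poisson-Gamma model ($y \mid \lambda \sim \text{Poisson}(\lambda)$, $\lambda \sim \text{Gamma}(\alpha, \beta)$ with shape $\alpha$ and rate $\beta$) leads, when $\lambda$ is integrated out, to a Negative Binomial distribution with parameters $\alpha$ and $p = \beta/(1+\beta)$. The second parameter lies in $(0,1)$. If you wish to be uninformative, a Jeffreys-style prior on $p$ might be appropriate; the formula below uses $p \sim \text{Beta}(1/2, 1/2)$, the Jeffreys prior for a binomial proportion (for the Negative Binomial likelihood itself, with $\alpha$ fixed, the Jeffreys prior would instead be $\propto p^{-1}(1-p)^{-1/2}$). You could put the prior directly on $p$, or work through the change of variables, using $dp/d\beta = (1+\beta)^{-2}$, to get: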
$p(\beta) \propto \beta^{-1/2}(1+\beta)^{-1}$
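A quick numerical sanity check of both steps, as a sketch that assumes the shape/rate parametrization $\lambda \sim \text{Gamma}(\alpha, \beta)$ and the $\text{Beta}(1/2, 1/2)$ prior on $p$ (which is what the formula above corresponds to). The function names are hypothetical and only the Python standard library is used: $\lambda$ is integrated out with Simpson's rule and compared with the Negative Binomial pmf, and the change of variables is compared with the closed form (which is in fact proper, with normalizing constant $1/\pi$).

```python
import math

def poisson_gamma_marginal(y, alpha, beta, upper=60.0, n=20000):
    """P(Y = y) for Y | lam ~ Poisson(lam), lam ~ Gamma(shape=alpha, rate=beta).

    Integrates lam out with composite Simpson's rule on [0, upper] (n even);
    assumes alpha > 1 so the integrand vanishes at lam = 0.
    """
    def integrand(lam):
        if lam == 0.0:
            return 0.0
        log_poisson = y * math.log(lam) - lam - math.lgamma(y + 1)
        log_gamma = (alpha * math.log(beta) + (alpha - 1) * math.log(lam)
                     - beta * lam - math.lgamma(alpha))
        return math.exp(log_poisson + log_gamma)

    h = upper / n
    total = integrand(0.0) + integrand(upper)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return total * h / 3

def neg_binomial_pmf(y, alpha, p):
    """Gamma(y + alpha) / (y! Gamma(alpha)) * p**alpha * (1 - p)**y."""
    return math.exp(math.lgamma(y + alpha) - math.lgamma(y + 1) - math.lgamma(alpha)
                    + alpha * math.log(p) + y * math.log1p(-p))

def beta_density_from_p_prior(b):
    """Density of beta when p = b / (1 + b) ~ Beta(1/2, 1/2), by change of variables."""
    p = b / (1 + b)
    p_density = 1.0 / (math.pi * math.sqrt(p * (1 - p)))  # Beta(1/2, 1/2) pdf
    return p_density / (1 + b) ** 2                         # times |dp/db|

alpha, beta = 2.5, 1.5
p = beta / (1 + beta)
for y in range(6):
    print(y, poisson_gamma_marginal(y, alpha, beta), neg_binomial_pmf(y, alpha, p))

# Compare with the closed form beta**(-1/2) * (1 + beta)**(-1), normalized by 1/pi.
for b in (0.1, 1.0, 10.0):
    print(b, beta_density_from_p_prior(b), b ** -0.5 / (1 + b) / math.pi)
```

The two columns in each printout should agree to many decimal places.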
Alternatively, you could note that $\beta$ is the rate (inverse scale) parameter of the Gamma distribution for $\lambda$, and, generically, the Jeffreys prior for a scale parameter is proportional to its reciprocal. Since $1/\beta$ transforms into $1/\theta$ under $\theta = 1/\beta$, this gives $p(\beta) \propto 1/\beta$ whether $\beta$ is read as a rate or as a scale. One might find it odd that the Jeffreys prior for $\beta$ differs between the two models, but the models themselves are not equivalent: one is for the distribution of $y \mid \alpha, \beta$ and the other is for the distribution of $\lambda \mid \alpha, \beta$. An argument in favor of the former is that, assuming no clustering, the data really are distributed Negative Binomial$(\alpha, p)$, so putting the priors directly on $\alpha$ and $p$ is the natural thing to do. On the other hand, if, for example, you have clusters in the data where the observations in each cluster share the same $\lambda$, you really need to model the $\lambda$s somehow, so treating $\beta$ as a parameter of the Gamma distribution for $\lambda$ would seem more appropriate. (These are my thoughts on a possibly contentious topic.)
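A small sketch of why the $1/\beta$ prior doesn't depend on whether $\beta$ is read as a rate or a scale, assuming the shape $\alpha$ is known. For a Gamma rate, the Fisher information is $\alpha/\beta^2$, so $\sqrt{I(\beta)} \propto 1/\beta$; pushing $1/\beta$ through $\theta = 1/\beta$ gives $1/\theta$ again. The helper names are hypothetical, and the second derivative is taken by central finite differences.

```python
import math

def log_gamma_pdf_rate(x, alpha, beta):
    """log density of Gamma(shape=alpha, rate=beta) at x."""
    return (alpha * math.log(beta) - math.lgamma(alpha)
            + (alpha - 1) * math.log(x) - beta * x)

def fisher_info_rate(alpha, beta, x=1.0, h=1e-4):
    """-d^2/dbeta^2 log f by central differences. This second derivative
    does not involve x, so evaluating it at any x > 0 gives its expectation."""
    f = lambda b: log_gamma_pdf_rate(x, alpha, b)
    return -(f(beta + h) - 2 * f(beta) + f(beta - h)) / h ** 2

def jeffreys_rate(beta):
    """Jeffreys prior for the rate, sqrt(alpha)/beta, with the constant sqrt(alpha) dropped."""
    return 1.0 / beta

def implied_scale_prior(theta):
    """Density of theta = 1/beta implied by p(beta) proportional to 1/beta."""
    return jeffreys_rate(1.0 / theta) / theta ** 2   # |d beta / d theta| = 1/theta^2

alpha = 2.5
for beta in (0.5, 1.0, 4.0):
    print(beta, fisher_info_rate(alpha, beta), alpha / beta ** 2)
for theta in (0.5, 1.0, 4.0):
    print(theta, implied_scale_prior(theta), 1.0 / theta)
```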
The first parameter, $\alpha$, can also be addressed via Jeffreys priors. If we use the common technique of deriving a Jeffreys prior for each parameter separately (holding the other fixed) and then forming the joint (non-Jeffreys) prior as the product of the two single-parameter priors, we get the following prior for the shape parameter $\alpha$ of a Gamma distribution:
$p(\alpha) \propto \sqrt{\text{PG}(1,\alpha)}$
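where $\text{PG}(1,\alpha) = \sum_{i=0}^{\infty}(i+\alpha)^{-2}$ is the polygamma function of order 1 (the trigamma function). Awkward, but the series can be truncated; note that the tail left after $N$ terms is roughly $1/(N+\alpha)$, so plain truncation converges slowly. You could combine this with either of the priors on $\beta$ above to get an uninformative joint prior distribution. Combining it with the $1/\beta$ prior for the Gamma rate (or scale) parameter gives a reference prior for the Gamma parameters, namely the one obtained when $\beta$ is treated as the parameter of interest and $\alpha$ as the nuisance parameter.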
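A sketch of how the truncation can be made accurate, assuming an Euler-Maclaurin estimate of the tail (this correction is an addition here, not something taken from the sources linked below). The first $N$ terms are summed exactly and $1/z + 1/(2z^2) + 1/(6z^3)$, with $z = N + \alpha$, is added for the remainder. The function names are hypothetical; the results are checked against the known values $\text{PG}(1,1) = \pi^2/6$ and $\text{PG}(1,1/2) = \pi^2/2$.

```python
import math

def trigamma(alpha, n_terms=1000):
    """PG(1, alpha) = sum_{i>=0} (i + alpha)**-2 for alpha > 0.

    Sums the first n_terms terms exactly, then adds an Euler-Maclaurin
    estimate of the remaining tail with z = n_terms + alpha.
    """
    head = sum((i + alpha) ** -2 for i in range(n_terms))
    z = n_terms + alpha
    tail = 1 / z + 1 / (2 * z ** 2) + 1 / (6 * z ** 3)
    return head + tail

def shape_prior(alpha):
    """Unnormalized independence-Jeffreys prior for the Gamma shape: sqrt(PG(1, alpha))."""
    return math.sqrt(trigamma(alpha))

print(trigamma(1.0), math.pi ** 2 / 6)
print(trigamma(0.5), math.pi ** 2 / 2)
# Plain truncation at 1000 terms, by contrast, misses a tail of about 1/(1000 + alpha):
print(sum((i + 1.0) ** -2 for i in range(1000)) - math.pi ** 2 / 6)
```

If SciPy is available, `scipy.special.polygamma(1, alpha)` computes the same quantity directly.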
If we wish to go the full Jeffreys route, forming the true (joint) Jeffreys prior for the Gamma parameters, we get:
$p(\alpha, \beta) \propto \sqrt{\alpha \text{PG}(1,\alpha)-1}/\beta$
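Where this comes from: for a Gamma with shape $\alpha$ and rate $\beta$ (assumed here; reading $\beta$ as a scale flips the sign of the off-diagonal entry but leaves the determinant unchanged), the Fisher information matrix is $\begin{pmatrix}\text{PG}(1,\alpha) & -1/\beta \\ -1/\beta & \alpha/\beta^2\end{pmatrix}$, whose determinant is $(\alpha\,\text{PG}(1,\alpha) - 1)/\beta^2$. The sketch below checks this numerically. The Hessian of the log-density in $(\alpha, \beta)$ doesn't involve $x$, so a finite-difference Hessian at a single $x$ already gives the Fisher information. Function names are hypothetical, and $\text{PG}(1,\alpha)$ is computed from a truncated series with a tail correction.

```python
import math

def trigamma(alpha, n_terms=1000):
    """PG(1, alpha): truncated series plus an Euler-Maclaurin tail estimate."""
    z = n_terms + alpha
    return (sum((i + alpha) ** -2 for i in range(n_terms))
            + 1 / z + 1 / (2 * z ** 2) + 1 / (6 * z ** 3))

def log_gamma_pdf(x, alpha, beta):
    """log density of Gamma(shape=alpha, rate=beta) at x."""
    return (alpha * math.log(beta) - math.lgamma(alpha)
            + (alpha - 1) * math.log(x) - beta * x)

def fisher_info_fd(alpha, beta, x=1.0, h=1e-4):
    """Fisher information as minus the finite-difference Hessian of log f.
    The Hessian in (alpha, beta) does not involve x, so one x > 0 suffices."""
    f = lambda a, b: log_gamma_pdf(x, a, b)
    i_aa = -(f(alpha + h, beta) - 2 * f(alpha, beta) + f(alpha - h, beta)) / h ** 2
    i_bb = -(f(alpha, beta + h) - 2 * f(alpha, beta) + f(alpha, beta - h)) / h ** 2
    i_ab = -(f(alpha + h, beta + h) - f(alpha + h, beta - h)
             - f(alpha - h, beta + h) + f(alpha - h, beta - h)) / (4 * h ** 2)
    return i_aa, i_ab, i_bb

def jeffreys_joint(alpha, beta):
    """Unnormalized full Jeffreys prior sqrt(alpha * PG(1, alpha) - 1) / beta."""
    return math.sqrt(alpha * trigamma(alpha) - 1) / beta

for alpha, beta in [(0.5, 2.0), (2.5, 1.5), (5.0, 0.8)]:
    i_aa, i_ab, i_bb = fisher_info_fd(alpha, beta)
    det = i_aa * i_bb - i_ab ** 2
    print(alpha, beta, math.sqrt(det), jeffreys_joint(alpha, beta))
```

Because $\text{PG}(1,\alpha) > 1/\alpha$ for all $\alpha > 0$, the quantity under the square root is always positive.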
However, Jeffreys priors for multidimensional parameters often have poor properties as well as poor convergence characteristics (see the lecture notes linked below). I don't know whether this is the case for the Gamma, but testing would provide some useful information.
For more on priors for the Gamma, see pages 13-14 of A Catalog of Non-Informative Priors by Yang and Berger; lots of other distributions are covered there, too. For an overview of Jeffreys and reference priors, here are some lecture notes.
The sum of independent Gamma random variables with the same scale (equivalently, the same rate) is a Gamma random variable with that same scale, whose order (shape) is the sum of the orders. See, for example, this question. So apply this to your problem using the fact that an exponential random variable with rate $\lambda$ is a Gamma random variable of order $1$ with rate $\lambda$; the sum of $k$ independent ones is therefore Gamma of order $k$ with rate $\lambda$. Your density is not quite right, by the way: it should be $$\frac{\lambda(\lambda x)^{k-1}}{\Gamma(k)}\exp(-\lambda x) = \frac{\lambda^{k}x^{k-1}}{(k-1)!}\exp(-\lambda x)~~ \text{for}~ x>0.$$
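A numerical check of the corrected density, as a sketch using only the Python standard library (function names are hypothetical). Convolving the order-$(k-1)$ density with one more Exponential($\lambda$) density should reproduce the order-$k$ density, which is what the sum-of-Gammas result predicts; the $\Gamma(k)$ and $(k-1)!$ forms should also agree for integer $k$.

```python
import math

def gamma_sum_pdf(x, k, lam):
    """Density of the sum of k i.i.d. Exponential(rate=lam) variables (integer k >= 1):
    lam * (lam*x)**(k-1) / Gamma(k) * exp(-lam*x) for x >= 0."""
    if x < 0:
        return 0.0
    return lam * (lam * x) ** (k - 1) / math.gamma(k) * math.exp(-lam * x)

def factorial_form(x, k, lam):
    """Equivalent form lam**k * x**(k-1) / (k-1)! * exp(-lam*x) for integer k >= 1."""
    return lam ** k * x ** (k - 1) / math.factorial(k - 1) * math.exp(-lam * x)

def convolve_with_exponential(x, k, lam, n=2000):
    """Density at x of (sum of k-1 exponentials) + (one more exponential), computed
    by Simpson's rule on f_{k-1}(t) * lam * exp(-lam*(x - t)) over t in [0, x]; k >= 2."""
    g = lambda t: gamma_sum_pdf(t, k - 1, lam) * lam * math.exp(-lam * (x - t))
    h = x / n
    total = g(0.0) + g(x)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(i * h)
    return total * h / 3

lam = 1.5
for k in (2, 3, 5):
    for x in (0.5, 2.0, 5.0):
        print(k, x, gamma_sum_pdf(x, k, lam), factorial_form(x, k, lam),
              convolve_with_exponential(x, k, lam))
```

Here the exponential factors combine to $e^{-\lambda x}$, so the convolution integrand is a polynomial in $t$ and Simpson's rule is essentially exact for small $k$.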