Let's see what Dan Ma actually says in his blog. To quote:
There is uncertainty in the parameter $\theta$, reflecting the risk characteristic of the insured. Some insureds are poor risks (with large $\theta$) and some are good risks (with small $\theta$). Thus the parameter $\theta$ should be regarded as a random variable $\Theta$. The following is the conditional distribution of $N$ (conditional on $\Theta=\theta$):
$$\displaystyle (15) \ \ \ \ \ P(N=n \lvert \Theta=\theta)=\frac{e^{-\theta} \ \theta^n}{n!} \ \ \ \ \ \ \ \ \ \ n=0,1,2,\cdots$$
Aside from some small oddness in the wording, the gist of that is fine. The parameter of the Poisson ($\theta$ in the quoted discussion) represents the underlying rate of claims per unit time; that individuals are heterogeneous, with different 'riskiness' (different claim-rates), isn't controversial.
So why does he think that the claim-rate is gamma-distributed?
Well, actually he doesn't say that he thinks that at all.
What he says is:
Suppose that $\Theta$ has a Gamma distribution with scale parameter $\alpha$ and shape parameter $\beta$.
He's positing a circumstance -- discussing an assumption if you wish -- for which he then discusses the consequences.
He doesn't even assert anything about the plausibility of the assumption.
Here are some things that might be reasonable to assert/suppose about the claim-rate distribution:
1) It's necessarily non-negative and may be taken to be continuous.
2) We could expect that it would tend to be right-skew.
3) We might not-too-unreasonably expect there to be a typical level (a mode) around which the bulk of the distribution lies, with the density tailing off as we move further away; i.e., it might be reasonable to expect it to be unimodal, at least to a first approximation.
That's about all we could say without collecting data.
The gamma at least doesn't break any of those suppositions/expectations, and so is likely to result in a more useful distribution than assuming homogeneity of claim-rate, but any number of other distributions satisfy those conditions.
So why gamma rather than, say, lognormal? Likely a matter of convenience: the gamma works nicely with the Poisson. Even the Poisson itself -- conditional on the individual underlying claim-frequency -- is another assumption that isn't actually true; we can make some argument that the assumptions of claims following a Poisson process may not be too badly wrong, but it's clear that they can't be exactly true.
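As a quick illustration of that convenience: mixing a Poisson over a gamma-distributed rate gives a closed-form marginal (the negative binomial), which is easy to check by simulation. A sketch with made-up parameter values:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha, beta = 2.0, 1.5  # made-up gamma shape and rate for the claim-rate

# Draw a claim-rate per insured, then a Poisson count given that rate
lam = rng.gamma(shape=alpha, scale=1.0 / beta, size=100_000)
counts = rng.poisson(lam)

# Marginally the counts are negative binomial with r = alpha, p = beta/(1+beta)
nb = stats.nbinom(alpha, beta / (1.0 + beta))
for n in range(4):
    print(n, round((counts == n).mean(), 4), round(nb.pmf(n), 4))
```

The simulated frequencies track the negative binomial pmf closely, which is exactly the tractability that makes the gamma a convenient mixing choice.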
There's no good reason to think it is gamma-distributed.
Indeed, I'll assert here and now that there's no real-world case where the claim rate is actually gamma distributed, in practice there will always be differences between the actual distribution of interest and some simple model for it; but that's true of essentially all our probability models.
They're convenient fictions, which may sometimes be not so badly inaccurate as to have some value.
Is there a way I can determine if my density is gamma distributed?
Nothing will tell you it is; in fact you can be quite sure -- even when it looks like an excellent description of the distribution -- that the gamma is at best merely an approximation. You can use diagnostic displays (perhaps something like a Q-Q plot) to help check that it's not too far from gamma.
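One such diagnostic, sketched in Python with simulated stand-in data (the parameter values are arbitrary): fit a gamma by maximum likelihood, then see how closely the ordered data track the fitted theoretical quantiles.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.gamma(shape=3.0, scale=2.0, size=500)  # stand-in for observed rates

# Fit a gamma by maximum likelihood (location pinned at 0), then build the
# theoretical-vs-ordered quantile pairs of a Q-Q plot
shape, loc, scale = stats.gamma.fit(x, floc=0)
(theo_q, ordered), (slope, intercept, r) = stats.probplot(
    x, sparams=(shape, 0, scale), dist=stats.gamma)

# r near 1 says "not obviously far from gamma" -- it can never confirm gamma
print(f"Q-Q correlation: {r:.4f}")
```

Plotting `theo_q` against `ordered` gives the usual visual check; a high correlation is consistent with gamma, but (as above) it's evidence of approximation quality, not of the distribution actually being gamma.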
I am assuming $\kappa$ here is a known non-random quantity. So here you have two data points, $k_1 \sim Pois(\lambda)$ and $k_2 \sim Pois(\kappa)$. $\kappa$ is known and $\lambda$ has a Gamma$(\alpha, \beta)$ prior. You need to find the posterior of $\lambda$.
Intuitively, the posterior of $\lambda$ should not depend on $\kappa$ since $k_2$ carries no information about $\lambda$, so $k_2$ will not impact the posterior of $\lambda$. If you do the math:
\begin{align*}
g(\lambda|\alpha, \beta, k_1, k_2) & \propto g(\lambda| \alpha, \beta) f(k_1 | \lambda)f(k_2 | \kappa)\\
& \propto g(\lambda| \alpha, \beta) f(k_1 | \lambda),\\
\end{align*}
and that gives you the posterior Gamma$(\alpha + k_1, \beta + 1)$.
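Both claims are easy to check numerically (with made-up values for the prior parameters and observations): the $f(k_2 \mid \kappa)$ factor is constant in $\lambda$, so it cancels on normalisation, and the gridded posterior matches the conjugate closed form.

```python
import numpy as np
from scipy import stats

alpha, beta = 2.0, 3.0   # made-up Gamma(alpha, beta) prior, rate parametrisation
kappa = 5.0              # known rate for the second observation
k1, k2 = 4, 7

lam = np.linspace(1e-6, 12, 4001)  # grid over plausible lambda values
prior = stats.gamma.pdf(lam, a=alpha, scale=1.0 / beta)

# Unnormalised posteriors with and without the f(k2 | kappa) factor
with_k2 = prior * stats.poisson.pmf(k1, lam) * stats.poisson.pmf(k2, kappa)
without = prior * stats.poisson.pmf(k1, lam)

# Normalising over the grid, the constant factor cancels...
with_k2 /= with_k2.sum()
without /= without.sum()
print(np.max(np.abs(with_k2 - without)))  # essentially zero

# ...and the result matches Gamma(alpha + k1, beta + 1)
closed = stats.gamma.pdf(lam, a=alpha + k1, scale=1.0 / (beta + 1))
closed /= closed.sum()
print(np.max(np.abs(without - closed)))   # floating-point noise only
```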
Not really answering the question, since I'm not pointing you to books or articles which have employed a hyperprior, but instead am describing, and linking to, stuff about priors on Gamma parameters.
First, note that the Poisson-Gamma model leads, when $\lambda$ is integrated out, to a Negative Binomial distribution with parameters $\alpha$ and $\beta/(1+\beta)$. The second parameter is in the range $(0,1)$. If you wish to be uninformative, a Jeffreys prior on $p = \beta/(1+\beta)$ might be appropriate. You could put the prior directly on $p$ or work through the change of variables to get:
$p(\beta) \propto \beta^{-1/2}(1+\beta)^{-1}$
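To fill in that change of variables: take the proportion-type Jeffreys prior $p(p) \propto p^{-1/2}(1-p)^{-1/2}$, note that $p = \beta/(1+\beta)$ gives $1-p = 1/(1+\beta)$ and $\lvert dp/d\beta \rvert = (1+\beta)^{-2}$, and substitute:
\begin{align*}
p(\beta) & \propto \left(\frac{\beta}{1+\beta}\right)^{-1/2}\left(\frac{1}{1+\beta}\right)^{-1/2}\frac{1}{(1+\beta)^{2}}\\
& = \beta^{-1/2}(1+\beta)^{-1}.
\end{align*}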
Alternatively, you could note that $\beta$ is the scale parameter for the Gamma distribution, and, generically, the Jeffreys prior for a scale parameter $\beta$ is $1/\beta$. One might find it odd that the Jeffreys prior for $\beta$ is different between the two models, but the models themselves are not equivalent; one is for the distribution of $y | \alpha, \beta$ and the other is for the distribution of $\lambda | \alpha, \beta$. An argument in favor of the former is that, assuming no clustering, the data really is distributed Negative Binomial $(\alpha, p)$, so putting the priors directly on $\alpha$ and $p$ is the thing to do. OTOH, if, for example, you have clusters in the data where the observations in each cluster have the same $\lambda$, you really need to model the $\lambda$s somehow, and so treating $\beta$ as the scale parameter of a Gamma distribution would seem more appropriate. (My thoughts on a possibly contentious topic.)
The first parameter can also be addressed via Jeffreys priors. If we use the common technique of developing Jeffreys priors for each parameter independently, then forming the joint (non-Jeffreys) prior as the product of the two single-parameter priors, we get a prior for the shape parameter $\alpha$ of a Gamma distribution:
$p(\alpha) \propto \sqrt{\text{PG}(1,\alpha)}$
where the polygamma function $\text{PG}(1,\alpha) = \sum_{i=0}^{\infty}(i+\alpha)^{-2}$. Awkward, but truncatable. You could combine this with either of the Jeffreys priors above to get an uninformative joint prior distribution. Combining it with the $1/\beta$ prior for the Gamma scale parameter results in a reference prior for the Gamma parameters.
If we wish to go the Full Jeffreys route, forming the true Jeffreys prior for the Gamma parameters, we'd get:
$p(\alpha, \beta) \propto \sqrt{\alpha \text{PG}(1,\alpha)-1}/\beta$
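Again just as a sketch, this joint prior is straightforward to evaluate with SciPy, which makes the testing suggested below easy to set up:

```python
import numpy as np
from scipy.special import polygamma

def gamma_jeffreys(alpha, beta):
    """Unnormalised full Jeffreys prior sqrt(alpha * PG(1, alpha) - 1) / beta."""
    return np.sqrt(alpha * polygamma(1, alpha) - 1.0) / beta

# alpha * PG(1, alpha) > 1 for all alpha > 0, so the square root is real;
# the density scales as 1/beta in the scale parameter
for a, b in [(0.5, 1.0), (2.0, 1.0), (2.0, 4.0)]:
    print(a, b, float(gamma_jeffreys(a, b)))
```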
However, Jeffreys priors for multidimensional parameters often have poor properties as well as poor convergence characteristics (see link to lecture). I don't know whether this is the case for the Gamma, but testing would provide some useful information.
For more on priors for the Gamma, see pages 13-14 of A Catalog of Non-Informative Priors, Yang and Berger. Lots of other distributions are in there, too. For an overview of Jeffreys and reference priors, here are some lecture notes.