Solved – Intuition for Gamma-Poisson / Negative Binomial

gamma distribution, intuition, negative-binomial-distribution, poisson distribution

I have a data set which intuitively seems Poisson-like, but it's overdispersed. So I'm investigating the negative binomial.

From this question and this page I understand one way of viewing this is to state that we are uncertain about the density parameter $\lambda$ in Poisson, and assume $\lambda \sim \text{Gamma}(\alpha, \beta)$, in which case we get a negative binomial.

I get the math here, but I don't understand why we would think that $\lambda$ is gamma-distributed. For example, the wordpress link says that when $\lambda$ represents "likelihood to default on a loan", then it would be gamma-distributed. But I don't really understand why.

Is there a way I can determine if my density is gamma distributed?

Best Answer

Let's see what Dan Ma actually says in his blog. To quote:

There is uncertainty in the parameter $\theta$, reflecting the risk characteristic of the insured. Some insureds are poor risks (with large $\theta$) and some are good risks (with small $\theta$). Thus the parameter $\theta$ should be regarded as a random variable $\Theta$. The following is the conditional distribution of $N$ (conditional on $\Theta=\theta$):

$$\displaystyle (15) \ \ \ \ \ P(N=n \lvert \Theta=\theta)=\frac{e^{-\theta} \ \theta^n}{n!} \ \ \ \ \ \ \ \ \ \ n=0,1,2,\cdots$$

Aside from some small oddness in the wording, the gist of that is fine. The parameter of the Poisson ($\theta$ in the quoted discussion) represents the underlying rate of claims per unit time; that individuals are heterogeneous, with different 'riskiness' (different claim rates), isn't controversial.

So why does he think that the claim rate is gamma-distributed?

Well, actually he doesn't say that he thinks that at all.

What he says is:

Suppose that $\Theta$ has a Gamma distribution with scale parameter $\alpha$ and shape parameter $\beta$.

He's positing a circumstance -- discussing an assumption if you wish -- for which he then discusses the consequences.

He doesn't even assert anything about the plausibility of the assumption.
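To see those consequences numerically, here is a minimal simulation sketch. The gamma parameters are hypothetical values chosen only for illustration, and the names `shape` and `rate` are my own rather than the blog's notation. It draws a rate for each individual, then a Poisson count given that rate, and compares the resulting frequencies with a negative binomial pmf.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical gamma parameters, chosen only for illustration.
shape, rate = 2.0, 0.5

# Step 1: each individual gets their own claim rate from the gamma.
# NumPy parameterises the gamma by shape and scale, so scale = 1 / rate.
theta = rng.gamma(shape, 1.0 / rate, size=200_000)

# Step 2: given that rate, the individual's claim count is Poisson.
counts = rng.poisson(theta)

# Negative binomial with size = shape and success probability rate / (rate + 1).
p = rate / (rate + 1.0)
n_values = np.arange(10)
empirical = np.array([(counts == n).mean() for n in n_values])
theoretical = stats.nbinom.pmf(n_values, shape, p)

max_abs_diff = np.max(np.abs(empirical - theoretical))
print(np.round(empirical, 3))
print(np.round(theoretical, 3))
```

The two printed rows should agree up to simulation noise. That agreement is a consequence of the gamma assumption and says nothing about whether the assumption holds for real claim rates.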


Here are some things that might be reasonable to assert/suppose about the claim-rate distribution:

1) It's necessarily non-negative and may be taken to be continuous

2) We could expect that it would tend to be right-skew

3) We might not-too-unreasonably expect there to be a typical level (a mode), around which the bulk of the distribution lies, and that it tails off as we move further away (i.e. it might be reasonable to expect that it would be unimodal, at least to a first approximation)

That's about all we could say without collecting data.

The gamma at least doesn't break any of those suppositions/expectations, and so is likely to result in a more useful distribution than assuming homogeneity of claim-rate, but any number of other distributions satisfy those conditions.
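As a rough illustration that the gamma is not special here, the sketch below mixes a Poisson over a gamma and over a lognormal that share the same mean and variance. The values of `m` and `v` are hypothetical, as is the helper `dispersion`. By the law of total variance, the count variance is the mean of the rate plus the variance of the rate, so both mixtures should give a variance-to-mean ratio near $1 + v/m$, while the homogeneous case gives a ratio near 1.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Hypothetical mean and variance of the claim-rate distribution.
m, v = 4.0, 8.0

# Gamma with that mean and variance: shape = m^2 / v, scale = v / m.
gamma_rates = rng.gamma(m**2 / v, v / m, size=n)

# Lognormal with the same mean and variance.
sigma2 = np.log(1.0 + v / m**2)
mu = np.log(m) - sigma2 / 2.0
lognormal_rates = rng.lognormal(mu, np.sqrt(sigma2), size=n)

def dispersion(counts):
    """Variance-to-mean ratio; equals 1 for a Poisson distribution."""
    return counts.var(ddof=1) / counts.mean()

results = {
    "homogeneous": dispersion(rng.poisson(m, size=n)),
    "gamma": dispersion(rng.poisson(gamma_rates)),
    "lognormal": dispersion(rng.poisson(lognormal_rates)),
}
for name, d in results.items():
    print(f"{name:12s} variance/mean = {d:.2f}")
```

Overdispersion in the counts therefore points to heterogeneity in the rate, but on its own it cannot distinguish a gamma from other right-skew mixing distributions.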

So why gamma rather than, say, lognormal? Likely a matter of convenience: the gamma works nicely with the Poisson. The Poisson itself, even conditional on an individual's underlying claim rate, is another assumption that isn't actually true (though we can argue that the assumption of claims following a Poisson process may not be too badly wrong, it clearly can't be exactly true).
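The convenience can be made concrete. As a sketch, write the gamma density with shape $a$ and rate $b$ (my own notation, chosen to sidestep the blog's labelling of its parameters), so that $g(\theta) = \frac{b^a}{\Gamma(a)}\theta^{a-1}e^{-b\theta}$. Integrating the Poisson probability in (15) against this density gives a closed form:

$$\begin{aligned} P(N=n) &= \int_0^\infty \frac{e^{-\theta}\,\theta^n}{n!}\cdot\frac{b^a}{\Gamma(a)}\,\theta^{a-1}e^{-b\theta}\,d\theta \\ &= \frac{b^a}{n!\,\Gamma(a)}\int_0^\infty \theta^{n+a-1}e^{-(b+1)\theta}\,d\theta \\ &= \frac{\Gamma(n+a)}{n!\,\Gamma(a)}\left(\frac{b}{b+1}\right)^{a}\left(\frac{1}{b+1}\right)^{n}, \qquad n=0,1,2,\cdots \end{aligned}$$

This is a negative binomial with mean $a/b$ and variance $a/b + a/b^2$, which exceeds the mean. The integral collapses because the integrand is itself an unnormalised gamma density; a lognormal mixing distribution gives no such closed form.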

There's no good reason to think the claim rate is gamma-distributed.

Indeed, I'll assert here and now that there's no real-world case where the claim rate is actually gamma distributed; in practice there will always be differences between the actual distribution of interest and some simple model for it. But that's true of essentially all our probability models.

They're convenient fictions, which may sometimes be not so badly inaccurate as to have some value.

Is there a way I can determine if my density is gamma distributed?

Nothing will tell you it is; in fact you can be quite sure - even when it looks like an excellent description of the distribution - that the gamma is at best merely an approximation. You can use diagnostic displays (perhaps something like a Q-Q plot) to help check that it's not too far from gamma.
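A minimal sketch of such a Q-Q check follows. It assumes you have direct observations or estimates of the per-individual rates, which is often not the case with count data, where the rate is latent. The function name `gamma_qq_points` and the simulated sample are hypothetical stand-ins.

```python
import numpy as np
from scipy import stats

def gamma_qq_points(rates):
    """Return (theoretical, observed) quantile pairs for a gamma Q-Q plot."""
    rates = np.sort(np.asarray(rates, dtype=float))
    n = rates.size
    # Fit shape and scale by maximum likelihood, with the location fixed at 0.
    shape, _, scale = stats.gamma.fit(rates, floc=0)
    # Plotting positions (i - 0.5) / n avoid the quantiles at 0 and 1.
    probs = (np.arange(1, n + 1) - 0.5) / n
    theoretical = stats.gamma.ppf(probs, shape, scale=scale)
    return theoretical, rates

# Hypothetical data: a stand-in for directly observed per-individual rates.
rng = np.random.default_rng(2)
sample = rng.gamma(2.0, 2.0, size=500)
theo, obs = gamma_qq_points(sample)
qq_corr = np.corrcoef(theo, obs)[0, 1]
print(f"Q-Q correlation: {qq_corr:.3f}")
```

Plotting `obs` against `theo` and looking for systematic curvature away from the line $y = x$ is more informative than the single correlation number. A near-straight plot only says the gamma is not obviously a poor approximation.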