Solved – Quantiles of a compound gamma/negative binomial distribution

gamma distributionnegative-binomial-distributionquantilesrsimulation

Are there any formulae for the quantiles of a compound gamma/negative binomial distribution? That is, suppose we have

$$N \sim \text{NegBin}(\alpha, \lambda)$$

and conditional on $N$,

$$Y =
\begin{cases}
0 & \text{if } N = 0 \\
\sum_{i=1}^N X_i & \text{otherwise}
\end{cases}$$

where each of the $X_i$ is iid $\text{Gamma}(\mu, \sigma)$. Given values of $\alpha, \lambda, \mu$ and $\sigma$, I'd like to calculate the quantiles of $Y$.

Background: I'm modelling ATM withdrawals per day. The number of withdrawals I assume has a negative binomial distribution, and the amount of each withdrawal has a gamma distribution. I can estimate all the parameters from the data, and now I want to estimate how much cash to keep in the ATM. Keeping too little cash risks not having enough to meet demand, but too much is also bad. So, I want to estimate how much cash is needed to meet demand for a given level of cash-out risk. That is, I want an upper quantile of the compound distribution, say at the 99%, 99.9% or 99.99% level.

Right now I'm using simulation to get an estimate, but this takes a long time especially for very high quantiles. Eg if I want to get the 99.99% quantile, I need at least 10,000 replications, and ideally much more than that; I also need to repeat this for each ATM. So some kind of analytical approximation would be great.

(I realise that the actual distribution is probably more complicated than a simple negbin/gamma combination. For example the observed withdrawals would be right censored since if an ATM runs out of cash then there's no more withdrawals for the day. I'm keeping things simple to start with.)

Best Answer

As a practical answer to the real questions you're addressing, such high quantiles will generally be quite sensitive to issues with model choice (especially such things as whether you model the right censoring and how heavy the tails are in the components).

But in any case - especially when dealing with high quantiles where ordinary simulation becomes impractical - that has great value in its own right; it's an interesting question from both theoretical and practical standpoints.


A couple of other approaches to this problem are (1) using the Fast Fourier Transform and (2) direct numerical integration.

One useful reference on this topic is Luo and Shevchenko (2009)$^{[1]}$.

In it they develop an adaptive direct numerical integration approach that's faster than simulation and competitive with FFT.

The more traditional approach in actuarial work was been (3) Panjer recursion, which can be found in numerous texts. Embrechts and Frei (2009)$^{[2]}$ discuss and compare Panjer recursion and FFT. (Note that both of these techniques involve discretization of the continuous distribution.)


On the other hand, doing a very unsophisticated version of simulation, and with no effort whatever made to be efficient, generating from a compound gamma-negative binomial isn't particularly onerous. This is timing on my kids' little laptop:

system.time(replicate(100000,sum(rgamma(MASS:::rnegbin(1,4,2),5,.1))))
   user  system elapsed 
   2.82    0.00    2.84 

I think 2.8 seconds to generate 100$^{\,}$000 simulations of a compound distribution on a slow little laptop really isn't bad. With some effort to be efficient (of which one might suggest many possibilities), I imagine that could be made a good deal faster.

Here's the ecdf for $10^6$ simulations (which took about 29 seconds):

$\hspace{1cm}$enter image description here

We see the characteristic discrete jump at zero you expect to see with a compound distribution.

[While it should be easy to make simulation a lot faster, all three alternatives mentioned here - if carried out sensibly - should be a lot faster still.]


You should note that the actuar package supports computation with compound distributions, and offers several methods for calculation with them.

See, for example, this vignette which discusses this facility.

[Of possibly some further passing interest, note that there is an R package for the Poisson-lognormal distribution -- poilog; if you need that distribution at some point it may be useful.]


Added in edit:

A potential quick approximation where the gamma shape parameter isn't changing -

In the gamma case, because a convolution of gammas with constant shape parameter is another gamma, you could write down the distribution of $Y|N=n$, and then evaluate the cdf and the density at a large number of grid-values at each $n$, then simply accumulate the sum directly (rather as one would for a KDE). The direct calculation only yields a lower bound to the true quantile, but if the negative binomial is not heavy tailed it should be quite rapid.


References:

[1]: Luo, X. and Shevchenko, P.V. (2009),
"Computing Tails of Compound Distributions Using Direct Numerical Integration,"
Journal of Computational Finance, 13 (2), 73-111.
[arXiv preprint available here]

[2]: Embrechts, P., and Frei, M. (2009),
"Panjer recursion versus FFT for compound distributions,"
Mathematical Methods of Operations Research, 69:3 (July) pp 497-508.
[seems to be a pre-publication version here]

Related Question