Compound Distributions – How to Decompound a Compound Probability Distribution

compound-distributions, convolution, integral, negative-binomial-distribution, poisson-distribution

I am trying to figure out how to deconvolve or decompound a compound probability density function – knowing one of the distributions and having samples from the compound distribution.

Assume I only know that my distribution arises as a continuous mixture of a Poisson distribution and an unknown mixing distribution $g$:
$$ f(k) = \int_0^\infty f_{\operatorname{Poisson}(\lambda)}(k) \cdot g(\lambda) \; \mathrm{d}\lambda $$
where $g$ is a Gamma distribution in the case of a Negative Binomial distribution $f$.

Now, given samples from $f(k)$ (not necessarily from a NB distribution), what is the best way or what algorithms are used to approximate/estimate $g(\lambda)$, i.e. deconvolve the integral?

I have already looked into the Panjer recursion and this paper, but if I understood them correctly, neither seems to help in my case.

Happy about any suggestions!

Best Answer

This is a classic problem that arises in empirical Bayes theory. The problem of estimating $g$ non-parametrically for a Poisson mixture was considered in the very first paper that coined the term "empirical Bayes":

Robbins, Herbert. An empirical Bayes approach to statistics. Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, 1954–1955, vol. I, pp. 157–163. University of California Press, Berkeley and Los Angeles, 1956.
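For orientation, the result that paper is best known for can be written down directly from the mixture integral in the question. Note that it gives the posterior mean of $\lambda$ for an observed count $k$ rather than $g$ itself. Since $\lambda \cdot e^{-\lambda}\lambda^k/k! = (k+1)\, e^{-\lambda}\lambda^{k+1}/(k+1)!$,
$$ \operatorname{E}[\lambda \mid k] = \frac{\int_0^\infty \lambda \, f_{\operatorname{Poisson}(\lambda)}(k) \cdot g(\lambda) \; \mathrm{d}\lambda}{f(k)} = \frac{(k+1)\, f(k+1)}{f(k)} $$
and the unknown $f(k)$ and $f(k+1)$ can be replaced by the observed relative frequencies of $k$ and $k+1$ in the sample, so no explicit estimate of $g$ is needed.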

Nowadays people usually take a more parametric approach, and Robbins' 1956 method is not much used in practice. Empirical Bayes inference is usually not very sensitive to the exact shape of $g$, so it is often far easier and about as effective to make some parametric assumptions about $g$. This is called "parametric empirical Bayes". For the Poisson mixture, the most common and convenient choice is to assume that $g$ is a gamma distribution, in which case the mixture has a closed-form expression called the "negative binomial" distribution.
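A minimal sketch of the parametric route, assuming a gamma $g$ with shape $a$ and scale $s$ (symbols chosen here for illustration): the negative binomial mixture then has mean $as$ and variance $as + as^2$, so a method-of-moments fit recovers $g$ from the sample mean and variance. The function name and the simulated data are hypothetical, and maximum likelihood would be the more usual choice in practice.

```python
import numpy as np

def fit_gamma_mixing_moments(counts):
    """Method-of-moments estimate of a gamma mixing distribution g.

    Assumes counts ~ Poisson(lambda) with lambda ~ Gamma(shape, scale),
    so that mean = shape * scale and var = mean + shape * scale**2.
    """
    counts = np.asarray(counts, dtype=float)
    m = counts.mean()
    v = counts.var(ddof=1)
    if v <= m:
        # Without overdispersion the gamma assumption cannot be fitted this way.
        raise ValueError("sample variance must exceed the sample mean")
    scale = (v - m) / m
    shape = m / scale
    return shape, scale

# Simulated check (assumed parameters, not from the question).
rng = np.random.default_rng(0)
true_shape, true_scale = 3.0, 2.0
lam = rng.gamma(true_shape, true_scale, size=200_000)
k = rng.poisson(lam)

shape_hat, scale_hat = fit_gamma_mixing_moments(k)
print(f"shape ~ {shape_hat:.2f}, scale ~ {scale_hat:.2f}")
```

If the sample is not overdispersed (variance no larger than the mean), the sketch raises an error instead of returning a degenerate fit, which is a hint that a gamma mixing distribution may not be a sensible model for the data.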

A recent and very general approach from Bradley Efron does not fix a specific parametric family for $g$. Instead, $g$ is supported on a grid of $\lambda$ values and its log-density is modelled with a spline basis. This leads to a very flexible exponential family formulation, implemented in the R package deconvolveR:

Narasimhan, B., & Efron, B. (2020). deconvolveR: A G-Modeling Program for Deconvolution and Empirical Bayes Estimation. Journal of Statistical Software, 94(11), 1–20. https://www.jstatsoft.org/article/view/v094i11

Efron, B. (2016). Empirical Bayes deconvolution estimates. Biometrika, 103(1), 1–20. http://dx.doi.org/10.1093/biomet/asv068
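To give a feel for what deconvolution on a grid looks like, here is a rough Python sketch. It is not Efron's g-modelling and not the deconvolveR API: it swaps in a simpler technique, the unsmoothed nonparametric maximum likelihood estimate of $g$ on a fixed grid of $\lambda$ values, computed with EM updates of the grid weights. The function name, grid limits, iteration count, and simulated data are all assumptions made for illustration.

```python
import numpy as np
from scipy.stats import poisson

def npmle_poisson_mixing(counts, grid, n_iter=500):
    """Grid-based NPMLE of the mixing distribution g via EM.

    Returns weights w with w[j] ~ P(lambda = grid[j]). This is a plain,
    unsmoothed estimate, not the spline-based g-model of Efron (2016).
    """
    values, freq = np.unique(np.asarray(counts), return_counts=True)
    freq = freq / freq.sum()                      # empirical f(k)
    # P[i, j] = Poisson pmf of values[i] at rate grid[j]
    P = poisson.pmf(values[:, None], grid[None, :])
    w = np.full(grid.size, 1.0 / grid.size)       # uniform starting weights
    for _ in range(n_iter):
        mix = P @ w                               # fitted f(k) under current w
        w = w * (P.T @ (freq / mix))              # EM update, keeps sum(w) = 1
    return w

# Simulated data from a gamma mixture (assumed parameters).
rng = np.random.default_rng(1)
lam = rng.gamma(3.0, 2.0, size=20_000)
k = rng.poisson(lam)

grid = np.linspace(0.05, 40.0, 200)
w = npmle_poisson_mixing(k, grid)
mean_hat = float(np.sum(w * grid))                # estimated mean of g
print(f"estimated E[lambda] ~ {mean_hat:.2f}, sample mean = {k.mean():.2f}")
```

An unsmoothed estimate like this tends to concentrate its mass on a few grid points even when the true $g$ is smooth, which is one motivation for imposing a smooth, low-dimensional model on $g$ as in the references above.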
