(Note that in the following I am quite careful about when to use capital $\Lambda$ and when to use lower-case $\lambda$, and when to use capital $X$ and when to use lower-case $x$.)
Suppose the distribution of $\Lambda$ is the Gamma distribution:
$$
\frac 1 {\Gamma(\alpha)} \left( \frac \lambda \beta \right)^{\alpha-1} e^{-\lambda/\beta} \left(\frac{d\lambda}\beta\right).
$$
And suppose the conditional distribution of $X$ given $\Lambda$ is the Poisson distribution
$$
x \mapsto \frac{e^{-\Lambda} \Lambda^x }{x!} \text{ for } x = 0,1,2,3,\ldots
$$
Then you have
\begin{align}
& \Pr(X=x) \\[10pt]
= {} & \operatorname{E}(\Pr(X=x \mid \Lambda)) = \operatorname{E}\left( \frac{e^{-\Lambda} \Lambda^x}{x!} \right) \\[10pt]
= {} & \int_0^\infty \left( \frac{e^{-\lambda}\lambda^x}{x!} \right) \frac 1 {\Gamma(\alpha)} \left( \frac \lambda \beta \right)^{\alpha-1} e^{-\lambda/\beta} \left(\frac{d\lambda}\beta\right) \\[10pt]
= {} & \frac 1 {x!\Gamma(\alpha)\beta^\alpha} \int_0^\infty \lambda^{x+\alpha-1} e^{-\lambda(1\,+\,1/\beta)} \, d\lambda \\
& \qquad \qquad \text{(We pulled out the factors that do not depend on $\lambda$).}\\[10pt]
= {} & \frac 1 {x!\Gamma(\alpha)\beta^\alpha} \cdot \left(\frac \beta {\beta+1} \right)^{x+\alpha} \int_0^\infty \left( \lambda \left( 1 + \frac 1 \beta \right) \right)^{x+\alpha-1} e^{-\lambda(1+1/\beta)} \left( d\lambda \left( 1 + \frac 1 \beta \right) \right) \\[10pt]
= {} & \frac 1 {x!\Gamma(\alpha)\beta^\alpha} \cdot \left(\frac \beta {\beta+1} \right)^{x+\alpha} \int_0^\infty u^{x+\alpha-1} e^{-u} \, du \\[10pt]
= {} & \frac {\Gamma(x+\alpha)} {x!\Gamma(\alpha)\beta^\alpha} \cdot \left(\frac \beta {\beta+1} \right)^{x+\alpha} \\[10pt]
= {} & \frac {\Gamma(x+\alpha)} {x!\Gamma(\alpha)} \cdot \left( \frac 1 {\beta+1} \right)^\alpha \left(\frac \beta {\beta+1} \right)^x
\end{align}
and this is a negative binomial probability mass function with $p$ and $q$ equal to $\dfrac 1 {\beta+1}$ and $\dfrac\beta{\beta+1}$, not necessarily respectively (depending on which convention you follow). This is the probability that the number of failures before the $\alpha$th success is $x$, when the probability of success on each independent trial is $1/(\beta+1).$
So a Gamma mixture of Poisson distributions is a negative binomial distribution, and that is the connection between Poisson and negative binomial.
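As a quick numerical sanity check on the derivation above, one can integrate the Poisson pmf against the Gamma density by quadrature and compare with the closed-form negative binomial pmf. This is a minimal sketch in Python (the values of $\alpha$ and $\beta$ are arbitrary, chosen only for illustration):

```python
import numpy as np
from scipy import stats, integrate

# Hypothetical shape and scale for the mixing Gamma distribution
alpha, beta = 2.5, 1.8

def mixture_pmf(x):
    """P(X = x) computed directly as E[Pr(X = x | Lambda)] by quadrature."""
    integrand = lambda lam: stats.poisson.pmf(x, lam) * stats.gamma.pdf(lam, a=alpha, scale=beta)
    # the integrand is negligible far beyond the Gamma's bulk, so a finite cap is safe
    return integrate.quad(integrand, 0, 100)[0]

# Closed form from the derivation: NB with size alpha and success probability 1/(beta+1)
p = 1.0 / (beta + 1.0)
for x in range(6):
    print(x, mixture_pmf(x), stats.nbinom.pmf(x, alpha, p))
```

The two columns agree to quadrature accuracy, confirming that the mixture and the negative binomial pmf are the same distribution.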
That's an interesting question. My research group has been using the distribution you refer to for some years in our publicly available bioinformatics software. As far as I know, the distribution does not have a name and there is no literature on it. While the paper by Chandra et al. (2012) cited by Aksakal is closely related, the distribution they consider seems to be restricted to integer values of $r$, and they don't seem to give an explicit expression for the pdf.
To give you some background, the NB distribution is very heavily used in genomic research to model gene expression data arising from RNA-seq and related technologies. The count data arises as the number of DNA or RNA sequence reads extracted from a biological sample that can be mapped to each gene. Typically there are tens of millions of reads from each biological sample that are mapped to about 25,000 genes. Alternatively one might have DNA samples from which reads are mapped to genomic windows. We and others have popularized an approach whereby NB glms are fitted to the sequence reads for each gene, and empirical Bayes methods are used to moderate the genewise dispersion estimators (dispersion $\phi=1/r$). This approach has been cited in tens of thousands of journal articles in the genomic literature, so you can get an idea of how much it gets used.
My group maintains the edgeR R software package. Some years ago we revised the whole package so that it works with fractional counts, using a continuous version of the NB pmf. We simply converted all the binomial coefficients in the NB pmf to ratios of gamma functions and used it as a (mixed) continuous pdf. The motivation for this was that sequence read counts can sometimes be fractional because of (1) ambiguous mapping of reads to the transcriptome or genome and/or (2) normalization of counts to correct for technical effects. So the counts are sometimes expected counts or estimated counts rather than observed counts. And of course the read counts can be exactly zero with positive probability. Our approach ensures that the inference results from our software are continuous in the counts, matching exactly with discrete NB results when the estimated counts happen to be integers.
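The gamma-function form of the pmf can be written down directly. Below is a minimal sketch in Python rather than R (the function name and parameter values are hypothetical, for illustration only); it reproduces the discrete NB pmf exactly at integer counts while remaining defined at fractional counts:

```python
import numpy as np
from scipy import stats, special

def continuous_nb(x, r, p):
    # NB pmf with the binomial coefficient written as a ratio of gamma
    # functions, so it is also defined for fractional x >= 0;
    # computed on the log scale for numerical stability
    return np.exp(special.gammaln(x + r) - special.gammaln(r) - special.gammaln(x + 1)
                  + r * np.log(p) + x * np.log1p(-p))

r, p = 3.7, 0.4  # hypothetical size and success probability
print(continuous_nb(2.0, r, p))   # agrees with the discrete pmf at an integer count
print(stats.nbinom.pmf(2, r, p))
print(continuous_nb(2.4, r, p))   # also defined at a fractional count
```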
As far as I know, there is no closed form for the normalizing constant in the pdf, nor are there closed forms for the mean or variance. When one considers that there is no closed form for the integral
$$\int_0^\infty \frac{1}{\Gamma(x)}\,dx$$
(the Fransén–Robinson constant), it is clear that there cannot be one for the integral of the continuous NB pdf either.
However it seems to me that traditional mean and variance formulas for the NB should continue to be good approximations for the continuous NB. Moreover the normalizing constant should vary slowly with the parameters and therefore can be ignored as having negligible influence in the maximum likelihood calculations.
One can confirm these hypotheses by numerical integration. The NB distribution arises in bioinformatics as a gamma mixture of Poisson distributions (see the Wikipedia negative binomial article or McCarthy et al below).
The continuous NB distribution arises simply by replacing the Poisson distribution with its continuous analog with pdf
$$f(x;\lambda)=a(\lambda)\frac{e^{-\lambda}\lambda^x}{\Gamma(x+1)}$$
for $x\ge 0$ where $a(\lambda)$ is a normalizing constant to ensure the density integrates to 1.
Suppose for example that $\lambda=10$. The Poisson distribution has pmf equal to the above pdf (apart from the normalizing constant $a(\lambda)$) on the non-negative integers and, with $\lambda=10$, the Poisson mean and variance are both equal to 10.
Numerical integration shows that $a(10)=1/0.999875$ and the mean and variance of the continuous distribution are equal to 10 to about 4 significant figures.
So the normalizing constant is virtually 1 and the mean and variance are almost exactly the same as for the discrete Poisson distribution.
The approximation is improved even more if we add a continuity correction, integrating from $-1/2$ to $\infty$ instead of from 0.
With the continuity correction, everything is correct (normalizing constant is 1 and moments agree with discrete Poisson) to about 6 figures.
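These numerical checks are easy to reproduce. The sketch below (in Python rather than R; $\lambda=10$ as in the text) integrates the unnormalized continuous-Poisson density with and without the continuity correction and estimates the mean:

```python
import numpy as np
from scipy import integrate, special

lam = 10.0  # the example value used in the text

def density(x):
    # unnormalized continuous-Poisson density exp(-lam) * lam^x / Gamma(x+1),
    # evaluated on the log scale for numerical stability
    return np.exp(-lam + x * np.log(lam) - special.gammaln(x + 1.0))

# the density is negligible well before x = 100, so a finite cap is safe
Z0 = integrate.quad(density, 0.0, 100.0)[0]    # integral from 0
Zc = integrate.quad(density, -0.5, 100.0)[0]   # with the continuity correction
mean = integrate.quad(lambda x: x * density(x), 0.0, 100.0)[0] / Z0
print(Z0, Zc, mean)
```

The integral from 0 is already very close to 1, the continuity-corrected integral is closer still, and the mean of the normalized density is close to 10.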
In our edgeR package, we do not need to make any adjustment for the fact that there is mass at zero, because we always work with conditional log-likelihoods or with log-likelihood differences and any delta functions cancel out of the calculations. This is typical BTW for glms with mixed probability distributions. Alternatively, we could consider the distribution to have no mass at zero but to have support starting at -1/2 instead of at zero. Either theoretical perspective leads to the same calculations in practice.
Although we make active use of the continuous NB distribution, we haven't published anything on it explicitly. The articles cited below explain the NB approach to genomic data but don't discuss the continuous NB distribution explicitly.
In summary, I am not surprised that the article you are studying obtained reasonable results from a continualized version of the NB pdf, because that is our experience also. The key requirement is that we should be modelling the means and variances correctly and that will be fine provided the data, whether integer or not, exhibits the same form of quadratic mean-variance relationship that the NB distribution does.
References
Robinson, MD, and Smyth, GK (2008). Small-sample estimation of negative binomial dispersion, with applications to SAGE data. Biostatistics 9, 321-332.
Robinson, MD, and Smyth, GK (2007). Moderated statistical tests for assessing differences in tag abundance. Bioinformatics 23, 2881-2887.
McCarthy, DJ, Chen, Y, and Smyth, GK (2012). Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation. Nucleic Acids Research 40, 4288-4297.
Chen, Y, Lun, ATL, and Smyth, GK (2014). Differential expression analysis of complex RNA-seq experiments using edgeR. In: Statistical Analysis of Next Generation Sequence Data, Somnath Datta and Daniel S Nettleton (eds), Springer, New York, pages 51-74.
Lun, ATL, Chen, Y, and Smyth, GK (2016). It's DE-licious: a recipe for differential expression analyses of RNA-seq experiments using quasi-likelihood methods in edgeR. Methods in Molecular Biology 1418, 391-416.
Chen, Y, Lun, ATL, and Smyth, GK (2016). From reads to genes to pathways: differential expression analysis of RNA-Seq experiments using Rsubread and the edgeR quasi-likelihood pipeline. F1000Research 5, 1438.
Best Answer
IMHO, I really think that the negative binomial distribution is used for convenience.
In RNA-seq there is a common assumption that, if you took an infinite number of measurements of the same gene in an infinite number of replicates, the true distribution would be lognormal. This distribution is then sampled via a Poisson process (yielding a count), so the true distribution of reads per gene across replicates would be a Poisson-lognormal distribution.
But in packages that we use, such as edgeR and DESeq, this distribution is modeled as a negative binomial distribution. This is not because the people who wrote them didn't know about the Poisson-lognormal distribution.
It is because the Poisson-lognormal distribution is a terrible thing to work with: it requires numerical integration to do the fits, so when you actually try to use it the performance is often really bad.
A negative binomial distribution has a closed form, so it is a lot easier to work with, and the gamma distribution (the underlying mixing distribution) looks a lot like a lognormal distribution in that it sometimes looks roughly normal and sometimes has a long tail.
But in this example (if you believe the assumption) it can't possibly be theoretically correct, because the theoretically correct distribution is the Poisson-lognormal; the two distributions are reasonable approximations of one another but are not equivalent.
Even so, I still think the "incorrect" negative binomial distribution is often the better choice, because empirically it gives better results: the numerical integration is slow and the fits can behave badly, especially for distributions with long tails.
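To illustrate the trade-off, the sketch below (in Python; the lognormal parameters are arbitrary, and matching the NB by mean and variance is just one illustrative way to pair the two distributions) computes the Poisson-lognormal pmf by numerical integration, one quadrature per count, and compares it with a moment-matched negative binomial, which is a single closed-form expression:

```python
import numpy as np
from scipy import stats, integrate

mu, sigma = 1.0, 0.5  # hypothetical lognormal parameters

def pln_pmf(x):
    # Poisson-lognormal pmf: needs a fresh numerical integration for every count
    integrand = lambda lam: stats.poisson.pmf(x, lam) * stats.lognorm.pdf(lam, s=sigma, scale=np.exp(mu))
    # the lognormal tail beyond 100 is negligible for these parameters
    return integrate.quad(integrand, 0, 100)[0]

# Moment-matched negative binomial: same mean m and variance v, but closed form
m = np.exp(mu + sigma**2 / 2)                              # lognormal mean
v = (np.exp(sigma**2) - 1) * np.exp(2*mu + sigma**2) + m   # mixing variance + Poisson part
r = m**2 / (v - m)   # NB size
p = r / (r + m)      # NB success probability (scipy's convention)
for x in range(5):
    print(x, pln_pmf(x), stats.nbinom.pmf(x, r, p))
```

The two pmfs are close but not identical, while the negative binomial column costs essentially nothing to evaluate.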