Solved – Relationship between Poisson, binomial, negative binomial distributions and normal distribution

Tags: binomial-distribution, count-data, negative-binomial-distribution, normal-distribution, poisson-distribution

When we have to model discrete count data, we usually use:

  • Poisson distribution, if mean = variance
  • Binomial distribution, if mean > variance
  • Negative binomial distribution, if mean < variance

My question is: is it possible to use a normal distribution as an approximation? For example, to obtain a Poisson distribution with mean 4, we could start from a normal distribution with mean = variance = 4:

x <- seq(0, 20, 1)
plot(x, dpois(x, 4))                 # Poisson(4) probability mass function
points(x, dnorm(x, 4, 2), col = 2)   # N(4, sd = 2) density, since sd = sqrt(4) = 2

We can see that the two densities are not very different. Now, suppose we define thresholds and some rules:

  • if the outcome of the normal draw is negative, map it to 0
  • if the outcome is x = 6.2, round it to 6, and so on.

Is it possible to use such an approximation of the normal distribution to completely define a Poisson distribution? The same question applies to the binomial and negative binomial distributions.
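As a rough sketch of this idea in Python (the question's code is in R; the rounding rule here is an assumption: round to the nearest integer and clamp negative draws to 0):

```python
import random
import statistics

def discretized_normal_sample(mean, sd, n, seed=0):
    """Draw n values from N(mean, sd), round to the nearest integer,
    and clamp negative outcomes to 0 (the 'rules' from the question)."""
    rng = random.Random(seed)
    return [max(0, round(rng.gauss(mean, sd))) for _ in range(n)]

# Target a Poisson-like distribution with mean = variance = 4.
sample = discretized_normal_sample(4, 2, 100_000)
print("mean     :", statistics.mean(sample))
print("variance :", statistics.variance(sample))
```

The sample mean and variance both land near 4, but not exactly: clamping at 0 pushes the mean slightly up and the variance slightly down.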

Why do I want to do this? When we fit a Poisson distribution to real-life data, we never have exactly mean = variance; we use the Poisson distribution because this condition holds approximately. So we have to handle all three cases, with the mean and variance estimated from the real-life data.

So, my idea is to always:

  • use the empirical mean and variance to define a normal distribution
  • then define some "rules" as a function of these parameters
  • so that, when we compute the mean and variance of the simulated discrete count data, we recover the initial empirical mean and variance.
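The whole pipeline can be sketched in Python (stdlib only; the concrete rounding/clamping rule is an assumption, not a prescription from the question). Note how an overdispersed target shows the limits of the approach:

```python
import random
import statistics

def simulate_counts(emp_mean, emp_var, n, seed=1):
    """Simulate discrete counts from N(emp_mean, sqrt(emp_var)) by
    rounding to the nearest integer and clamping negatives to 0."""
    rng = random.Random(seed)
    sd = emp_var ** 0.5
    return [max(0, round(rng.gauss(emp_mean, sd))) for _ in range(n)]

# Overdispersed "real life" target: empirical mean 5, variance 9.
sim = simulate_counts(5.0, 9.0, 100_000)
print("simulated mean     :", statistics.mean(sim))
print("simulated variance :", statistics.variance(sim))
```

Because clamping at 0 removes the left tail, the simulated mean comes out above 5 and the variance below 9, so the naive rules do not exactly reproduce the empirical moments; any real version of this method would need to compensate for that bias.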

What do you think of this method for simulating discrete count data, rather than using the Poisson, binomial or negative binomial distributions?

Best Answer

  • The binomial distribution is the distribution of the number of successes in a fixed (i.e. not random) number of independent trials with the same probability of success on each trial. Its support is the set $\{0,1,2,\ldots,n\}$, which is finite, where $n$ is the number of trials.

  • The negative binomial distribution is the distribution of the number of failures before a fixed (i.e. not random) number of successes, again with independent trials and the same probability of success on each trial. Its support is the set $\{0,1,2,3,\ldots\}$, which is infinite.

  • The Poisson distribution can be loosely characterized as the number of successes in an infinite number of independent trials with an infinitely small probability of success on each trial, in which the expected number of successes is some fixed positive number. It is a limit of the binomial distribution in which the number of trials approaches $\infty$ and the probability of success on each trial approaches $0$ in such a way that the expected number of successes remains constant or at least approaches some positive number.

It is true that for the binomial distribution the mean is larger than the variance, for the negative binomial distribution the mean is smaller than the variance, and for the Poisson distribution they are equal.

But it is not true that for every distribution whose support is some set of cardinal numbers, if the mean equals the variance then it is a Poisson distribution, nor that if the mean is greater than the variance it is a binomial distribution, nor that if the mean is less than the variance it is a negative binomial distribution. For example, the mean of the hypergeometric distribution that arises from sampling without replacement is greater than the variance, as with the binomial distribution, but the distribution is not the same. For the uniform distribution on the set $\{0,1,2,\ldots,n\}$, if $n>4$ then the variance is greater than the mean, as with the negative binomial distribution, but the distribution is not the same. For the uniform distribution on the set $\{0,2\}$, the variance is equal to the mean, as with the Poisson distribution, but the distribution is not the same.
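The last counterexample is easy to verify numerically; a small Python check (stdlib only, not from the original answer):

```python
import math

# Uniform distribution on {0, 2}: each point has probability 1/2.
mean_u = (0 + 2) / 2                         # = 1
var_u = ((0 - 1) ** 2 + (2 - 1) ** 2) / 2    # = 1, so mean = variance

# Poisson(1) also has mean = variance = 1, yet the distributions differ:
# the uniform puts probability 0 on the value 1, while Poisson(1) puts
# probability e^{-1} there.
poisson_p1 = math.exp(-1) * 1 ** 1 / math.factorial(1)
print(mean_u, var_u, poisson_p1)
```

Matching first and second moments therefore never suffices to identify which of these families a distribution belongs to.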

If $X\sim\mathrm{Poisson}(\lambda)$ then $$ \frac{X-\lambda}{\sqrt\lambda} \overset{\text{D.}} \longrightarrow N(0,1) \text{ as } \lambda\to\infty $$ because when $\lambda$ is large, the distribution of $X$ is the same as the distribution of the sum of a large number of independent Poisson-distributed random variables, each with mean near $1$. That is because the sum of independent Poisson-distributed random variables is Poisson distributed, so the central limit theorem can be applied.
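This convergence can be checked exactly, without simulation, since the Poisson CDF is computable with a simple recurrence. A Python sketch (stdlib only): $P\left(\frac{X-\lambda}{\sqrt\lambda}\le 0\right)=P(X\le\lambda)$ should approach $\Phi(0)=\tfrac12$ as $\lambda$ grows.

```python
import math

def poisson_cdf(k, lam):
    """Exact P(X <= k) for X ~ Poisson(lam), using the pmf recurrence
    pmf(i) = pmf(i-1) * lam / i starting from pmf(0) = exp(-lam)."""
    p = math.exp(-lam)
    total = p
    for i in range(1, k + 1):
        p *= lam / i
        total += p
    return total

# P(X <= lam) drifts down toward Phi(0) = 0.5 as lam increases.
for lam in (4, 25, 100, 400):
    print(lam, poisson_cdf(lam, lam))
```

The printed probabilities decrease steadily toward 0.5, illustrating the normal limit (the residual gap shrinks like $1/\sqrt\lambda$, reflecting the Poisson's skewness).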

If $X\sim\mathrm{Binomial}(n,p)$ then $$ \frac{X-np}{\sqrt{np(1-p)}} \overset{\text{D.}}\longrightarrow N(0,1) \text{ as } n \to \infty $$ because $X$ has the same distribution as the sum of $n$ independent random variables distributed as $\mathrm{Binomial}(1,p)$, so again the central limit theorem applies.
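The binomial case is easy to check by simulation, generating each $X$ literally as a sum of $n$ Bernoulli trials and standardizing; a Python sketch (stdlib only, not from the original answer):

```python
import random
import statistics

def standardized_binomial_sample(n, p, reps, seed=2):
    """Simulate reps draws of (X - np)/sqrt(np(1-p)) where X is a sum
    of n independent Bernoulli(p) trials."""
    rng = random.Random(seed)
    mu, sd = n * p, (n * p * (1 - p)) ** 0.5
    out = []
    for _ in range(reps):
        x = sum(rng.random() < p for _ in range(n))   # Binomial(n, p) draw
        out.append((x - mu) / sd)
    return out

z = standardized_binomial_sample(200, 0.3, 5_000)
print("mean :", statistics.mean(z))   # should be near 0
print("sd   :", statistics.stdev(z))  # should be near 1
```

For $n=200$ the standardized sample already has mean near 0 and standard deviation near 1, as the central limit theorem predicts.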

The negative binomial distribution with parameters $r,p$ is the distribution of the number of failures before the $r$th success, with probability $p$ of success on each trial. If $X$ is so distributed then $\operatorname E(X) = r(1-p)/p$ and $\operatorname{var}(X) = r(1-p)/p^2$, and we have $$ \frac{X- r(1-p)/p }{\sqrt{r(1-p)}/p} \overset{\text{D.}} \longrightarrow N(0,1) \text{ as } r\to\infty $$ because $X$ has the same distribution as the sum of $r$ independent random variables distributed as negative binomial with parameters $1,p$, so again the central limit theorem applies.
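That sum-of-$r$ decomposition also gives a sampler: each summand is a Geometric($p$) count of failures before one success, which can be drawn by inversion as $\lfloor \ln U / \ln(1-p) \rfloor$. A Python sketch (stdlib only; here $p$ is the success probability, so the mean is $r(1-p)/p$ and the variance $r(1-p)/p^2$):

```python
import math
import random
import statistics

def negbin_sample(r, p, reps, seed=3):
    """Simulate NegBin(r, p) (failures before the r-th success) as a sum
    of r Geometric(p) draws, each sampled by inversion."""
    rng = random.Random(seed)
    out = []
    for _ in range(reps):
        # 1 - rng.random() lies in (0, 1], so log() is always defined.
        x = sum(int(math.log(1 - rng.random()) / math.log(1 - p))
                for _ in range(r))
        out.append(x)
    return out

r, p = 50, 0.4
sample = negbin_sample(r, p, 20_000)
# Theory: mean = r(1-p)/p = 75, variance = r(1-p)/p^2 = 187.5.
print("mean     :", statistics.mean(sample))
print("variance :", statistics.variance(sample))
```

With $r=50$ the sample moments sit close to the theoretical values, and a histogram of the standardized draws would already look approximately normal.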

When approximating any of these kinds of distributions with a normal distribution, note that the event $[X\le n]$ is the same as the event $[X<n+1]$, so use the continuity correction: compute the probability that $[X\le n+\frac 1 2]$ according to the normal distribution.
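The benefit of the continuity correction is easy to see numerically; a Python sketch (stdlib only) comparing the exact Poisson CDF with the normal approximation, with and without the half-unit shift:

```python
import math

def poisson_cdf(k, lam):
    """Exact P(X <= k) for X ~ Poisson(lam) via the pmf recurrence."""
    p = math.exp(-lam)
    total = p
    for i in range(1, k + 1):
        p *= lam / i
        total += p
    return total

def normal_cdf(x, mu, sd):
    """CDF of N(mu, sd^2) via the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sd * math.sqrt(2))))

lam = 100
exact = poisson_cdf(105, lam)
plain = normal_cdf(105, lam, math.sqrt(lam))        # no correction
corrected = normal_cdf(105.5, lam, math.sqrt(lam))  # continuity correction
print("exact     :", exact)
print("plain     :", plain)
print("corrected :", corrected)
```

The corrected value lands noticeably closer to the exact Poisson probability than the uncorrected one, which is the whole point of evaluating the normal CDF at $n+\frac12$.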