Negative Binomial Distribution – Understanding Key Parameters


I was trying to fit my data to various distributions and, using the fitdistr function from R's MASS package, found that the negative binomial gives the best fit. The Wikipedia page gives the definition as:

The NegBin(r, p) distribution describes the probability of k failures and r successes in k + r Bernoulli(p) trials, with success on the last trial.
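
To make the coin-flipping definition concrete, here is a minimal Python sketch (standard library only; the function names are hypothetical, and r = 3, p = 0.4 are arbitrary illustration values). It flips a coin with success probability p until the r-th success, counts the failures k, and compares the simulated frequencies with the closed-form probability $\binom{k+r-1}{k} p^r (1-p)^k$.

```python
import math
import random

def failures_before_r_successes(r, p, rng):
    """Flip a coin with success probability p until the r-th success;
    return the number of failures k observed along the way."""
    successes = failures = 0
    while successes < r:
        if rng.random() < p:
            successes += 1
        else:
            failures += 1
    return failures

def nb_pmf_coin(k, r, p):
    """P(K = k): the last trial is a success, and the k failures can be
    placed anywhere among the first k + r - 1 trials."""
    return math.comb(k + r - 1, k) * p**r * (1 - p) ** k

if __name__ == "__main__":
    rng = random.Random(0)      # fixed seed so the run is reproducible
    r, p, n = 3, 0.4, 50_000    # illustrative values only
    draws = [failures_before_r_successes(r, p, rng) for _ in range(n)]
    print("simulated mean:", sum(draws) / n, "theory r(1-p)/p:", r * (1 - p) / p)
    for k in range(5):
        # simulated frequency vs. closed-form probability
        print(k, round(draws.count(k) / n, 4), round(nb_pmf_coin(k, r, p), 4))
```

The simulated frequencies should agree with the formula up to Monte Carlo noise; the mean r(1 - p)/p is the quantity R reports as the mean.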

Fitting the model in R gives me two parameters: the mean and a dispersion parameter (reported by fitdistr as mu and size). I don't understand how to interpret these, because they don't appear on the Wikipedia page. All I can see is the following formula:

[Image: negative binomial distribution formula]

where k is the number of observations and r = 0, ..., n. How do I relate these to the parameters given by R? The help file doesn't provide much information either.
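
One way to connect the two parameterizations, assuming the coin-flipping definition quoted above (K counts the failures before the r-th success, with success probability p) and the (mu, size) form that R's dnbinom uses for the negative binomial, is the following sketch. The first line is the coin-flipping pmf, the second gives its mean and variance, and the third rewrites everything in terms of mu and size:

$$
\Pr(K = k) = \binom{k + r - 1}{k}\, p^{r} (1 - p)^{k}, \qquad k = 0, 1, 2, \ldots
$$

$$
\mu = \mathbb{E}[K] = \frac{r(1 - p)}{p}, \qquad
\operatorname{Var}(K) = \frac{r(1 - p)}{p^{2}} = \mu + \frac{\mu^{2}}{r}
$$

$$
\text{size} = r, \qquad p = \frac{\text{size}}{\text{size} + \mu}, \qquad
\Pr(K = k) = \frac{\Gamma(k + \text{size})}{\Gamma(\text{size})\, k!}
\left(\frac{\text{size}}{\text{size} + \mu}\right)^{\text{size}}
\left(\frac{\mu}{\text{size} + \mu}\right)^{k}
$$

Because $\Gamma(k + r) / (\Gamma(r)\, k!) = \binom{k + r - 1}{k}$ for integer r, the last form agrees with the first but no longer requires size to be an integer. The variance $\mu + \mu^2/\text{size}$ shows that a smaller size means more overdispersion relative to a Poisson with the same mean.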

A few words about my experiment: in a social experiment I was conducting, I counted the number of people each user contacted over a 10-day period. The population size was 100.

If the negative binomial fits, I could simply say that the data follow that distribution, but I really want to understand the intuition behind it. What does it mean to say that the number of people contacted by my test subjects follows a negative binomial distribution? Can someone please help clarify this?

Best Answer

You should look further down the Wikipedia article on the NB, where it describes the distribution as a "gamma-Poisson mixture". The definition you cite (which I call the "coin-flipping" definition, since for classes I usually introduce it as "suppose you want to flip a coin until you get $k$ heads") is easier to derive and makes more sense in an introductory probability or mathematical statistics context. In my experience, though, the gamma-Poisson mixture is a much more useful way to think about the distribution in applied contexts. (In particular, this definition allows non-integer values of the dispersion/size parameter.)

In this view, each individual's count is Poisson with that individual's own rate, and those rates vary across individuals according to a Gamma distribution. Your dispersion parameter describes this hypothetical Gamma distribution, which underlies your data and represents unobserved variation among individuals in their intrinsic level of contact. Specifically, it is the shape parameter of the Gamma. It may help to know that the coefficient of variation of a Gamma distribution with shape parameter $\theta$ is $1/\sqrt{\theta}$; as $\theta$ becomes large, the latent variability disappears and the distribution approaches the Poisson.
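As an illustration of the gamma-Poisson view, here is a minimal Python sketch (standard library only; all function names are hypothetical, and mu = 5, theta = 2 are arbitrary values, not estimates from the asker's data). Each simulated individual gets a latent contact rate drawn from a Gamma distribution with shape $\theta$ and mean $\mu$, and then a Poisson count at that rate. The simulated counts should have mean close to $\mu$, variance close to $\mu + \mu^2/\theta$, and frequencies matching a negative binomial pmf with size $\theta$. The pmf uses log-gamma functions so that non-integer size works; treating this size as the same quantity fitdistr reports is an assumption based on the (mu, size) parameterization that R's dnbinom uses.

```python
import math
import random

def nb_pmf(k, mu, size):
    """Negative binomial pmf in the (mean mu, size) parameterization.
    size need not be an integer; the coin-flipping success probability
    is p = size / (size + mu)."""
    log_p = math.log(size / (size + mu))
    log_q = math.log(mu / (size + mu))  # log(1 - p), computed without cancellation
    log_coef = math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
    return math.exp(log_coef + size * log_p + k * log_q)

def poisson_pmf(k, lam):
    """Poisson pmf, computed on the log scale."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def poisson_draw(lam, rng):
    """Knuth's multiplication method; fine for the modest rates used here."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def gamma_poisson_sample(mu, theta, n, seed=1):
    """Each individual gets a latent rate from Gamma(shape=theta, scale=mu/theta)
    (mean mu, coefficient of variation 1/sqrt(theta)), then a Poisson count."""
    rng = random.Random(seed)
    return [poisson_draw(rng.gammavariate(theta, mu / theta), rng) for _ in range(n)]

if __name__ == "__main__":
    mu, theta = 5.0, 2.0  # illustrative values, not fitted to any data
    counts = gamma_poisson_sample(mu, theta, n=100_000)
    n = len(counts)
    mean = sum(counts) / n
    var = sum((c - mean) ** 2 for c in counts) / (n - 1)
    print(f"mean {mean:.2f} vs mu {mu}; variance {var:.2f} vs {mu + mu**2 / theta}")
    for k in range(5):
        # mixture frequency vs. negative binomial probability
        print(k, round(counts.count(k) / n, 4), round(nb_pmf(k, mu, theta), 4))
    # With a huge shape parameter the latent rates barely vary, so NB is close to Poisson.
    print(nb_pmf(3, mu, 1e6), poisson_pmf(3, mu))
```

Raising theta shrinks the spread of the latent rates, and the last line compares a negative binomial with a very large size against the Poisson with the same mean; the two probabilities should be nearly identical.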