Solved – the R rnbinom negative binomial dispersion parameter

count-data, negative-binomial-distribution, r

In the R function rnbinom, one of the parameters is the dispersion or shape parameter. This can be parameterized as theta or alpha, depending on how the model is written. I can't tell from ?rnbinom what it's asking for. Does anyone have an idea?

EDIT:
I've run a simple negative binomial regression model, and want to use the model parameters to produce the theoretical distribution for simulation work. I'm not exactly sure how to use the dispersion parameter. Here's the output from R:

summary(glm.nb(exit~1+offset(log(stock)),data=dt))

Call:
glm.nb(formula = exit ~ 1 + offset(log(stock)), data = dt, init.theta = 5.855047422, 
    link = log)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-2.83778  -0.86369   0.00863   0.62604   1.80784  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)   -3.689      0.029  -127.2   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for Negative Binomial(5.855) family taken to be 1)

    Null deviance: 218.61  on 211  degrees of freedom
Residual deviance: 218.61  on 211  degrees of freedom
AIC: 2297.5

Number of Fisher Scoring iterations: 1


              Theta:  5.855 
          Std. Err.:  0.582 

 2 x log-likelihood:  -2293.500 

I will use rnbinom to model the distribution, taking as parameters:

x<-rnbinom(nrow(dt),size=5.855,mu=1/exp(-3.689))

My question is whether I'm specifying the size parameter appropriately: should it be 5.855, or 1/5.855? I more or less understand the different parametrizations of the model, as either $\theta$ (or $r$) or $\alpha$, and from here I know glm.nb is reporting $\theta$. I'm not exactly sure what rnbinom expects for its size parameter. Am I correct in assuming it is $\theta$, and is my code here correct (size=5.855)?

Best Answer

The documentation calls it "size":

size   target for number of successful trials, or dispersion parameter (the shape parameter of the gamma mixing distribution). Must be strictly positive, need not be integer.

That is the simplest way to understand it. The negative binomial distribution is typically understood as:

a discrete probability distribution of the number of successes in a sequence of independent and identically distributed Bernoulli trials before a specified (non-random) number of failures (denoted r) occurs.

In other words, when flipping a coin that lands heads with probability $p$, you count how many tails you get before the $r$-th head. In rnbinom, size plays the role of $r$ and prob the role of $p$.
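A quick simulation can make the link between size and glm.nb's $\theta$ concrete. This is a minimal sketch, not part of the original answer. It relies on two documented facts: with the mu parameterization, rnbinom has variance $\mu + \mu^2/\text{size}$, and MASS's negative binomial family uses the variance function $\mu + \mu^2/\theta$. If those match, size should be $\theta$ itself (5.855), not $1/\theta$. The mean of 10, the sample size, and the coin-flip values ($r = 3$, $p = 0.5$) are hypothetical values chosen only for illustration.

```r
set.seed(1)

theta <- 5.855   # Theta reported by glm.nb in the question
mu    <- 10      # hypothetical mean, chosen only for illustration
n     <- 1e6     # large sample so the moments are stable

# mu/size parameterization: size plays the role of theta
x <- rnbinom(n, size = theta, mu = mu)

# Under this parameterization Var(X) = mu + mu^2 / size
print(c(sample_mean = mean(x),
        sample_var  = var(x),
        theory_var  = mu + mu^2 / theta))

# Equivalent size/prob parameterization: prob = size / (size + mu)
p <- theta / (theta + mu)
y <- rnbinom(n, size = theta, prob = p)
print(c(sample_mean = mean(y), sample_var = var(y)))

# Coin-flip reading with an integer size:
# number of tails counted before the r-th head
r       <- 3     # hypothetical target number of heads
p_heads <- 0.5   # hypothetical probability of heads
z <- rnbinom(n, size = r, prob = p_heads)
print(c(sample_mean = mean(z),
        theory_mean = r * (1 - p_heads) / p_heads))
```

The sample variance of x should land close to $\mu + \mu^2/\theta$, which would not be the case if size were set to $1/5.855$. Separately, if I read the model correctly, a log link with offset(log(stock)) implies a fitted mean of stock * exp(-3.689) for each row, so a simulation from the fitted model would probably pass that quantity as mu.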