Can anybody show me why the dispersion parameter of the negative binomial distribution is taken to be one? In the Poisson case you can show that $E(y)/V(y)=\mu/\mu=1$, which is called equidispersion. But how can you show that the dispersion parameter of the negative binomial distribution is 1 as well?
Negative Binomial Distribution – Dispersion Parameter of Negbin Distribution
negative-binomial-distribution, overdispersion, proof
Related Solutions
This answer comes in two parts. The first addresses the issue of the standard errors and why that implies the models are not identical, as @ndoogan observed in comments, and the second addresses, partially, the issue of when the coefficient estimates might be substantially different.
Consider, for example, a hypothesis test on the coefficient of log(Vmaj)
where the null is that the coefficient equals 0.5. The two sets of model estimates would result in rejection of the null in the Poisson case, unless testing at a very high confidence level, and failure to reject the null in the NB case.
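The arithmetic behind such a Wald test is easy to sketch. The point estimate and standard errors below are hypothetical stand-ins (the actual model output is not reproduced here); only the mechanics matter:

```python
from statistics import NormalDist

def wald_test(beta_hat, se, beta0=0.5):
    # Two-sided Wald test of H0: beta = beta0
    z = (beta_hat - beta0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical numbers: identical point estimates, but the NB model
# reports a larger (more honest) standard error than the Poisson model.
z_pois, p_pois = wald_test(0.56, se=0.02)  # Poisson-style small SE
z_nb, p_nb = wald_test(0.56, se=0.06)      # NB-style larger SE
```

At the 95% level the first test rejects (p ≈ 0.003) while the second does not (p ≈ 0.32), which is exactly the divergence described above.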
More generally, there is more to a collection of estimates than just the point estimates. In the presence of (Negative binomial-like) overdispersion, the standard errors produced by the Negative Binomial-based model will be more accurate estimates of the underlying standard deviation of the coefficient estimates. Thus, the NB model as a whole is more accurate.
For example, a simple model with $\mu_i = \exp\{1 + x_i\}$, and $y_i \sim NB$ with mean $\mu_i$ and variance $5\mu_i$:
library(MASS)  # for glm.nb()
N <- 1000
mu <- exp(1 + (x <- rnorm(N)))
p <- 0.2
r <- mu * p / (1 - p)  # rnbinom's "size"; gives Var(y) = mu/p = 5*mu
y <- rnbinom(N, r, p)
poisson_summary <- summary(glm(y ~ x, family = "poisson"))
nb_summary <- summary(glm.nb(y ~ x))
# Parametric bootstrap calculation of the s.e. of the coefficient of x
coef_x <- rep(0, 1000)
for (i in 1:length(coef_x)) {
  mu <- exp(1 + (x <- rnorm(N)))
  r <- mu * p / (1 - p)
  y <- rnbinom(N, r, p)
  coef_x[i] <- summary(glm(y ~ x, family = "poisson"))$coefficients["x", "Estimate"]
}
data.frame("Sim. SE" = sd(coef_x - 1),
           "Poisson SE" = poisson_summary$coefficients["x", "Std. Error"],
           "N.B. SE" = nb_summary$coefficients["x", "Std. Error"])
Sim..SE Poisson.SE N.B..SE
1 0.03492467 0.01273354 0.03984743
where you can see that the simulated standard deviation of the coefficient estimate is roughly 3x the Poisson-based standard error, and is much better approximated by the NB-based standard error.
The coefficient estimates are pretty much the same, as one might expect with this sample size and the std. errors above, so I won't take up space by displaying them.
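For readers without R to hand, the data-generating process behind the simulation (NB counts with mean $\mu$ and variance $5\mu$, via the gamma–Poisson mixture that underlies `rnbinom`) can be re-created in a stdlib-only Python sketch; `rpois` and `rnbinom` here are hand-rolled stand-ins, not library functions:

```python
import math
import random

def rpois(lam, rng):
    # Knuth's multiplicative Poisson sampler (fine for small lam).
    L = math.exp(-lam)
    k, prod = 0, 1.0
    while True:
        prod *= rng.random()
        if prod <= L:
            return k
        k += 1

def rnbinom(size, prob, rng):
    # NB via the gamma-Poisson mixture:
    # lambda ~ Gamma(shape=size, scale=(1-prob)/prob), y | lambda ~ Poisson(lambda)
    lam = rng.gammavariate(size, (1 - prob) / prob)
    return rpois(lam, rng)

rng = random.Random(42)
mu, p = math.exp(1), 0.2        # fixed mean for simplicity (no covariate)
r = mu * p / (1 - p)            # matches the answer's "size" parameter
ys = [rnbinom(r, p, rng) for _ in range(20000)]
m = sum(ys) / len(ys)
v = sum((y - m) ** 2 for y in ys) / (len(ys) - 1)
# Empirically, m is close to mu and v/m is close to 1/p = 5
```

The mixture construction is why the variance is 1/p times the mean here: the Poisson layer contributes $E[\lambda]$ and the gamma layer contributes $\text{Var}(\lambda)$.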
Additionally, although this is often a minor effect when overdispersion is low, by weighting the observations with more accurate weights (i.e., better estimates of observation-specific variances), the parameter estimates themselves will be more accurate, well, asymptotically at any rate. The rule of thumb I learned long ago was that heteroskedasticity adjustments (for that is what they are, in essence) don't buy you much unless the differences between weights are on the order of 5x or more.
Note, however, that in small samples you may well get more accurate (in MSE terms) point estimates with the Poisson model if there isn't much overdispersion, because you are reducing the variability induced by estimating the dispersion parameter. Of course, this is almost certainly more than offset by the loss of accuracy in the standard errors of the coefficient estimates.
Yes, you have the form right; specifically, the variance for the $j$th observation is:
$$\text{Var}(x_j) = \mu_j(1 + \alpha\mu_j)$$
Stata and SPSS have the same defaults, so I will refer to Stata's documentation (see page 5) for a nice walkthrough. This is the NB2 model from Cameron and Trivedi (which I have not read, but is cited as such in the Long and Freese Regression Models for Categorical Dependent Variables Using Stata).
It can be tricky when reading texts: sometimes people refer to the dispersion as the entire term in parentheses (which changes with the estimated mean of the variable), but other times they refer only to $\alpha$ (which is what the software reports). I've written a blog post translating between a few of the regularly used notations for negative binomial parameterizations.
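As a sanity check on the notation, the NB2 pair $(\mu, \alpha)$ maps to the classical $(r, p)$ parameterization (mean $r(1-p)/p$) via $r = 1/\alpha$ and $p = r/(r+\mu)$; a small hypothetical Python sketch confirms the variance identity:

```python
def nb2_to_rp(mu, alpha):
    # Map NB2 parameters (mean mu, dispersion alpha) to the classical
    # (r, p) parameterization whose mean is r*(1-p)/p.
    r = 1.0 / alpha
    p = r / (r + mu)
    return r, p

mu, alpha = 3.0, 0.5
r, p = nb2_to_rp(mu, alpha)
mean = r * (1 - p) / p        # recovers mu
var = r * (1 - p) / p ** 2    # equals mu * (1 + alpha * mu)
```

With $\mu = 3$ and $\alpha = 0.5$ this gives a variance of $3(1 + 1.5) = 7.5$, matching the NB2 formula.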
Best Answer
For a general exponential family, we have the variance in the following form:
$$Var(Y_{i})=\phi h(E[Y_{i}])$$
for some function $h(\cdot)$. Using the Wikipedia definition of the negative binomial, we have the pmf:
$$p(Y_{i}=y|r,p)={r+y-1 \choose y}p^{y}(1-p)^{r}\;\;\;\;\;\;y=0,1,2,\dots$$
And this has expectation $E[Y_{i}]=\frac{pr}{1-p}$ and a variance equal to $Var[Y_{i}]=\frac{pr}{(1-p)^{2}}$. Note that this cannot be written in the usual form for a generalised linear model, but has the form:
$$Var[Y_{i}]=E[Y_{i}]+\frac{1}{r}E[Y_{i}]^{2}$$
As such, it can be seen as taking the function $h(x)=x+\frac{1}{r}x^{2}$, hence a dispersion parameter equal to $1$. Note, though, that the negative binomial is technically not a member of the exponential family (as it is a mixture of exponential-family distributions, similar to the Student-$t$ distribution).
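The mixture remark can be made concrete. Writing $\mu = E[Y_i] = \frac{pr}{1-p}$ and assuming $Y \mid \lambda \sim \text{Poisson}(\lambda)$ with $\lambda \sim \text{Gamma}(r, \mu/r)$ (shape–scale, so $E[\lambda]=\mu$ and $\text{Var}(\lambda) = r(\mu/r)^2 = \mu^2/r$), the law of total variance recovers exactly the form above:

$$
\begin{aligned}
E[Y] &= E\big[E[Y \mid \lambda]\big] = E[\lambda] = \mu,\\
\text{Var}[Y] &= E\big[\text{Var}(Y \mid \lambda)\big] + \text{Var}\big(E[Y \mid \lambda]\big) = E[\lambda] + \text{Var}(\lambda) = \mu + \frac{1}{r}\mu^{2},
\end{aligned}
$$

i.e. $h(x)=x+\frac{1}{r}x^{2}$ with $\phi = 1$.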