This answer comes in two parts. The first addresses the standard errors and why their difference implies the two models are not identical, as @ndoogan observed in the comments; the second addresses, partially, the question of when the coefficient estimates themselves might be substantially different.
Consider, for example, a hypothesis test on the coefficient of log(Vmaj)
where the null is that the coefficient equals 0.5. The two sets of model estimates lead to different conclusions: the Poisson-based standard error would result in rejection of the null (unless testing at a very high confidence level), while the NB-based standard error would result in failure to reject.
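For concreteness, here is a minimal sketch of such a Wald test, assuming hypothetical fitted objects fit_pois and fit_nb for the model in the question (the term name and the 0.5 null value are placeholders):

# Wald test of H0: coefficient of log(Vmaj) equals 0.5, for a hypothetical
# Poisson fit `fit_pois` and NB fit `fit_nb` of the same model
wald_test <- function(fit, term, null_value = 0.5) {
  est <- coef(summary(fit))[term, "Estimate"]
  se  <- coef(summary(fit))[term, "Std. Error"]
  z   <- (est - null_value) / se
  c(estimate = est, se = se, z = z, p.value = 2 * pnorm(-abs(z)))
}
wald_test(fit_pois, "log(Vmaj)")  # smaller SE, so smaller p-value: reject H0
wald_test(fit_nb,   "log(Vmaj)")  # larger SE, so larger p-value: fail to reject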
More generally, there is more to a collection of estimates than just the point estimates. In the presence of (Negative binomial-like) overdispersion, the standard errors produced by the Negative Binomial-based model will be more accurate estimates of the underlying standard deviation of the coefficient estimates. Thus, the NB model as a whole is more accurate.
For example, a simple model with $\mu_i = \exp\{1 + x_i\}$, and $y_i \sim NB$ with mean $\mu_i$ and variance $5\mu_i$:
library(MASS)  # for glm.nb

# Simulate NB data with mean mu = exp(1 + x) and variance mu / p = 5 * mu
N <- 1000
mu <- exp(1 + (x <- rnorm(N)))
p <- 0.2
r <- mu * p / (1 - p)   # size parameter chosen so the mean equals mu
y <- rnbinom(N, r, p)

poisson_summary <- summary(glm(y ~ x, family = "poisson"))
nb_summary <- summary(glm.nb(y ~ x))
# Parametric bootstrap calculation of the s.e. of the coefficient of x
coef_x <- rep(0, 1000)
for (i in 1:length(coef_x)) {
  mu <- exp(1 + (x <- rnorm(N)))
  r <- mu * p / (1 - p)
  y <- rnbinom(N, r, p)
  coef_x[i] <- summary(glm(y ~ x, family = "poisson"))$coefficients["x", "Estimate"]
}
data.frame("Sim. SE" = sd(coef_x - 1),
"Poisson SE" = poisson_summary$coefficients["x", "Std. Error"],
"N.B. SE" = nb_summary$coefficients["x", "Std. Error"])
Sim..SE Poisson.SE N.B..SE
1 0.03492467 0.01273354 0.03984743
where you can see that the simulated standard deviation of the coefficient estimate is roughly 3x the Poisson-based standard error, and is much better approximated by the NB-based standard error.
The coefficient estimates are pretty much the same, as one might expect with this sample size and the std. errors above, so I won't take up space by displaying them.
Additionally, although this is often a minor effect when overdispersion is low, weighting the observations with more accurate weights (i.e., better estimates of the observation-specific variances) makes the parameter estimates themselves more accurate, at least asymptotically. The rule of thumb I learned long ago was that heteroskedasticity adjustments (for that is what they are, in essence) don't buy you much unless the weights differ by a factor of roughly 5 or more.
Note, however, that in small samples you may well get more accurate (in MSE terms) point estimates with the Poisson model if there isn't much overdispersion, because you avoid the extra variability induced by estimating the dispersion parameter. Of course, this gain is almost certainly more than offset by the loss of accuracy in the standard errors of the coefficient estimates.
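A rough sketch of that trade-off, with a small sample and mild overdispersion (the sample size, overdispersion level, and number of replications are purely illustrative):

# Rough sketch: MSE of the slope estimate under Poisson vs. NB fits with a
# small sample and mild overdispersion (all settings are illustrative)
set.seed(2)
n_small <- 50
p_small <- 0.8                      # variance = mu / 0.8 = 1.25 * mu
b_pois <- b_nb <- rep(0, 500)
for (i in seq_along(b_pois)) {
  x_s  <- rnorm(n_small)
  mu_s <- exp(1 + x_s)
  y_s  <- rnbinom(n_small, mu_s * p_small / (1 - p_small), p_small)
  b_pois[i] <- coef(glm(y_s ~ x_s, family = "poisson"))["x_s"]
  b_nb[i]   <- coef(suppressWarnings(glm.nb(y_s ~ x_s)))["x_s"]
}
c(MSE.Poisson = mean((b_pois - 1)^2), MSE.NB = mean((b_nb - 1)^2))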
This is almost a duplicate; the linked question explains that you shouldn't expect the coefficient estimates, residual deviance, or degrees of freedom to change. The only thing that changes when moving from Poisson to quasi-Poisson is that a scale parameter that was previously fixed at 1 is computed from an estimate of residual variability/badness of fit (usually the sum of squared Pearson residuals ($\chi^2$) divided by the residual df, although asymptotically the residual deviance gives the same result). The result is that the standard errors are scaled by the square root of this scale parameter, with concomitant changes in the confidence intervals and $p$-values.
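A small self-contained sketch of that relationship (the simulated data here are illustrative; any overdispersed counts would do):

# Sketch: quasi-Poisson keeps the Poisson coefficients and rescales the
# standard errors by sqrt(dispersion); data here are illustrative
set.seed(3)
x_q <- rnorm(200)
y_q <- rnbinom(200, mu = exp(1 + x_q), size = 1)   # overdispersed counts
fit_p  <- glm(y_q ~ x_q, family = "poisson")
fit_qp <- glm(y_q ~ x_q, family = "quasipoisson")
phi <- sum(residuals(fit_p, type = "pearson")^2) / df.residual(fit_p)
all.equal(coef(fit_p), coef(fit_qp))               # TRUE: same point estimates
all.equal(summary(fit_qp)$coefficients[, "Std. Error"],
          summary(fit_p)$coefficients[, "Std. Error"] * sqrt(phi))   # TRUE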
The benefit of quasi-likelihood is that it fixes the basic fallacy of assuming that the data are Poisson (= homogeneous, independent counts); however, fixing the problem in this way potentially masks other issues with the data. (See below.) Quasi-likelihood is one way of handling overdispersion; if you don't address overdispersion in some way, your coefficients will be reasonable but your inference (CIs, $p$-values, etc.) will be garbage.
- As you comment above, there are lots of different approaches to overdispersion (Tweedie, different negative binomial parameterizations, quasi-likelihood, zero-inflation/alteration).
- With an overdispersion factor of >5 (8.4), I would worry a bit about whether it is being driven by some kind of model mis-fit (outliers, zero-inflation [which I see you've already tried], nonlinearity) rather than representing across-the-board heterogeneity. My general approach to this is graphical exploration of the raw data and regression diagnostics, as in the sketch below.
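A minimal sketch of the sort of diagnostics meant here, where fit and dat are hypothetical (a fitted count model and its data frame with response y):

# Sketch of quick graphical checks for a hypothetical Poisson/NB fit `fit`
# on a data frame `dat` with response `dat$y`
plot(fitted(fit), residuals(fit, type = "pearson"),
     xlab = "Fitted values", ylab = "Pearson residuals")   # outliers, nonlinearity
abline(h = 0, lty = 2)
# Compare observed zeros with what the fitted Poisson means would imply
c(observed.zeros = sum(dat$y == 0),
  expected.zeros = sum(dpois(0, fitted(fit))))             # zero inflation?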
Best Answer
In principle, I actually agree that 99% of the time, it's better to just use the more flexible model. With that said, here are two and a half arguments for why you might not.
(1) Less flexible means more efficient estimates. Given that variance parameters tend to be less stable than mean parameters, assuming a fixed mean-variance relation may do more to stabilize the standard errors.
(2) Model checking. I've worked with physicists who believe, on theoretical grounds, that various measurements can be described by Poisson distributions. If we reject the hypothesis that the mean equals the variance, we have evidence against the Poisson hypothesis. As @GordonSmyth pointed out in a comment, if you have reason to believe that a given measurement should follow a Poisson distribution and you find evidence of overdispersion, you have evidence that you are missing important factors.
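A simple sketch of such a check, assuming a hypothetical Poisson fit fit_pois: compare the Pearson chi-square statistic to its residual degrees of freedom.

# Rough check of the Poisson (mean = variance) hypothesis: under the Poisson
# model, the Pearson statistic is approximately chi-square on the residual df
pearson_stat <- sum(residuals(fit_pois, type = "pearson")^2)
pearson_stat / df.residual(fit_pois)                            # ~1 if Poisson holds
pchisq(pearson_stat, df.residual(fit_pois), lower.tail = FALSE) # small p suggests overdispersion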
(2.5) Proper distribution. While negative binomial regression is based on a valid statistical distribution, it's my understanding that the quasi-Poisson model is not. That means you can't really simulate count data under the belief that $Var[y] = \alpha E[y]$ for $\alpha \neq 1$, which might be annoying for some use cases. Likewise, you can't use probabilities to check for outliers, etc.
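As an illustration of the simulation point, assuming a hypothetical glm.nb fit fit_nb:

# From an NB fit you can draw new responses, because it is a full distribution
# (`fit_nb` is a hypothetical glm.nb fit); there is no analogous r-function for
# a quasi-Poisson fit, which specifies only a mean and a variance
y_sim <- rnbinom(length(fitted(fit_nb)), mu = fitted(fit_nb), size = fit_nb$theta)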