The quasi-Poisson is not a full maximum likelihood (ML) model but a quasi-ML model. You just use the estimating function (or score function) from the Poisson model to estimate the coefficients, and then employ a certain variance function to obtain suitable standard errors (or rather a full covariance matrix) for inference. Hence, `glm()` does not supply a `logLik()` or `AIC()` here, etc.
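A minimal sketch of this in R, using simulated overdispersed counts (the data here are hypothetical, for illustration only):

```r
## Hypothetical overdispersed counts: NB-generated, so Var(y) > E(y)
set.seed(1)
x <- runif(200)
y <- rnbinom(200, mu = exp(1 + 2 * x), size = 1)

fit_pois  <- glm(y ~ x, family = poisson)
fit_quasi <- glm(y ~ x, family = quasipoisson)

## Same coefficient estimates (same estimating function), different SEs:
cbind(coef(fit_pois), coef(fit_quasi))

## No likelihood for the quasi-model:
logLik(fit_quasi)  # NA
AIC(fit_quasi)     # NA
```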
As you correctly point out, a model with the same expectation and variance function can be embedded into the negative binomial (NB) framework if the size
parameter $\theta_i$ varies along with the expectation $\mu_i$. In the literature, this is typically called the NB1 parametrization. See for example the Cameron & Trivedi book (Regression Analysis of Count Data) or "Analysis of Microdata" by Winkelmann & Boes.
If there are no regressors (just an intercept), the NB1 parametrization and the NB2 parametrization employed by MASS's `glm.nb()` coincide. With regressors they differ. In the statistical literature the NB2 parametrization is more frequently used, but some software packages also offer the NB1 version. For example, in R you can use the `gamlss` package: `gamlss(y ~ x, family = NBII)`. Note that, somewhat confusingly, `gamlss` uses `NBI` for the NB2 parametrization and `NBII` for NB1. (But jargon and terminology are not unified across all communities.)
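A small sketch of fitting both parametrizations in R, assuming simulated NB data (`MASS` and `gamlss` must be installed; the data are made up for illustration):

```r
library(MASS)     # glm.nb(): NB2, Var(y) = mu + mu^2/theta
library(gamlss)   # NBI = NB2, NBII = NB1 in gamlss's naming

set.seed(1)
x <- runif(200)
y <- rnbinom(200, mu = exp(1 + 2 * x), size = 2)

fit_nb2  <- glm.nb(y ~ x)                    # NB2
fit_nb1  <- gamlss(y ~ x, family = NBII)     # NB1, Var(y) = (1 + sigma) * mu
fit_nb2b <- gamlss(y ~ x, family = NBI)      # NB2 again, via gamlss

## With regressors present, NB1 and NB2 give (slightly) different coefficients:
cbind(coef(fit_nb2), coef(fit_nb1))
```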
Then you could ask, of course, why use quasi-Poisson if NB1 is available? There is still a subtle difference: the former uses quasi-ML and obtains the estimate of the dispersion from the squared deviance (or Pearson) residuals, while the latter uses full ML. In practice, the difference often isn't large, but the motivations for using either model are slightly different.
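To make the quasi-ML dispersion estimate concrete, here is a sketch in R (simulated data, for illustration): the dispersion reported by `summary()` for a quasi-Poisson fit is just the Pearson statistic divided by the residual degrees of freedom.

```r
set.seed(1)
x <- runif(200)
y <- rnbinom(200, mu = exp(1 + 2 * x), size = 1)  # hypothetical overdispersed counts

fit <- glm(y ~ x, family = quasipoisson)
phi_hat <- sum(residuals(fit, type = "pearson")^2) / df.residual(fit)
all.equal(phi_hat, summary(fit)$dispersion)       # the same quantity
```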
Poisson regression is just a GLM:
People often speak of the parametric rationale for applying Poisson regression. In fact, Poisson regression is just a GLM. That means Poisson regression is justified for any type of data (counts, ratings, exam scores, binary events, etc.) when two assumptions are met: 1) the log of the mean-outcome is a linear combination of the predictors and 2) the variance of the outcome is equal to the mean. These two conditions are respectively referred to as the mean-model and the mean-variance relationship.
The mean-model assumption can be relaxed somewhat by using a complex set of adjustments for predictors. This is nice because the link function affects the interpretation of the parameters; the subtlety of interpretation makes the difference between answering a scientific question and completely eluding the consumers of your statistical analysis. In another SE post I discuss the usefulness of log-transforms for interpretation.
It turns out, however, that the second assumption (the mean-variance relationship) has strong implications for inference. When the mean-variance relationship does not hold, the parameter estimates are not biased, but the standard errors, confidence intervals, p-values, and predictions are all miscalibrated. That means you cannot control the Type I error rate and you may have suboptimal power.
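A quick simulation sketch of that miscalibration (data and true slope are invented for illustration): Poisson confidence intervals undercover when the data are overdispersed.

```r
## Coverage of the nominal 95% Poisson CI for the slope, under overdispersion
set.seed(1)
true_slope <- 0.5
cover <- replicate(500, {
  x  <- runif(100)
  y  <- rnbinom(100, mu = exp(true_slope * x), size = 1)  # Var(y) > E(y)
  ci <- confint.default(glm(y ~ x, family = poisson))["x", ]
  ci[1] < true_slope && true_slope < ci[2]
})
mean(cover)   # well below the nominal 0.95
```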
What if the mean-variance relationship could be relaxed, so that the variance is simply proportional to the mean? Negative binomial regression and quasipoisson regression do this.
Quasipoisson models
Quasipoisson models are not likelihood based. They maximize a "quasilikelihood", which is a Poisson likelihood up to a proportional constant. That proportional constant happens to be the dispersion. The dispersion is considered a nuisance parameter: while the estimation routine comes up with an estimate of it, that estimate is merely an artifact of the data rather than a value that generalizes to the population. The dispersion only serves to "shrink" or "widen" the SEs of the regression parameters according to whether the variance is proportionally smaller or larger than the mean.

Since the dispersion is treated as a nuisance parameter, quasipoisson models enjoy a host of robust properties: the data can in fact be heteroscedastic (not meeting the proportional mean-variance assumption) and even exhibit small sources of dependence, the mean model need not be exactly correct, and yet the 95% CIs for the regression parameters are asymptotically correct. If the goal of your data analysis is to measure the association between a set of predictors and the outcome, quasipoisson models are usually the way to go. A limitation of these models is that they cannot yield prediction intervals, the Pearson residuals cannot tell you much about how accurate the mean model is, and information criteria like the AIC or BIC cannot effectively compare these models to other types of models.
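The "shrink or widen the SEs" point can be seen directly in R: because the coefficient estimates are identical, the quasipoisson standard errors are exactly the Poisson ones rescaled by the square root of the estimated dispersion. A sketch on simulated data:

```r
set.seed(1)
x <- runif(200)
y <- rnbinom(200, mu = exp(1 + 2 * x), size = 1)  # hypothetical overdispersed counts

fit_pois  <- glm(y ~ x, family = poisson)
fit_quasi <- glm(y ~ x, family = quasipoisson)
phi <- summary(fit_quasi)$dispersion

se_pois  <- summary(fit_pois)$coefficients[, "Std. Error"]
se_quasi <- summary(fit_quasi)$coefficients[, "Std. Error"]
all.equal(se_quasi, sqrt(phi) * se_pois)   # TRUE: SEs rescaled by sqrt(phi)
```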
Negative binomial models
It's most useful to understand negative binomial regression as a 2-parameter Poisson regression. The mean model is the same as in Poisson and quasipoisson models: the log of the mean outcome is a linear combination of predictors. Furthermore, the "scale" parameter models a mean-variance relationship in which the variance grows with the mean, as before. However, unlike quasipoisson models, this type of model is an exact likelihood-based procedure. In this case the dispersion is an actual parameter which has some extent of generalizability to the population. This introduces a few advantages over quasipoisson but, in my opinion, imposes more (untestable) assumptions. Unlike quasipoisson models: the data must be independent, the mean model must be correct, and the scale parameter must be homoscedastic across the range of fitted values to obtain correct inference. However, these can be assessed somewhat by inspecting Pearson residuals, and the model produces viable predictions and prediction intervals, and is amenable to comparison with information criteria.
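A sketch of those likelihood-based advantages in R, using `MASS::glm.nb()` on simulated data (the data are hypothetical):

```r
library(MASS)
set.seed(1)
x <- runif(200)
y <- rnbinom(200, mu = exp(1 + 2 * x), size = 2)

fit_nb <- glm.nb(y ~ x)
AIC(fit_nb)      # available, unlike for a quasipoisson fit
fit_nb$theta     # the estimated dispersion parameter, a real model parameter

## Pearson residuals should look roughly homoscedastic across fitted values:
plot(fitted(fit_nb), residuals(fit_nb, type = "pearson"))
```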
Negative binomial probability models arise from a Poisson-Gamma mixture. That is, there is an unknown fluctuating Gamma random variable "feeding into" the Poisson rate parameter. Since NB GLM fitting is likelihood based, it is usually helpful to state prior beliefs about the data generating mechanism and connect them to the probabilistic rationale for the model at hand. For instance, if I am modeling the number of racers retiring from 24-hour endurance racing, I might consider that the environmental conditions are all stressors that I did not measure and thus contribute to the risk of DNF, such as moisture or cold temperature affecting tire traction and thus the risk of a spin-out and wreck.
Models for dependent data: GLMMs vs GEE
Generalized linear mixed models (GLMMs) for Poisson data are not comparable with the above approaches. GLMMs answer a different question and are used with different data structures. Here, sources of dependence between observations are modeled explicitly. GLMMs make use of random intercepts and random slopes to account for individual-level heterogeneity. This modifies what we estimate: the random effects modify the mean and the variance that are modeled, rather than just the variance, as was discussed above.
There are two possible levels of association which can be measured in dependent data: population level (marginal) and individual level (conditional). GLMMs claim to measure individual-level (conditional) associations: that is, given the whole host of individual-level contributors to the outcome, what is the relative effect of a combination of predictors. As an example, exam prep courses may have little effect on children who attend exemplary schools, whereas inner-city children may benefit tremendously. The individual-level effect is then substantially higher in this circumstance, since advantaged children already sit far above the curve in terms of positive exposures.
If we naively applied quasipoisson or negative binomial models to dependent data, the NB models would be wrong and the quasipoisson models would be inefficient. The GEE, however, extends the quasipoisson model to explicitly model dependence structures like the GLMM, but the GEE measures a marginal (population-level) trend and obtains the correct weights, standard errors, and inference.
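A sketch of a marginal Poisson GEE in R using the `geepack` package, on invented clustered data (the cluster structure and frailty mechanism here are assumptions for illustration):

```r
library(geepack)
set.seed(1)
d <- data.frame(
  id = rep(1:50, each = 4),            # 50 clusters of 4 correlated observations
  x  = runif(200)
)
u <- rgamma(50, shape = 2, rate = 2)   # shared cluster-level frailty
d$y <- rpois(200, lambda = u[d$id] * exp(1 + d$x))

## Marginal (population-level) association with robust standard errors:
fit_gee <- geeglm(y ~ x, id = id, data = d, family = poisson,
                  corstr = "exchangeable")
summary(fit_gee)
```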
Data analysis example:
This post is already too long :) There is a nice illustration of the first two models in this tutorial, along with references to more reading if you are interested. The data in question involve the nesting habits of horseshoe crabs: females sit in nests and males (satellites) attach to them. The investigators wanted to measure the number of males attached to a female as a function of the female's characteristics. I hope I've underscored why mixed models are noncomparable: if you have dependent data, you must use the correct model for the question those dependent data are trying to answer, either a GLMM or a GEE.
References:
[1] Agresti, Categorical Data Analysis 2nd Edition
[2] Diggle, Heagerty, Liang, Zeger, Analysis of Longitudinal Data 2nd ed.
Best Answer
I see the quasi-Poisson as a technical fix; it allows you to estimate an additional parameter, $\phi$, the dispersion parameter. In the Poisson, $\phi = 1$ by definition. If your data are more (or less) dispersed than that, the standard errors of the model coefficients are biased. By estimating $\hat{\phi}$ at the same time as the other model coefficients, you can provide a correction to the model standard errors, and hence to their test statistics and associated $p$-values. This is just a correction to the model assumptions.
The negative binomial is a more direct model for the overdispersion: it assumes that the data generating process is, or can be approximated by, a negative binomial.
The quasi-Poisson also introduces a whole pile of practical issues, such as not having a true likelihood, and hence losing the whole stack of useful tools for model selection, like the likelihood ratio test, AIC, etc. (I know there is something called QAIC, but R's `glm()`, for example, won't give you it.)