Identical coefficients estimated in Poisson vs Quasi-Poisson model

count-data, overdispersion, poisson-regression, quasi-likelihood, r

In modeling claim count data in an insurance setting, I began with a Poisson model but then noticed overdispersion. A quasi-Poisson model captured the steeper mean-variance relationship better than the basic Poisson, but I noticed that the estimated coefficients were identical in the two models.

If this isn't an error, why is this happening? What is the benefit of using Quasi-Poisson over Poisson?

Things to note:

  • The underlying losses are on an excess basis, which (I believe) prevented the Tweedie from working – though it was the first distribution I tried. I also examined negative binomial (NB), zero-inflated Poisson (ZIP), zero-inflated negative binomial (ZINB), and hurdle models, but still found that the quasi-Poisson provided the best fit.
  • I tested for overdispersion via dispersiontest in the AER
    package; the estimated dispersion parameter was approximately 8.4,
    with a p-value on the order of 10^-16.
  • I am fitting with glm(), using family = poisson or
    family = quasipoisson with a log link (a minimal sketch of this
    workflow follows this list).
  • When running the Poisson fit, I get warnings of the form
    "In dpois(y, mu, log = TRUE) : non-integer x = …".

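For reference, a minimal sketch of the workflow described in the notes above. This is illustrative only: `dat`, `claims`, `rating_class`, and `exposure` are hypothetical placeholder names, not objects from the original analysis.

```r
library(AER)  # provides dispersiontest()

## Poisson fit with a log link and exposure entering as an offset
fit_pois <- glm(claims ~ rating_class + offset(log(exposure)),
                family = poisson(link = "log"), data = dat)

## Overdispersion test; the post reports a dispersion of ~8.4 with a
## p-value on the order of 10^-16
dispersiontest(fit_pois)

## Same mean model, but the dispersion parameter is now estimated
fit_qpois <- glm(claims ~ rating_class + offset(log(exposure)),
                 family = quasipoisson(link = "log"), data = dat)

## The point estimates agree; only the standard errors differ
cbind(poisson = coef(fit_pois), quasipoisson = coef(fit_qpois))
```
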
Helpful SE Threads per Ben's guidance:

  1. Basic Math of Offsets in Poisson regression
  2. Impact of Offsets on Coefficients
  3. Difference between using Exposure as Covariate vs Offset

Best Answer

This is almost a duplicate; the linked question explains that you shouldn't expect the coefficient estimates, residual deviance, or degrees of freedom to change. The only thing that changes when you move from Poisson to quasi-Poisson is that a scale parameter that was previously fixed at 1 is now estimated from some measure of residual variability/badness-of-fit (usually the sum of the squared Pearson residuals ($\chi^2$) divided by the residual df, although asymptotically the residual deviance gives the same result). The upshot is that the standard errors are multiplied by the square root of this scale parameter, with concomitant changes in the confidence intervals and $p$-values.
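
To see the mechanics concretely, here is a small self-contained simulation (my illustration, not part of the original answer) that generates overdispersed counts from a negative binomial and checks each claim above:

```r
set.seed(1)
n <- 500
x <- rnorm(n)
## Negative-binomial draws: Poisson-like counts with extra variance
y <- rnbinom(n, mu = exp(1 + 0.5 * x), size = 0.5)

fit_p <- glm(y ~ x, family = poisson)
fit_q <- glm(y ~ x, family = quasipoisson)

## 1. Identical coefficient estimates
all.equal(coef(fit_p), coef(fit_q))                    # TRUE

## 2. Scale parameter = Pearson chi-square / residual df
phi <- sum(residuals(fit_p, type = "pearson")^2) / df.residual(fit_p)
all.equal(phi, summary(fit_q)$dispersion)              # TRUE

## 3. Quasi-Poisson SEs are the Poisson SEs times sqrt(phi)
se_p <- coef(summary(fit_p))[, "Std. Error"]
se_q <- coef(summary(fit_q))[, "Std. Error"]
all.equal(se_q, se_p * sqrt(phi))                      # TRUE
```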

The benefit of quasi-likelihood is that it fixes the basic fallacy of assuming that the data are Poisson (= homogeneous, independent counts); however, fixing the problem in this way potentially masks other issues with the data. (See below.) Quasi-likelihood is one way of handling overdispersion; if you don't address overdispersion in some way, your coefficients will be reasonable but your inference (CIs, $p$-values, etc.) will be garbage.

  • As you comment above, there are lots of different approaches to overdispersion (Tweedie, different negative binomial parameterizations, quasi-likelihood, zero-inflation/alteration).
  • With an overdispersion factor of >5 (8.4), I would worry a bit about whether it is being driven by some kind of model mis-fit (outliers, zero-inflation [which I see you've already tried], nonlinearity) rather than representing across-the-board heterogeneity. My general approach to this is graphical exploration of the raw data and regression diagnostics ...
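
As one possible version of that graphical pass (my suggestion; the answer does not name a specific tool), the DHARMa package provides simulation-based residual checks that help separate across-the-board overdispersion from outliers, zero-inflation, or nonlinearity. Continuing from the Poisson fit `fit_p` in the simulation sketch above:

```r
library(DHARMa)

## Simulation-based quantile residuals for the Poisson fit
sim <- simulateResiduals(fit_p)

## QQ and residual-vs-predicted panels: patterned misfit
## (nonlinearity, outlier clusters) shows up here before you
## settle on a blanket dispersion correction
plot(sim)

## Targeted tests for the specific problems mentioned above
testDispersion(sim)
testZeroInflation(sim)
testOutliers(sim)
```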