Solved – Why does Poisson regression need to assume observations are Poisson distributed

assumptions, generalized-linear-model, poisson-regression, residuals

Zuur (2013), 'Beginner's Guide to GLM and GLMM', states that if the Pearson residuals, when plotted against the fitted values from a Poisson regression, show the pattern below, then the assumption of Poisson-distributed observations is violated. Zuur urges the reader to try a negative binomial regression if the residuals show a pattern like the one below. If a Poisson regression is used anyway, Zuur states that the standard errors will be wrong. But why does the model need to make assumptions about the distribution of observations around the fitted values? Why can't it use the information contained in the distribution of the observed residuals to calculate the standard errors? My best guess is that for every fitted value there might only be a few observed values, and there is not enough information contained in these few observations to calculate the standard errors.

[Figure: Pearson residuals plotted against fitted values from the Poisson regression.]

Best Answer

A) "If a poisson regression is used, then Zuur states the standard errors will be wrong." This is similar to the what happens in a linear model if the errors are heteroskedastic: the variance of $\hat{\boldsymbol{\beta}}_{OLS}$ is $\sigma^2(\mathbf{X'X})^{-1}$, where $\sigma^2$ is the supposed homogeneous variance, but if the variance is not homogeneous an adjustment has to be made to the standard errors. The Poisson model restricts the variance to equal the mean (equidispersion). If the variance exceeds the mean (overdispersion), the Poisson standard errors are wrong because their values depend on the estimated variance: if variance is wrong, standard errors are wrong. As you know, if data are heteroskedastic and you apply OLS, you can (should) throw away the wrong standard errors and prefer heteroskedasticity-consistent standard errors, also called White or sandwich standard errors. Youl could compute sandwich standard errors in a Poisson regression too, but a better specified model would be... better (see Zeileis, Kleiber, and Jackman.)

B) "But why does the model need to make assumptions about the distribution of observations around the fitted values?" Every model depends on the assumption that the distribution of observations around the fitted values is optimal! Think about prediction: if the distribution of observations around the fitted values is far from optimal, predictions are unreliable.

C) "Why can't it use information contained in the distribution of the observed residuals to calculate the standard errors?" Standardized residuals are: $$z_i=\frac{y_i-\hat{y}_i}{\hat\sigma_{y_i}}$$ in a Poisson model $\hat\sigma_{y_i}=\sqrt{E(y_i)}$. If the model is true, then the $z_i$'s sould be approximately independent, each with standard deviation 1. If data are overdispersed, then the $z_i$'s will be larger in absolute value because of the extra variation beyond what is predicted under the Poisson model. In such cases standard errors are actually calculated using "information contained in the distribution of the observed residuals", but that distribution is wrong because your model is misspecified: you assume equidispersion but your assumption does not hold.

D) "My best guess is that for every fitted values, there might only be a few observed values, and there is not enough information contained in these few observed values to calculate the standard errors." What is wrong in your figure is that there are many residuals greater than 1 in absolute value, so they cannot be standardized residuals. More observed values may help, but a better specified model would be more effective. Look at Zeileis, Kleiber, and Jackman. They use a dataset with 4406 observations. Look at the fitted vs Person residuals plots for a Poisson and a negative binomial model:

[Figure: Pearson residuals vs. fitted values for the Poisson model and for the negative binomial model.]

Which one do you prefer?
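A rough way to reproduce this kind of comparison on simulated overdispersed counts (a Python/statsmodels sketch, not the dataset or code from Zeileis, Kleiber, and Jackman):

```python
# Sketch: Pearson residuals vs. fitted values for a Poisson and an NB2 model.
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

rng = np.random.default_rng(7)
n = 2000
x = rng.normal(size=n)
X = sm.add_constant(x)
mu = np.exp(0.5 + 0.8 * x)
y = rng.poisson(mu * rng.gamma(shape=2.0, scale=0.5, size=n))  # overdispersed

pois = sm.GLM(y, X, family=sm.families.Poisson()).fit()

nb = sm.NegativeBinomial(y, X).fit(disp=False)   # NB2: Var = mu + alpha*mu^2
alpha = nb.params[-1]                            # estimated dispersion
mu_nb = np.exp(X @ nb.params[:-1])               # log link for the mean
nb_pearson = (y - mu_nb) / np.sqrt(mu_nb + alpha * mu_nb**2)

fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharey=True)
axes[0].scatter(pois.fittedvalues, pois.resid_pearson, s=5)
axes[0].set_title("Poisson")
axes[1].scatter(mu_nb, nb_pearson, s=5)
axes[1].set_title("Negative binomial")
for ax in axes:
    ax.axhline(0, color="grey")
    ax.set_xlabel("Fitted values")
axes[0].set_ylabel("Pearson residuals")
plt.show()
```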

However, while the negative binomial already improves the fit dramatically, it can in turn be improved by the hurdle and zero-inflated models. See Zeileis, Kleiber, and Jackman.
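For completeness, here is a hedged sketch of a zero-inflated Poisson fit in statsmodels, with an intercept-only inflation component; whether a zero-inflated or hurdle specification actually helps depends entirely on the data, as Zeileis, Kleiber, and Jackman discuss.

```python
# Sketch: zero-inflated Poisson with an intercept-only inflation part.
# Simulated data with artificially injected excess zeros; names are illustrative.
import numpy as np
import statsmodels.api as sm
from statsmodels.discrete.count_model import ZeroInflatedPoisson

rng = np.random.default_rng(3)
n = 2000
x = rng.normal(size=n)
X = sm.add_constant(x)
y = rng.poisson(np.exp(0.5 + 0.8 * x))
y[rng.random(n) < 0.3] = 0   # inject excess zeros

zip_fit = ZeroInflatedPoisson(y, X).fit(method="bfgs", maxiter=500, disp=False)
print(zip_fit.summary())     # count-part coefficients plus the inflation intercept
```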
