Solved – Why are Pearsons residuals from a Poisson regression so large

poisson-regressionrresiduals

As I understand it, Pearsons residuals are ordinary residuals expressed in standard deviations.

I ran this Poisson regression:

library(ggplot2)

glm_diamonds <- glm(price ~ carat, family = "poisson", data=diamonds)

I then saved the Pearsons residuals and fitted values from the model:

resid <- resid(glm_diamonds, type = "pearson")
fitted <- fitted(glm_diamonds)
df <- data.frame(resid, fitted)

I then plotted the Pearsons residuals against fitted values:

ggplot(df, aes(fitted, resid)) + geom_point() + ylab("Pearsons residuals") + xlab("Fitted values")

enter image description here

It can be seen in the plot that many of residuals are hundreds of units away from zero. If Pearsons residuals are standard deviations, why are some residuals hundreds of units away from zero? Or in other words, why don't the residuals range from about -3 to 3 if they are standard deviations?

Best Answer

The key point is that the standardization method for Pearson residuals is to divide the difference between observed values $y_i$ and the fitted Poisson mean $\hat\mu_i$ by the theoretical standard deviation implied by that fitted mean:

$$r_i=\frac{y_i - \hat\mu_i}{\sqrt{\hat\mu_i}}$$

So if the model is badly mis-specified the assumed relation $\operatorname{Var} \mu_i=\mu_i$ can be wildly inaccurate: you have over-dispersion as @probabilityislogic says; moreover the fitted means are much too large for high-carat stones, indicating the assumed linear relation between the log mean and carat is too simple.