Solved – Why are Pearson residuals from a Poisson regression so large?

poisson-regression r residuals

As I understand it, Pearson residuals are ordinary residuals expressed in units of standard deviations.

I ran this Poisson regression:

library(ggplot2)

glm_diamonds <- glm(price ~ carat, family = "poisson", data=diamonds)

I then saved the Pearson residuals and fitted values from the model:

# distinct names avoid masking the resid() and fitted() functions
pearson_resid <- resid(glm_diamonds, type = "pearson")
fitted_vals   <- fitted(glm_diamonds)
df <- data.frame(resid = pearson_resid, fitted = fitted_vals)

I then plotted the Pearson residuals against fitted values:

ggplot(df, aes(fitted, resid)) + geom_point() + ylab("Pearson residuals") + xlab("Fitted values")

[Figure: scatter plot of Pearson residuals against fitted values]

The plot shows that many of the residuals are hundreds of units away from zero. If Pearson residuals are measured in standard deviations, why are some of them so far from zero? In other words, why don't the residuals range from about -3 to 3?

Best Answer

The key point is that the standardization method for Pearson residuals is to divide the difference between observed values $y_i$ and the fitted Poisson mean $\hat\mu_i$ by the theoretical standard deviation implied by that fitted mean:

$$r_i=\frac{y_i - \hat\mu_i}{\sqrt{\hat\mu_i}}$$

So if the model is badly mis-specified, the assumed relation $\operatorname{Var}(y_i)=\mu_i$ can be wildly inaccurate: you have over-dispersion, as @probabilityislogic says; moreover, the fitted means are much too large for high-carat stones, indicating that the assumed linear relation between the log mean and carat is too simple.
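To make the formula concrete, here is a minimal sketch in base R. It does not use the diamonds data: the counts are simulated from a negative binomial distribution (an assumption made purely for illustration, as are the names `x`, `y` and `fit`), so that the variance is far larger than the mean. It checks that the hand-computed residuals match `resid(..., type = "pearson")` and estimates the dispersion as the sum of squared Pearson residuals divided by the residual degrees of freedom.

```r
# Simulated over-dispersed counts (hypothetical stand-in for a real data set)
set.seed(42)
n  <- 1000
x  <- runif(n, 0.2, 2)
mu <- exp(6 + 1.5 * x)                 # true mean grows with x on the log scale
y  <- rnbinom(n, mu = mu, size = 2)    # variance = mu + mu^2 / 2, far above mu

fit    <- glm(y ~ x, family = poisson) # Poisson fit wrongly assumes Var(y) = mu
mu_hat <- fitted(fit)

# Pearson residual by hand: (observed - fitted) / sqrt(fitted)
r_manual  <- (y - mu_hat) / sqrt(mu_hat)
r_builtin <- resid(fit, type = "pearson")
print(all.equal(as.numeric(r_manual), as.numeric(r_builtin)))

# Dispersion estimate: close to 1 if the Poisson variance assumption holds
dispersion <- sum(r_builtin^2) / df.residual(fit)
print(range(r_builtin))
print(dispersion)
```

A dispersion estimate far above 1 suggests that the residuals are large because the data vary much more than a Poisson model allows, not because of a few outliers.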

Related Solutions

Solved – Confidence Interval for predictions for Poisson regression

To address Q1, let's start by making some data to play with:

lo.to.p <- function(lo){  # this function will convert log odds to probabilities
  o <- exp(lo)            # we get odds by exponentiating log odds
  p <- o/(o+1)            # we convert to probabilities
  return(p)
}

set.seed(90)                        # this makes the example reproducible
x   <- runif(100, min=0, max=100)   # I generate some x data from a uniform dist
lo  <- -.5 + .1*x                   # this is the linear predictor
p   <- lo.to.p(lo)                  # converting log odds to probabilities
y   <- rbinom(100, size=1, prob=p)  # generating observed y values
foo <- data.frame(x=x, y=y)

  # @Gavin's code:
mod     <- glm(y ~ x, data=foo, family=binomial)
preddat <- with(foo, data.frame(x=seq(min(x), max(x), length.out=100)))
preds   <- predict(mod, newdata=preddat, type="link", se.fit=TRUE)

Now, why not try to get predicted values and a confidence interval / band by just using the original data:

preds2  <- predict(mod, newdata=foo$x, type="link", se.fit=TRUE)

That throws an error, because predict() needs the newdata argument to be a data frame:

# Error in eval(predvars, data, env) : 
#   numeric 'envir' arg not of length one

So let's try with the original data as a data frame:

preds3  <- predict(mod, newdata=data.frame(x=foo$x), type="link", se.fit=TRUE)

That time it worked. I used our lo.to.p() function to convert the output from predict() to predicted probabilities, as @Gavin suggested (note that you can also use predict() with type="response" to do that automatically). Here is what the output looks like:

[Figure: predicted probabilities with confidence band, computed from the original unsorted data]
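The conversion from link-scale predictions to a confidence band on the probability scale can be sketched as follows. This is a minimal, self-contained version that regenerates similar data; the 95% level, the normal critical value from `qnorm()`, and the name `band` are assumptions for illustration, and `plogis()` plays the role of lo.to.p().

```r
set.seed(90)
x   <- runif(100, min = 0, max = 100)
y   <- rbinom(100, size = 1, prob = plogis(-0.5 + 0.1 * x))
foo <- data.frame(x = x, y = y)
mod <- glm(y ~ x, data = foo, family = binomial)

# Predict on an evenly spaced, sorted grid so the band can be drawn as a line
preddat <- data.frame(x = seq(min(foo$x), max(foo$x), length.out = 100))
preds   <- predict(mod, newdata = preddat, type = "link", se.fit = TRUE)

# Build the interval on the log-odds scale, then back-transform each limit
z    <- qnorm(0.975)
band <- data.frame(
  x     = preddat$x,
  fit   = plogis(preds$fit),
  lower = plogis(preds$fit - z * preds$se.fit),
  upper = plogis(preds$fit + z * preds$se.fit)
)
print(head(band))
```

Because the limits are transformed rather than the standard errors, the band can never leave the interval from 0 to 1.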

Using the original data frame yields a garbled mess. You can sort the data first, which works OK in this case, but generally is not as smooth / pretty. To better show the effect of this strategy, I slightly augmented the data and model. Here's the code for the sorted version:

foo2    <- with(foo, data.frame(x=c(x, -100), y=c(y,0)))
mod2    <- glm(y~x, data=foo2, family=binomial)
preds4  <- predict(mod2, newdata=data.frame(x=sort(foo2$x)), type="link", 
                   se.fit=TRUE)

Regarding Q2, the statistical theory behind generalized linear models (GLiMs) relies on the sampling distribution of a parameter estimate being asymptotically normal (i.e., 'at infinity'). It is well known that this is not necessarily true for small samples, but the sampling distribution may be 'normal enough'. At any rate, this is (possibly) true on the scale of the linear predictor, which I call lo above; but because the link function is a non-linear transformation, it isn't necessarily true on the response scale. To use an easy example, the normal distribution extends to infinity on both sides, but the response scale is bounded at 0 and 1. Moreover, all of these points hold for the Poisson distribution just as they do for the binomial. Although it's not exactly the same topic, it may help to read my answer here: difference between logit and probit models, because it provides a lot of information about link functions and GLiMs that may help with the larger conceptual framework.
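Since the same reasoning applies to the Poisson case, here is a hedged sketch for a log-link Poisson model. The data are simulated and the names (`mod_pois`, `newdat`, `band`) are hypothetical; the interval is built on the linear-predictor scale and back-transformed with the family's own inverse link, so the code does not hard-code `exp()`.

```r
set.seed(1)
n <- 200
x <- runif(n, 0, 10)
y <- rpois(n, lambda = exp(0.2 + 0.25 * x))
mod_pois <- glm(y ~ x, family = poisson)

newdat  <- data.frame(x = seq(0, 10, length.out = 50))
p       <- predict(mod_pois, newdata = newdat, type = "link", se.fit = TRUE)
linkinv <- family(mod_pois)$linkinv    # exp() for the log link
z       <- qnorm(0.975)                # assumed 95% level

band <- data.frame(
  x     = newdat$x,
  fit   = linkinv(p$fit),
  lower = linkinv(p$fit - z * p$se.fit),
  upper = linkinv(p$fit + z * p$se.fit)
)

# Symmetric on the link scale, asymmetric and strictly positive on the response scale
print(head(cbind(below = band$fit - band$lower, above = band$upper - band$fit)))
```

The upper arm of the interval is longer than the lower arm on the response scale, which is what you would expect if normality holds only on the scale of the linear predictor.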

For Q3, yes, there is a relationship between the SEs of your coefficients and the width of the confidence band, but the confidence band is a little more complicated. The width of the confidence band grows as you move left or right away from the mean of x. (You can get the general idea from my answer here: linear regression prediction interval.) On the other hand, with a GLiM, the width of the confidence band also depends on the predicted value. To see these effects more easily, we can look at the confidence band for our original model on the scale of the linear predictor, and for a second model where there is no effect of x. Here's the second model:

y2      <- rbinom(100, size=1, prob=.5)
mod2    <- glm(y2~x, family=binomial)
preds5  <- predict(mod2, newdata=data.frame(x=sort(foo$x)), type="link", 
                   se.fit=TRUE)

Here's what they look like:

[Figure: confidence bands on the scale of the linear predictor for the original model and for the model with no effect of x]
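The claim about band width can be checked numerically. Here is a minimal sketch for a no-effect model, with freshly simulated data and hypothetical names (`mod_null`, `grid`, `width`): it computes the width of an assumed 95% band on the linear-predictor scale and reports where it is narrowest.

```r
set.seed(90)
x  <- runif(100, min = 0, max = 100)
y2 <- rbinom(100, size = 1, prob = 0.5)      # response unrelated to x
mod_null <- glm(y2 ~ x, family = binomial)

grid <- data.frame(x = sort(x))
p    <- predict(mod_null, newdata = grid, type = "link", se.fit = TRUE)

# Full width of the band on the log-odds scale at each x
width     <- 2 * qnorm(0.975) * p$se.fit
narrowest <- grid$x[which.min(width)]

print(c(mean_x = mean(x), narrowest_x = narrowest))
print(range(width))
```

In a GLiM the narrowest point sits at a weighted mean of x, where the weights depend on the fitted values, so it need not coincide exactly with the plain mean of x.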

Poisson Regression – Residuals in Poisson Regression

The distinction is clear as soon as you understand what a Pearson residual is.

You are correct that for a Poisson model, variance increases as mean increases.

As a result, ordinary raw residuals ($r_i=y_i-\hat\mu_i$) should have a spread that increases with fitted values (though not in proportion).

However, Pearson residuals are residuals divided by the square root of the variance according to the model ($r^P_i=\frac{y_i-\hat\mu_i}{\sqrt{\hat\mu_i}}$ for a Poisson model). This means that if the model is correct, the Pearson residuals should have constant spread.

Residual plots from a simple simulated Poisson regression model. Left plot: raw residuals vs fitted mean show increasing spread with mean. There is diagonal "banding" in the residuals because the data are discrete. Right plot: Pearson residuals show what looks like constant spread as mean changes, and the diagonal bands are now curved.
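A simulation along these lines can be sketched as follows. The model, sample size, and binning into fifths of the fitted mean are assumptions for illustration, not the settings behind the plots described above. It compares the spread of raw and Pearson residuals within each bin.

```r
set.seed(1)
n <- 5000
x <- runif(n, 0, 3)
y <- rpois(n, lambda = exp(1 + x))     # correctly specified Poisson model
fit <- glm(y ~ x, family = poisson)

mu_hat  <- fitted(fit)
raw     <- y - mu_hat                          # raw residuals
pearson <- resid(fit, type = "pearson")        # raw / sqrt(mu_hat)

# Split observations into five bins of increasing fitted mean
bins <- cut(mu_hat,
            breaks = quantile(mu_hat, probs = seq(0, 1, by = 0.2)),
            include.lowest = TRUE)

spread <- data.frame(
  raw_sd     = sapply(split(raw, bins), sd),
  pearson_sd = sapply(split(pearson, bins), sd)
)
print(round(spread, 2))
```

When the model is correct, the raw_sd column should climb from bin to bin while the pearson_sd column stays close to 1.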
