Solved – Interpreting plot of residuals vs. fitted values from Poisson regression

generalized linear modelpoisson-regressionrself-study

I am trying to fit data with a GLM (poisson regression) in R. When I plotted the residuals vs the fitted values, the plot created multiple (almost linear with a slight concave curve) "lines". What does this mean?

library(faraway)
modl <- glm(doctorco ~ sex + age + agesq + income + levyplus + freepoor + 
            freerepa + illness + actdays + hscore + chcond1 + chcond2,
            family=poisson, data=dvisits)
plot(modl)

enter image description here

Best Answer

This is the appearance you expect of such a plot when the dependent variable is discrete.

Each curvilinear trace of points on the plot corresponds to a fixed value $k$ of the dependent variable $y$. Every case where $y=k$ has a prediction $\hat{y}$; its residual--by definition--equals $k-\hat{y}$. The plot of $k-\hat{y}$ versus $\hat{y}$ is obviously a line with slope $-1$. In Poisson regression, the x-axis is shown on a log scale: it is $\log(\hat{y})$. The curves now bend down exponentially. As $k$ varies, these curves rise by integral amounts. Exponentiating them gives a set of quasi-parallel curves. (To prove this, the plot will be explicitly constructed below, separately coloring the points by the values of $y$.)

We can reproduce the plot in question quite closely by means of a similar but arbitrary model (using small random coefficients):

# Create random data for a random model.
set.seed(17)
n <- 2^12                       # Number of cases
k <- 12                         # Number of variables
beta = rnorm(k, sd=0.2)         # Model coefficients
x <- matrix(rnorm(n*k), ncol=k) # Independent values
y <- rpois(n, lambda=exp(-0.5 + x %*% beta + 0.1*rnorm(n)))

# Wrap the data into a data frame, create a formula, and run the model.
df <- data.frame(cbind(y,x))    
s.formula <- apply(matrix(1:k, nrow=1), 1, function(i) paste("V", i+1, sep=""))
s.formula <- paste("y ~", paste(s.formula, collapse="+"))
modl <- glm(as.formula(s.formula), family=poisson, data=df)

# Construct a residual vs. prediction plot.
b <- coefficients(modl)
y.hat <- x %*% b[-1] + b[1]     # *Logs* of the predicted values
y.res <- y - exp(y.hat)         # Residuals
colors <- 1:(max(y)+1)          # One color for each possible value of y
plot(y.hat, y.res, col=colors[y+1], main="Residuals v. Fitted")

Residuals vs. fitted

Related Question