Solved – Interpreting QQ plot of poisson regression

generalized-linear-model, poisson-regression, qq-plot

This is the QQ plot I got after fitting a Poisson regression.

I read in a book that the central line corresponds to the zero cases in the response. I can imagine that for zero-response cases the standardized deviance residuals are negative. But how can one specifically say that the central line of the plot corresponds to the zeros? What is the logic behind this argument?

Best Answer

The line does not correspond to zeros. The Poisson distribution is for counts, which cannot go below $0$, yet you can see that there are points below the line. Instead, the line is drawn through the middle of the distribution to give you a visual point (er, line) of reference. There are various algorithms for drawing it; a common one is to draw a line connecting the first and third quartiles (R's qqline() does this by default). I can't tell if that's what was done in your case.
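As a sketch of the quartile rule, the snippet below rebuilds by hand the reference line that base R's `qqline()` draws by default: it passes through the points formed by the first and third quartiles of the residuals and of the standard normal. The data are simulated for illustration (an assumption, not the asker's model), and `intercept`/`slope` are just illustrative names.

```
set.seed(1)
# Simulated Poisson data and fit; purely illustrative
x   <- runif(200)
y   <- rpois(200, lambda = exp(0.5 + 1.5 * x))
fit <- glm(y ~ x, family = poisson)
res <- residuals(fit, type = "deviance")

# Line through the (Q1, Q3) pairs of sample and normal quantiles
probs     <- c(0.25, 0.75)
yq        <- quantile(res, probs, names = FALSE)  # sample quartiles
xq        <- qnorm(probs)                          # theoretical quartiles
slope     <- diff(yq) / diff(xq)
intercept <- yq[1] - slope * xq[1]

qqnorm(res)
qqline(res)                                             # default reference line
abline(a = intercept, b = slope, col = "red", lty = 2)  # manual version, should overlay it
```

Nothing in this construction refers to zero counts; the line is anchored only to the middle half of the residuals.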

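At any rate, a QQ plot is constructed to help you assess whether the residuals are normally distributed. But for a Poisson regression that doesn't make a lot of sense, because the residuals of a count model are discrete and need not be normal even when the model is correct. So, I would probably ignore that plot.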
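A small sketch of why normality is not expected, assuming simulated low-mean counts and an intercept-only Poisson model that is correct by construction. With a constant fitted mean $\mu$, every observation with the same count gets exactly the same deviance residual, so the QQ plot shows horizontal bands rather than a straight line; in particular every zero has residual $-\sqrt{2\mu}$.

```
set.seed(2)
y   <- rpois(500, lambda = 0.3)          # low-mean counts, mostly zeros
fit <- glm(y ~ 1, family = poisson)      # correctly specified model
mu  <- fitted(fit)                       # identical for every observation
res <- residuals(fit, type = "deviance")

# One distinct residual value per distinct count
length(unique(y))
length(unique(res))

# All zeros share the residual -sqrt(2 * mu)
head(cbind(res[y == 0], -sqrt(2 * mu[y == 0])))

qqnorm(res); qqline(res)   # stepped, clearly non-normal pattern
```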

Plotting to understand your model and to check your assumptions is a very good thing to do, though. You can find some good ideas here: Diagnostic plots for count regression.

Related Solutions

Solved – Comparison negative binomial model and quasi-Poisson

I see the quasi-Poisson as a technical fix; it allows you to estimate an additional parameter, $\phi$, the dispersion parameter. In the Poisson, $\phi = 1$ by definition. If your data are more (or less) dispersed than that, the standard errors of the model coefficients are biased. By estimating $\phi$ (as $\hat{\phi}$) alongside the other model coefficients, you can correct the model standard errors, and hence their test statistics and associated $p$-values. This is just a correction to the model assumptions.
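A minimal base-R sketch of that correction, on simulated overdispersed counts (negative binomial draws, so $\hat{\phi} > 1$ is expected; the data-generating values are assumptions for illustration). The quasi-Poisson fit returns the same coefficients as the Poisson fit, `summary()` estimates $\hat{\phi}$ as the Pearson $\chi^2$ statistic divided by the residual degrees of freedom, and every standard error is inflated by $\sqrt{\hat{\phi}}$.

```
set.seed(3)
n <- 300
x <- rnorm(n)
y <- rnbinom(n, mu = exp(1 + 0.5 * x), size = 1.5)   # overdispersed counts

pois  <- glm(y ~ x, family = poisson)
qpois <- glm(y ~ x, family = quasipoisson)

# Pearson-based dispersion estimate, as summary() computes it for quasi families
phi_hat <- sum(residuals(pois, type = "pearson")^2) / df.residual(pois)
phi_hat
summary(qpois)$dispersion

# Standard errors: quasi-Poisson = Poisson * sqrt(phi_hat)
se_pois  <- summary(pois)$coefficients[, "Std. Error"]
se_qpois <- summary(qpois)$coefficients[, "Std. Error"]
cbind(se_pois, se_qpois, ratio = se_qpois / se_pois, sqrt_phi = sqrt(phi_hat))
```

The point estimates do not move at all; only the inference changes.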

The negative binomial is a more direct model for the overdispersion: it assumes that the data-generating process is, or can be approximated by, a negative binomial.
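The contrast is clearest in the assumed mean-variance relationships, written here in the NB2 form used by MASS's glm.nb(), with shape parameter $\theta$. Quasi-Poisson scales the Poisson variance by a constant; the negative binomial adds a term that grows with the square of the mean:

$$
\text{Poisson: } \operatorname{Var}(Y_i) = \mu_i, \qquad
\text{quasi-Poisson: } \operatorname{Var}(Y_i) = \phi\,\mu_i, \qquad
\text{negative binomial: } \operatorname{Var}(Y_i) = \mu_i + \frac{\mu_i^2}{\theta}.
$$

As $\theta \to \infty$ the negative binomial variance reduces to the Poisson one.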

The quasi-Poisson also introduces a whole pile of practical issues. For example, it has no true likelihood, so you lose the whole stack of useful model-selection tools such as likelihood ratio tests, AIC, etc. (I know there is something called QAIC, but R's glm(), for example, won't give it to you.)
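A short sketch of that practical difference, assuming simulated negative binomial data and MASS's `glm.nb()` for the negative binomial fit. R's `AIC()` returns `NA` for the quasi-Poisson fit because the quasi family defines no likelihood, while the Poisson and negative binomial fits both have one.

```
library(MASS)   # for glm.nb()
set.seed(4)
n <- 300
x <- rnorm(n)
y <- rnbinom(n, mu = exp(1 + 0.5 * x), size = 1.5)

pois  <- glm(y ~ x, family = poisson)
qpois <- glm(y ~ x, family = quasipoisson)
nb    <- glm.nb(y ~ x)

AIC(qpois)      # NA: no likelihood, so no AIC
AIC(pois, nb)   # both defined; NB is expected to win here since the data are NB by construction
nb$theta        # estimated NB shape parameter
```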

Count Regression Analysis – Diagnostic Plots for Count Regression

Here is what I usually like doing (for illustration I use the overdispersed and not very easily modelled quine data of pupils' days absent from school, from MASS):

Test and graph the original count data by plotting observed frequencies against fitted frequencies (see chapter 2 in Friendly), which is largely supported by the vcd package in R. For example, with goodfit and a rootogram:
```
library(MASS)
library(vcd)
data(quine) 
fit <- goodfit(quine$Days) 
summary(fit) 
rootogram(fit)
```
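To show what is being compared in the Poisson case, here is a rough base-R analogue. It assumes maximum-likelihood estimation (for the Poisson, $\hat\lambda$ is the sample mean). It is not a reimplementation of vcd's code, and it skips the test statistic that `summary(fit)` reports.

```
library(MASS)
data(quine)
days   <- quine$Days
N      <- length(days)
lambda <- mean(days)                              # Poisson ML estimate
k      <- 0:max(days)
obs    <- tabulate(days + 1, nbins = length(k))   # observed frequencies of 0, 1, ..., max
expd   <- N * dpois(k, lambda)                    # fitted Poisson frequencies

# Plain (non-hanging) rootogram on the square-root scale
plot(k, sqrt(obs), type = "h", xlab = "Days absent", ylab = "sqrt(Frequency)")
lines(k, sqrt(expd), col = "red", lwd = 2)
head(data.frame(k, obs, expected = round(expd, 2)))
```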
or with Ord plots, which help identify the underlying count data model (e.g., here both the slope and the intercept are positive, which speaks for a negative binomial distribution):
```
Ord_plot(quine$Days)
```
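The Ord plot rests on a simple ratio. For the Poisson, $k\,p_k/p_{k-1} = \lambda$ (a flat line). For the negative binomial, the ratio is linear in $k$ with a positive slope. Below is a bare-bones sketch with an unweighted least-squares line. That choice is a simplification, so the fitted numbers may not match `Ord_plot()`.

```
library(MASS)
data(quine)
days  <- quine$Days
nk    <- tabulate(days + 1, nbins = max(days) + 1)  # n_0, n_1, ..., n_max
k     <- seq_len(max(days))                         # k = 1, ..., max
ratio <- k * nk[k + 1] / nk[k]                      # k * n_k / n_(k-1)
keep  <- nk[k + 1] > 0 & nk[k] > 0                  # drop ratios involving empty cells
ord_fit <- lm(ratio[keep] ~ k[keep])

plot(k[keep], ratio[keep], xlab = "k", ylab = "k * n_k / n_(k-1)")
abline(ord_fit, col = "red")
coef(ord_fit)   # signs of intercept and slope hint at the distribution type
```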
or with the "XXXness" plots, where XXX is the distribution of choice, say a Poissonness plot (which speaks against the Poisson; try also type="nbinom"):
```
distplot(quine$Days, type="poisson")
```
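The idea behind the Poissonness plot: if $p_k = e^{-\lambda}\lambda^k/k!$, then $\log(k!\,p_k) = -\lambda + k\log\lambda$. The "count metameter" $\log(k!\,n_k/N)$ is therefore linear in $k$. The sketch checks this identity on exact Poisson probabilities, then applies the unadjusted metameter to the quine counts. `distplot()` adds refinements (such as adjusted counts and confidence intervals) that are omitted here.

```
library(MASS)
data(quine)

# 1) Identity check on exact Poisson probabilities
lambda   <- 4
k        <- 0:15
meta     <- lgamma(k + 1) + dpois(k, lambda, log = TRUE)  # log(k! * p_k)
meta_fit <- lm(meta ~ k)
coef(meta_fit)   # intercept = -lambda, slope = log(lambda)

# 2) Unadjusted metameter for the observed days
days <- quine$Days
N    <- length(days)
nk   <- tabulate(days + 1, nbins = max(days) + 1)
kk   <- 0:max(days)
pos  <- nk > 0                                  # log(0) is undefined: drop empty cells
plot(kk[pos], lgamma(kk[pos] + 1) + log(nk[pos] / N),
     xlab = "k", ylab = "log(k! n_k / N)")       # clear curvature argues against Poisson
```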
Inspect usual goodness-of-fit measures (such as likelihood ratio statistics vs. a null model or similar):
```
mod1 <- glm(Days~Age+Sex, data=quine, family="poisson")
summary(mod1)
anova(mod1, test="Chisq")
```
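The likelihood-ratio test against the null model can also be computed directly from the deviances, as a cross-check of what `anova()` reports. The sketch below also adds the residual-deviance goodness-of-fit test. That test relies on a $\chi^2$ approximation, which is only reasonable when the fitted counts are not too small.

```
library(MASS)
data(quine)
mod1 <- glm(Days ~ Age + Sex, data = quine, family = poisson)
mod0 <- glm(Days ~ 1,         data = quine, family = poisson)

# Likelihood-ratio statistic vs. the null model, by hand
lr    <- mod1$null.deviance - mod1$deviance
lr_df <- mod1$df.null - mod1$df.residual
p_lr  <- pchisq(lr, df = lr_df, lower.tail = FALSE)

# Same comparison via anova()
anova(mod0, mod1, test = "Chisq")

# Residual deviance as an absolute goodness-of-fit check (small p = poor fit)
p_gof <- pchisq(deviance(mod1), df = df.residual(mod1), lower.tail = FALSE)
c(LR = lr, df = lr_df, p_LR = p_lr, p_GOF = p_gof)
```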
Check for over-/underdispersion by looking at the residual deviance/df or at a formal test statistic (e.g., see this answer). Here we clearly have overdispersion:
```
library(AER)
deviance(mod1)/mod1$df.residual
dispersiontest(mod1)
```
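A simulation-based alternative needs no extra packages. It is a parametric bootstrap, not the test that `dispersiontest()` performs. Simulate responses from the fitted Poisson model, refit, and compare the observed Pearson dispersion with its simulated distribution. `pearson_disp` is an illustrative helper, and 199 replicates is an arbitrary choice.

```
library(MASS)
data(quine)
mod1 <- glm(Days ~ Age + Sex, data = quine, family = poisson)

# Hypothetical helper: Pearson chi-square / residual df
pearson_disp <- function(m) sum(residuals(m, type = "pearson")^2) / df.residual(m)
obs_disp <- pearson_disp(mod1)

set.seed(5)
mu_hat   <- fitted(mod1)
sim_disp <- replicate(199, {
  d <- quine
  d$Days <- rpois(nrow(d), mu_hat)   # data generated under H0: the fitted Poisson model
  pearson_disp(glm(Days ~ Age + Sex, data = d, family = poisson))
})

p_boot <- (1 + sum(sim_disp >= obs_disp)) / (1 + length(sim_disp))
c(observed = obs_disp, sim_mean = mean(sim_disp), p = p_boot)
```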
Check for influential and high-leverage points, e.g., with influencePlot in the car package. Of course, here many points are highly influential because the Poisson is a bad model:
```
library(car)
influencePlot(mod1)
```
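The same ingredients are available in base R. This sketch draws a bubble plot of studentized residuals against hat values, with circle area proportional to Cook's distance, in the spirit of `influencePlot()`. The 4/n cutoff is only one common rule of thumb, used here as an assumption.

```
library(MASS)
data(quine)
mod1 <- glm(Days ~ Age + Sex, data = quine, family = poisson)

h  <- hatvalues(mod1)        # leverage
cd <- cooks.distance(mod1)   # influence
r  <- rstudent(mod1)         # studentized residuals

# Rule-of-thumb flag (conventions vary)
flag <- which(cd > 4 / nrow(quine))
head(sort(cd, decreasing = TRUE))
length(flag)

# cex proportional to sqrt(cd) makes circle *area* proportional to Cook's distance
plot(h, r, cex = 10 * sqrt(cd / max(cd)),
     xlab = "Hat values", ylab = "Studentized residuals")
```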
Check for zero inflation by fitting a count data model and its zero-inflated / hurdle counterpart and comparing them (usually with AIC). Here a zero-inflated model would fit better than the simple Poisson (again, probably due to overdispersion):
```
library(pscl)
mod2 <- zeroinfl(Days~Age+Sex, data=quine, dist="poisson")
AIC(mod1, mod2)
```
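Before fitting a zero-inflated model, a quick base-R check compares the observed number of zeros with the number the fitted Poisson model predicts, $\sum_i P(Y_i = 0) = \sum_i e^{-\hat\mu_i}$. A large excess of observed zeros hints at zero inflation, but overdispersion alone can produce it too.

```
library(MASS)
data(quine)
mod1 <- glm(Days ~ Age + Sex, data = quine, family = poisson)

obs_zeros <- sum(quine$Days == 0)
exp_zeros <- sum(dpois(0, fitted(mod1)))   # sum of P(Y_i = 0) under the fitted model
c(observed = obs_zeros, expected = exp_zeros)
```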
Plot the residuals (raw, deviance or scaled) on the y-axis vs. the (log) predicted values (or the linear predictor) on the x-axis. Here we see some very large residuals and a substantial deviation of the deviance residuals from normality (speaking against the Poisson; Edit: @FlorianHartig's answer suggests that normality of these residuals is not to be expected, so this is not a conclusive clue):
```
res <- residuals(mod1, type="deviance")
plot(predict(mod1, type="link"), res)  # linear predictor = log of the fitted means
abline(h=0, lty=2)
qqnorm(res)
qqline(res)
```
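Following up on that edit, one option is the randomized quantile residual (Dunn & Smyth, 1996). It is approximately standard normal under a correctly specified count model. The method jitters each observation uniformly within the jump of the fitted CDF at its count, then maps the result to the normal scale. A minimal base-R sketch on simulated Poisson data (an assumption for illustration):

```
set.seed(6)
n   <- 300
x   <- runif(n)
y   <- rpois(n, exp(1 + x))          # data simulated from a Poisson GLM
fit <- glm(y ~ x, family = poisson)
mu  <- fitted(fit)

# Randomized quantile residuals
lo  <- ppois(y - 1, mu)              # P(Y <= y - 1); equals 0 when y = 0
hi  <- ppois(y, mu)                  # P(Y <= y)
u   <- runif(n, min = lo, max = hi)  # uniform draw within the CDF jump
rqr <- qnorm(u)

qqnorm(rqr); qqline(rqr)             # roughly straight if the model is adequate
```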
If interested, plot a half-normal probability plot of the residuals by plotting the ordered absolute residuals vs. expected normal values (Atkinson, 1981). A special feature would be to simulate a reference ‘line’ and envelope with simulated / bootstrapped confidence intervals (not shown, though):
```
library(faraway)
halfnorm(residuals(mod1))
```
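The simulated envelope mentioned above can be sketched in base R as follows. Refit the model to responses simulated from the fitted Poisson model, sort the absolute deviance residuals of each refit, and take pointwise 2.5% and 97.5% quantiles. The plotting positions `qnorm((n + i)/(2n + 1))`, the 99 replicates, and the 95% band are illustrative choices; the sketch does not claim to reproduce faraway's `halfnorm()`.

```
library(MASS)
data(quine)
mod1 <- glm(Days ~ Age + Sex, data = quine, family = poisson)

n      <- nrow(quine)
scores <- qnorm((n + 1:n) / (2 * n + 1))   # half-normal plotting positions
obs    <- sort(abs(residuals(mod1, type = "deviance")))

# Envelope from refits to data simulated under the fitted model
set.seed(7)
mu_hat <- fitted(mod1)
sims <- replicate(99, {
  d <- quine
  d$Days <- rpois(n, mu_hat)
  sort(abs(residuals(glm(Days ~ Age + Sex, data = d, family = poisson),
                     type = "deviance")))
})                                          # n x 99 matrix
env <- apply(sims, 1, quantile, probs = c(0.025, 0.975))

plot(scores, obs, xlab = "Half-normal quantiles", ylab = "|Deviance residual|",
     ylim = range(obs, env))
lines(scores, env[1, ], lty = 2)
lines(scores, env[2, ], lty = 2)
```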
Diagnostic plots for log-linear models for count data (see chapters 7.2 and 7.7 in Friendly's book). Plot predicted vs. observed values, perhaps with some interval estimate. I did this just for the age groups; here we see again that, due to the overdispersion, our estimates are pretty far off except, perhaps, in group F3. The red points are the point predictions and the pink points are the point prediction $\pm$ one standard error:
```
plot(Days~Age, data=quine) 
prs  <- predict(mod1, type="response", se.fit=TRUE)  # list with $fit and $se.fit
pris <- data.frame(pest=prs$fit, lwr=prs$fit - prs$se.fit, upr=prs$fit + prs$se.fit)
points(pris$pest ~ quine$Age, col="red")
points(pris$lwr  ~ quine$Age, col="pink", pch=19)
points(pris$upr  ~ quine$Age, col="pink", pch=19)
```
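A variant of the same plot builds the interval on the link (log) scale and back-transforms it with `exp()`. This keeps the bounds positive and asymmetric, which suits counts better than point ± one standard error on the response scale. The normal-approximation 95% level is an illustrative choice.

```
library(MASS)
data(quine)
mod1 <- glm(Days ~ Age + Sex, data = quine, family = poisson)

# Interval on the log scale, then back-transformed
pl  <- predict(mod1, type = "link", se.fit = TRUE)
z   <- qnorm(0.975)
est <- exp(pl$fit)
lwr <- exp(pl$fit - z * pl$se.fit)
upr <- exp(pl$fit + z * pl$se.fit)

plot(Days ~ Age, data = quine)   # boxplots at positions 1..4
pos <- as.integer(quine$Age)
points(pos, est, col = "red")
segments(pos, lwr, pos, upr, col = "pink")
```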

This should give you much of the useful information about your analysis and most steps work for all standard count data distributions (e.g., Poisson, Negative Binomial, COM Poisson, Power Laws).
