Solved – Why are residual plots used for GLM diagnostics?

Tags: assumptions, generalized linear model, qq-plot, residuals

I asked this question here, and @Glen_b kindly answered it. I thought I could get a more detailed explanation, with references, by posting it as a new question.

So my question is: why can residual plots such as the residuals vs fitted plot and the normal QQ plot be used for diagnostics of a GLM? For OLS, the residuals vs fitted plot is used to check for heteroscedasticity (non-constant variance) of the residuals, and the normal QQ plot is used to check their normality. However, there are no such assumptions for a GLM (e.g. gamma, Poisson and negative binomial). So why are these plots still used to diagnose GLMs? There are questions (1, 2, etc.) that discuss their usage, but they do not explain why the plots are relevant. There is even a command, glm.diag.plots from the R package boot, that provides residual plots for GLMs.

Here are some plots from my current analysis. I am trying to select a model among the three: OLS, lognormal OLS and gamma with log link. Perhaps it will be easier to discuss using these plots as examples.

Linear model
[Figure: diagnostic plots, linear model]

Lognormal linear model
[Figure: diagnostic plots, lognormal linear model]

Log-link gamma GLM
[Figure: diagnostic plots, log-link gamma GLM]

Additional plots for log-link gamma GLM
[Figure: additional diagnostic plots, log-link gamma GLM]

Best Answer

These plots should be used with caution with non-normal GLMs. My general recommendation is not to look at them if you aren't fitting an OLS regression model (see: Interpretation of plot (glm.model)). For example, why assess whether the residuals are normally distributed when they aren't necessarily supposed to be?
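To make that last point concrete, here is a minimal simulation sketch in plain Python (standard library only). It is not a fitted model: the coefficients, the shape parameter and every name in it are assumptions chosen for illustration, and the residuals are computed against the true mean rather than an estimated one. It draws gamma responses with a log link and compares the spread of raw residuals, y - mu, with that of Pearson residuals, (y - mu) / mu, in the lower and upper halves of the fitted values.

```python
import math
import random
import statistics

random.seed(1)

SHAPE = 2.0  # assumed gamma shape; Var(y) = mu^2 / SHAPE


def simulate(n=20000):
    """Draw (mu, y) pairs from a gamma model with log(mu) = 0.5 + x."""
    rows = []
    for _ in range(n):
        x = random.uniform(0.0, 2.0)
        mu = math.exp(0.5 + 1.0 * x)                # hypothetical coefficients
        y = random.gammavariate(SHAPE, mu / SHAPE)  # mean = shape * scale = mu
        rows.append((mu, y))
    return rows


def raw(mu, y):
    return y - mu


def pearson(mu, y):
    # Gamma variance function is V(mu) = mu^2, so sqrt(V(mu)) = mu.
    return (y - mu) / mu


def spread(group, resid):
    """Standard deviation of the chosen residual within a group."""
    return statistics.pstdev([resid(mu, y) for mu, y in group])


def skewness(values):
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])


rows = simulate()
cut = statistics.median([mu for mu, _ in rows])
low = [r for r in rows if r[0] <= cut]   # smaller fitted values
high = [r for r in rows if r[0] > cut]   # larger fitted values

raw_ratio = spread(high, raw) / spread(low, raw)
pearson_ratio = spread(high, pearson) / spread(low, pearson)
pearson_skew = skewness([pearson(mu, y) for mu, y in rows])

print(f"SD ratio (high/low), raw residuals:     {raw_ratio:.2f}")
print(f"SD ratio (high/low), Pearson residuals: {pearson_ratio:.2f}")
print(f"Skewness of Pearson residuals:          {pearson_skew:.2f}")
```

Under these assumptions the model is exactly right, yet the raw residuals still fan out as the fitted value grows, because the gamma variance is proportional to the squared mean. The Pearson residuals have roughly constant spread but remain right-skewed: a gamma distribution with shape 2 has skewness 2/sqrt(2), about 1.4. So a fanning residuals vs fitted plot or a curved normal QQ plot is not, on its own, evidence against a gamma GLM.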

With respect to your specific plots / models, the predictions look pretty far off except for the OLS model. The OLS model seems to have some problems with heteroscedasticity and non-normality of the residuals. I tend to be somewhat skeptical of 'model selection', and I would advise against fitting a bunch of different models and selecting the one with the nicest looking plots. You should start with an understanding of your data and your situation. For instance, it looks like your response is never negative. What is it? Asking (and answering) questions like that should guide you towards the model you want to use.