Diagnostics for generalized additive model vs linear model

Tags: cooks-distance, diagnostic, generalized-additive-model, linear, multiple regression

I am doing an analysis in R and I have the model:

lm(birthrate~education+employment+lo(latitude, longitude),data=data2).

It seems to be a generalized additive model. Can I still use the usual diagnostic plots (the four default ones are residuals-vs-fitted, normal-QQ, scale-location, and residuals-vs-leverage with Cook's-distance contours marked on) to assess the model's validity?

Best Answer

You can -- the plot method works with GAMs, and the interpretation is essentially the same.

The diagnostic plots relate to model assumptions under which inference might be performed; you don't necessarily need them all to hold if you're just trying to get a reasonable description of the relationships, say. However, they can still be useful:

  • the residuals vs fitted plot would show a pattern if you were oversmoothing

  • if the first plot is okay, then the scale-location plot would show changing error variance (heteroskedasticity). If the change in variance is strong, the misspecified model would (at best) over-weight the noisier parts and under-weight the places where the noise is smaller.

  • if both of those are okay, the normal scores plot might be useful because a misspecified model for the error term would suggest there are more efficient ways to estimate (and critically, any attempt to produce prediction intervals based on a normal assumption would perform poorly).

  • the residuals-leverage plot will still show points that are overly influential on their own fitted values. This can be useful: for example, you might want to see how much the fit changes when those points are left out, to check whether the fit relies too much on only a point or two.
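
As a rough illustration of the four checks above, here is a minimal sketch that builds the panels by hand. It makes several assumptions that are not in the question: the data are simulated (the `sim` data frame and its coefficients are invented), and the model is fitted with `gam()` from mgcv using `s(latitude, longitude)` in place of `lo(latitude, longitude)`. The panels are drawn manually because in mgcv, calling `plot()` on the fitted object draws the estimated smooths rather than residual diagnostics.

```r
library(mgcv)  # assumption: mgcv's gam() with s() stands in for lo() from the question

set.seed(1)
n <- 200
# Simulated stand-in for data2; every value here is made up for illustration
sim <- data.frame(
  education  = rnorm(n),
  employment = rnorm(n),
  latitude   = runif(n, 40, 50),
  longitude  = runif(n, -5, 10)
)
sim$birthrate <- 2 + 0.5 * sim$education - 0.3 * sim$employment +
  sin(sim$latitude / 2) + cos(sim$longitude / 3) + rnorm(n, sd = 0.5)

fit <- gam(birthrate ~ education + employment + s(latitude, longitude),
           data = sim)

res <- residuals(fit)                      # raw residuals for a Gaussian fit
fv  <- fitted(fit)
lev <- fit$hat                             # diagonal of the influence (hat) matrix
std_res <- res / sqrt(fit$sig2 * (1 - lev))  # standardized residuals

op <- par(mfrow = c(2, 2))
plot(fv, res, xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs fitted")
abline(h = 0, lty = 2)
qqnorm(std_res, main = "Normal Q-Q")
qqline(std_res)
plot(fv, sqrt(abs(std_res)), xlab = "Fitted values",
     ylab = "sqrt(|standardized residuals|)", main = "Scale-location")
plot(lev, std_res, xlab = "Leverage", ylab = "Standardized residuals",
     main = "Residuals vs leverage")
par(op)

# Refit without the highest-leverage point and see how far the fit moves
worst    <- which.max(lev)
fit_drop <- gam(birthrate ~ education + employment + s(latitude, longitude),
                data = sim[-worst, ])
max_change <- max(abs(predict(fit_drop, newdata = sim) - fv))
max_change
```

If `max_change` is small relative to the residual standard deviation, the fit is probably not resting on that single point; repeating the refit for the few highest-leverage points gives a fuller picture.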