From my understanding, the deviance residuals of a GLM, when plotted against the fitted values, should give a scatterplot with mean 0 and constant variance. Does this hold for any GLM family with any link function? I am mostly interested in a Gamma GLM with the identity link function right now.
Solved – GLM diagnostics and Deviance residual
deviance, generalized linear model, regression, residuals
Related Solutions
Can I use GLM normal distribution with LOG link function on a DV that has already been log transformed?
Yes, if the assumptions are satisfied on that scale.
Is the variance homogeneity test sufficient to justify using normal distribution?
Why would equality of variance imply normality?
Is the residual checking procedure correct to justify choosing the link function model?
You should beware of using either histograms or goodness-of-fit tests to check the suitability of your assumptions:
1) Beware using the histogram for assessing normality. (Also see here)
In short, depending on something as simple as a small change in your choice of binwidth, or even just the location of the bin boundaries, it's possible to get quite different impressions of the shape of the data:
That's two histograms of the same data set. Using several different binwidths can be useful in seeing whether the impression is sensitive to that.
2) Beware using goodness of fit tests for concluding that the assumption of normality is reasonable. Formal hypothesis tests don't really answer the right question.
e.g. see the links under item 2 here
About the variance: that was mentioned in some papers using similar datasets ("because distributions had homogeneous variances a GLM with a Gaussian distribution was used"). If this is not correct, how can I justify or decide on the distribution?
In normal circumstances, the question isn't 'are my errors (or conditional distributions) normal?' - they won't be, and we don't even need to check. A more relevant question is 'how badly does the degree of non-normality that's present impact my inferences?'
I suggest a kernel density estimate or a normal Q-Q plot (a plot of residuals vs normal scores). If the distribution looks reasonably normal, you have little to worry about. In fact, even when it's clearly non-normal it still may not matter very much, depending on what you want to do (normal prediction intervals really will rely on normality, for example, but many other things will tend to work at large sample sizes).
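As a minimal sketch of that kind of check (Python with `scipy` is my own choice here, and the simulated residuals are purely illustrative), `probplot` gives the points of a normal Q-Q plot plus a correlation `r` that serves as a rough numerical companion to the visual check:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Illustrative residuals from a roughly normal error process
resid = rng.normal(loc=0.0, scale=2.0, size=500)

# probplot returns the (theoretical quantile, ordered residual) pairs
# plus a least-squares fit (slope, intercept, r); r close to 1 means
# the points hug the straight line on a normal Q-Q plot.
(osm, osr), (slope, intercept, r) = stats.probplot(resid, dist="norm")
print(f"Q-Q correlation r = {r:.3f}")
```

In practice you would pass your model's residuals in place of `resid` and, ideally, look at the plot itself rather than relying on a single number.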
Funnily enough, at large samples, normality becomes generally less and less crucial (apart from PIs as mentioned above), but your ability to reject normality becomes greater and greater.
Edit: the point about equality of variance is that it really can impact your inferences, even at large sample sizes. But you probably shouldn't assess that by hypothesis tests either. Getting the variance assumption wrong is an issue whatever your assumed distribution.
I read that the scaled deviance should be around N-p for a good fit, right?
When you fit a normal model it has a scale parameter, in which case your scaled deviance will be about N-p even if your distribution isn't normal.
in your opinion the normal distribution with log link is a good choice
In the continued absence of knowing what you're measuring or what you're using the inference for, I still can't judge whether to suggest another distribution for the GLM, nor how important normality might be to your inferences.
However, if your other assumptions are also reasonable (linearity and equality of variance should at least be checked and potential sources of dependence considered), then in most circumstances I'd be very comfortable doing things like using CIs and performing tests on coefficients or contrasts - there's only a very slight impression of skewness in those residuals, which, even if it's a real effect, should have no substantive impact on those kinds of inference.
In short, you should be fine.
(While another distribution and link function might do a little better in terms of fit, only in restricted circumstances would they be likely to also make more sense.)
Here is the situation as I understand it: you can compare the general goodness-of-fit test across different GLMs if the dispersion parameter $\phi$ is known with certainty for the models you are comparing. By $\phi$ I mean the exponential family's dispersion parameter as used in this link.
So for the Poisson (count regression) and binomial (logistic regression) we know that $\phi=1$, and we can thus legitimately compare fits on exactly the same data but with different link functions (say, a log link versus an identity link). But we cannot do the same for families whose dispersion must be estimated, such as the gamma, inverse Gaussian, Gaussian, etc.
The reason is that the deviance (which comes from a likelihood ratio between a saturated model and your own model) measures the difference in fit between the two models. But to get the scaled deviance (which is $\chi^2$ distributed and makes the models comparable) we need to divide by $\phi$, which we do not always know.
I know this sounds odd when you know about F-tests for significance in nested GLMs using deviance (whether Gaussian or not), but with nested models the dispersion is estimated from the data, and that is allowed for nested models of the same type on the same data. I have to say I cannot pin down for sure why estimating the dispersion is allowed there (my only guess is that across different link functions you cannot really quantify the variance of your $\hat{\phi}$). So: it is OK to compare nested models, no matter what the link function.
Edit: with regard to your comment about just using SSE (BTW, if you are going that route I recommend RMSE (link, link)): you can always get the RMSE for any model and say that this one seems to fit better than that one in an RMSE sense (just as you could do with the deviances above, ignoring the theory). The problem is that, unlike in OLS, the variance of the prediction in a GLM is usually not constant and changes with the covariates (and the confidence intervals are not necessarily symmetric). So in theory you could get a model with an RMSE that is way higher than another model's, yet that model is much closer to the process that generated the data.
So if you do compare RMSE on the same data set, avoid sweeping statements that completely dismiss one model versus the other. And if you do want to use it to make a selection, you should show that RMSE is consistently indicative of a better model by using cross-validation and resampling of the data (i.e. show that a good in-sample RMSE means a good out-of-sample RMSE). BTW, deviance is the RSS in the case of OLS.
Best Answer
Deviance residuals will not in general have 0 mean; they don't for Gamma models.
However the mean deviance residual tends to be reasonably close to 0.
Here's an example of a residual plot from a simple identity link gamma fit (to simulated data for which the model was appropriate; in this case the shape parameter of the gamma was 3):
The plot on the left is a typical deviance residuals vs fitted type plot. The one on the right splits the fitted values into bins so we can use boxplots to help judge whether the spread is near constant; the 0 line is marked in red.
As you can see from the boxplots, judging from the IQR, the spread is pretty much constant (with some random variation at the right where there are few values), but the medians there are consistently below 0. We can see that (in this case) the deviance residuals appear to be close to symmetric.
The mean deviance residual for this model is -0.1126 (marked in blue), which is very close to where those marked medians sit. With such a big sample, this mean is many standard errors from 0, but it is still "near" 0 (in the sense that the standard deviation of the residuals is more than 5 times larger than 0.1126).
Based on simulations, it looks like (as long as n is large and the shape parameter is not too small) the average deviance residual for a Gamma will be about $-\frac{1}{3\alpha}$, where $\alpha$ is the common shape parameter for the gamma-distributed response. The relationship comes in fairly well by about $\alpha=2$, but much below that it tends to overestimate.
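That approximation is easy to check by simulation. Here is a minimal sketch (Python is my own choice, and I evaluate the gamma deviance residuals at the true means rather than refitting a model, which is a simplifying assumption):

```python
import numpy as np

rng = np.random.default_rng(123)
alpha = 3.0  # gamma shape parameter, matching the example above
n = 200_000
mu = rng.uniform(1.0, 10.0, size=n)           # "true" means
y = rng.gamma(shape=alpha, scale=mu / alpha)  # response with E[y] = mu

# Deviance residual for the gamma family:
#   d_i = sign(y - mu) * sqrt(2 * ((y - mu)/mu - log(y/mu)))
# (the argument of sqrt is t - 1 - log(t) with t = y/mu, which is >= 0)
dev_resid = np.sign(y - mu) * np.sqrt(2 * ((y - mu) / mu - np.log(y / mu)))

print(dev_resid.mean())  # close to -1/(3*alpha) = -0.111...
```

With `alpha = 3` the simulated mean lands near the -0.1126 reported above, slightly below the -1/9 the rule of thumb predicts.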
In summary: the deviance residuals should have close to constant mean and close to constant variance across the fitted values, but that mean will be "near" 0 rather than exactly 0.