Interpreting Residual and Null Deviance in GLM (using R)

Tags: deviance, generalized-linear-model, r, regression

I am using the glm function in R for a regression analysis, and I have a question about interpreting the residual and null deviance.

First, here is the output:

```
Call:  glm(formula = cbind(using, notUsing) ~ age + hiEduc + noMore, 
    family = binomial, data = cuse)

Coefficients:
(Intercept)     age25-29     age30-39     age40-49   hiEducTRUE   noMoreTRUE  
    -1.9662       0.3894       0.9086       1.1892       0.3250       0.8330  

Degrees of Freedom: 15 Total (i.e. Null);  10 Residual
Null Deviance:      165.8 
Residual Deviance: 29.92    AIC: 113.4
```

To assess significance, I computed a p-value from the residual deviance as follows:

```
pchisq(29.92, 10, lower.tail = FALSE)
[1] 0.0008828339
```

Because this p-value is smaller than 0.05, is it OK to say that our model is appropriate?

Thank you in advance!!

Best Answer

Appropriate for what purpose? That is the question.

Since the residual deviance is significantly large (in this goodness-of-fit test, a small p-value is evidence of lack of fit, not of adequacy), you can conclude that either your data is overdispersed or the model does not explain all the signal in the data.
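A minimal R sketch of the goodness-of-fit check described above, using only the residual deviance and degrees of freedom copied from the printed output. With a fitted object (a hypothetical `fit`), the same inputs would come from `deviance(fit)` and `df.residual(fit)`. The deviance-to-df ratio is an informal rule of thumb, not a formal test.

```r
# Values copied from the glm() output in the question
resid_dev <- 29.92
resid_df  <- 10

# Goodness-of-fit test: if the binomial model is adequate (and the group
# sizes are not too small), the residual deviance is approximately
# chi-squared on resid_df degrees of freedom. A SMALL p-value here
# signals lack of fit, not that the model is appropriate.
p_gof <- pchisq(resid_dev, df = resid_df, lower.tail = FALSE)

# Informal check: residual deviance per degree of freedom. Values well
# above 1 point to overdispersion or a missing term.
ratio <- resid_dev / resid_df

cat(sprintf("Goodness-of-fit p-value: %.6f\n", p_gof))
cat(sprintf("Residual deviance / df:  %.2f\n", ratio))
```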

On the other hand, the model does explain a deviance of $165.8 - 29.92 \approx 135.9$ on $15 - 10 = 5$ degrees of freedom, which is most ($135.9 / 165.8 \approx 82\%$) of what can be explained. So the model explains a lot, but not quite everything.
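The explained deviance can itself be tested: the drop from the null deviance to the residual deviance is a likelihood-ratio statistic, approximately chi-squared on the difference in degrees of freedom. The sketch below works from the printed numbers only; with the hypothetical fitted object `fit`, a call such as `anova(update(fit, . ~ 1), fit, test = "Chisq")` should report the same comparison.

```r
# Values copied from the glm() output in the question
null_dev  <- 165.8
null_df   <- 15
resid_dev <- 29.92
resid_df  <- 10

# Deviance explained by the predictors and the degrees of freedom they use
explained_dev <- null_dev - resid_dev
explained_df  <- null_df - resid_df

# Likelihood-ratio test of the fitted model against the intercept-only model
p_lrt <- pchisq(explained_dev, df = explained_df, lower.tail = FALSE)

# Share of the null deviance accounted for by the model
prop_explained <- explained_dev / null_dev

cat(sprintf("Explained deviance: %.2f on %g df\n", explained_dev, explained_df))
cat(sprintf("LRT p-value: %.3g\n", p_lrt))
cat(sprintf("Proportion explained: %.1f%%\n", 100 * prop_explained))
```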

Your model is appropriate in the sense that it should have genuine predictive power as it stands. There is no requirement that a model explain all the signal in order to be used for prediction or interpretation. On the other hand, the model is not a complete fit to the data, so you might be able to improve it by adding another appropriate term.
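Whether a candidate extra term helps can be checked with the same drop-in-deviance idea applied to nested models. Because the `cuse` data are not reproduced here, this sketch uses simulated, purely hypothetical grouped binomial data with a built-in interaction; the variable names, group sizes, and effect sizes are assumptions for illustration only.

```r
set.seed(42)

# Hypothetical grouped data: a 2 x 2 design with 4 replicate groups per
# cell and 200 trials per group, generated WITH an x1:x2 interaction.
d <- expand.grid(x1 = c(0, 1), x2 = c(0, 1), grp = 1:4)
d$size <- 200
eta    <- -1 + 0.5 * d$x1 + 0.5 * d$x2 + 1.5 * d$x1 * d$x2
d$succ <- rbinom(nrow(d), d$size, plogis(eta))
d$fail <- d$size - d$succ

# Main-effects model, plus a model with one extra candidate term
fit_main <- glm(cbind(succ, fail) ~ x1 + x2, family = binomial, data = d)
fit_int  <- update(fit_main, . ~ . + x1:x2)

# Drop-in-deviance (likelihood-ratio) test for the added term: the
# reduction in residual deviance is compared with a chi-squared
# distribution on the number of extra parameters (here 1).
cmp <- anova(fit_main, fit_int, test = "Chisq")
print(cmp)
p_add <- cmp[["Pr(>Chi)"]][2]
```

A small `p_add` suggests the added term captures signal that the smaller model leaves in its residual deviance.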

Another possibility is that your data is overdispersed, which corresponds to the possibility that the individual cases from which the counts are derived are positively correlated. The use of residual deviances to judge goodness of fit only makes sense if the cases truly are independent.
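A common follow-up, sketched here rather than taken from the answer, is to refit with `family = quasibinomial`, which keeps the binomial coefficient estimates but estimates a dispersion parameter from the Pearson residuals. The sketch simulates hypothetical beta-binomial counts, in which trials within a group are positively correlated, so the estimated dispersion should come out well above 1. All data and parameter values here are assumptions for illustration.

```r
set.seed(1)

# Hypothetical overdispersed data: 40 groups of 50 trials each. Within a
# group, trials share a random success probability drawn from a Beta
# distribution, which makes them positively correlated (beta-binomial).
n_groups <- 40
size     <- 50
x        <- rep(c(0, 1), length.out = n_groups)
p_mean   <- plogis(-1 + 0.5 * x)
rho      <- 0.2   # assumed intra-group correlation

# Beta(a, b) with mean p_mean and correlation rho = 1 / (a + b + 1).
# The binomial variance is then inflated by about 1 + (size - 1) * rho.
a <- p_mean * (1 - rho) / rho
b <- (1 - p_mean) * (1 - rho) / rho
p_group   <- rbeta(n_groups, a, b)
successes <- rbinom(n_groups, size, p_group)
failures  <- size - successes

fit_bin <- glm(cbind(successes, failures) ~ x, family = binomial)
fit_qb  <- glm(cbind(successes, failures) ~ x, family = quasibinomial)

# Binomial fixes the dispersion at 1; quasibinomial estimates it as the
# Pearson chi-squared statistic divided by the residual df.
dispersion <- summary(fit_qb)$dispersion
cat(sprintf("Residual deviance / df (binomial): %.2f\n",
            deviance(fit_bin) / df.residual(fit_bin)))
cat(sprintf("Estimated dispersion (quasibinomial): %.2f\n", dispersion))
```

The coefficient estimates are identical in both fits; the quasibinomial standard errors are the binomial ones scaled by the square root of the estimated dispersion.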