GLM tests involving deviance and likelihood ratios

deviance, generalized linear model, likelihood-ratio

I'm a little confused about the different common tests for GLMs.

There is the null deviance, which is similar to a likelihood ratio for the difference between the saturated model and the model with only an intercept.

There is the residual deviance, which is similar to a likelihood ratio for the difference between the saturated model and the current fitted model.

Given these,

  1. It seems to me that you can use the difference between the null and residual deviance, which should follow a Chi-square distribution, to give something analogous to the F-test in regression. Is this correct?

  2. You can test the residual deviance itself, as it allows you to determine goodness of fit?

  3. I'm assuming there is no test involving the null deviance alone

  4. Unrelated to deviances, you can do likelihood ratio tests comparing for example a model with all parameters and a model lacking one of them?

R just outputs the null deviance and the residual deviance along with their degrees of freedom, so I assume the point is that you can use them to do the tests mentioned above.

If someone can elaborate (on a basic level) I would really appreciate it, because I'm getting confused about what the GLM tests actually use in terms of deviance etc. And if I missed any tests, please let me know as well.

Best Answer

The confusion probably comes from the fact that there are three models involved, and the term "deviance" refers to twice the log of the likelihood ratio between two of them. The models are:

  1. Null model (usually a model with only an intercept term, no influence of explanatory variables on response),
  2. GLM of interest, modelling the response by a linear combination of the explanatory variables (connected by the link function), and
  3. saturated model, in which the expected value of the response can freely depend on the values taken by the explanatory variables.

The residual deviance $D$ is defined as twice the log of the likelihood ratio between the saturated model and the GLM. The null deviance $D_0$ is twice the log of the likelihood ratio between the saturated model and the null model. From this it follows that $D_0-D$ is twice the log of the likelihood ratio between the GLM and the null model. In fact, you can compare any two nested models of different complexity (i.e., where all parameters/explanatory variables of the less complex model also occur in the more complex model) by using the difference of their deviances. The log of a likelihood ratio is a difference between log-likelihoods, so when you compute a difference between deviances, the terms belonging to the saturated model cancel out.
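
Written out, the cancellation looks as follows. The symbols $\ell_s$, $\ell$ and $\ell_0$ are notation introduced here (not part of any software output) for the maximized log-likelihoods of the saturated model, the GLM and the null model:

$$
\begin{aligned}
D &= 2\,(\ell_s - \ell), \\
D_0 &= 2\,(\ell_s - \ell_0), \\
D_0 - D &= 2\,(\ell_s - \ell_0) - 2\,(\ell_s - \ell) = 2\,(\ell - \ell_0).
\end{aligned}
$$

Since the null model is nested in the GLM, $\ell \ge \ell_0$, so $D_0 - D \ge 0$.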

All these statistics, each being twice the log of a likelihood ratio, are asymptotically $\chi^2$-distributed under standard assumptions, with degrees of freedom equal to the difference between the numbers of parameters of the models involved.

To your questions:

  1. Yes, this tests the null hypothesis that there is no influence of the explanatory variables at all.
  2. Yes, this compares the fitted GLM with the saturated model (i.e., a model that is maximally flexible to fit the response from the explanatory variables).
  3. In principle you could have such a test, but it doesn't involve the GLM being fitted. It is therefore not normally of interest when fitting a GLM, and not usually taught (I wouldn't say that it is never of interest; it may well be, in exceptional situations).
  4. The deviance is twice the log of a likelihood ratio, therefore tests based on deviances are in fact likelihood ratio tests, and all such likelihood ratio tests can be written as tests using the difference of deviances for different models (see above).
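
To make the tests above concrete, here is a minimal sketch in plain Python (standard library only). It rests on several assumptions that are not part of the original question: the data are invented, the model is a Poisson GLM with a single two-level factor (chosen because the fitted means are then simply the group means, so no iterative fitting is needed), and the helper names `chi2_sf`, `poisson_deviance` and `poisson_loglik` are hypothetical. Because this toy GLM has only one explanatory variable, tests 1 and 4 coincide here.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable, integer df >= 1."""
    if df < 1 or df != int(df):
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    t = x / 2.0
    # Start at Q(1/2, t) = erfc(sqrt(t)) for odd df, or Q(1, t) = exp(-t) for even df,
    # then apply the recurrence Q(a + 1, t) = Q(a, t) + t**a * exp(-t) / Gamma(a + 1).
    if df % 2 == 1:
        a, q = 0.5, math.erfc(math.sqrt(t))
    else:
        a, q = 1.0, math.exp(-t)
    while a < df / 2.0:
        q += math.exp(a * math.log(t) - t - math.lgamma(a + 1.0))
        a += 1.0
    return min(q, 1.0)

def poisson_deviance(y, mu):
    """Poisson deviance 2 * sum[y*log(y/mu) - (y - mu)]; y*log(y/mu) is 0 when y == 0."""
    total = 0.0
    for yi, mi in zip(y, mu):
        term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += term - (yi - mi)
    return 2.0 * total

def poisson_loglik(y, mu):
    """Poisson log-likelihood of counts y given fitted means mu."""
    return sum(yi * math.log(mi) - mi - math.lgamma(yi + 1) for yi, mi in zip(y, mu))

# Invented toy data: counts in two groups (illustration only).
groups = ["a"] * 4 + ["b"] * 4
y = [2, 3, 1, 4, 6, 8, 5, 9]
n = len(y)

# Null model (intercept only): every fitted mean equals the overall mean.
mu_null = [sum(y) / n] * n
# GLM with one two-level factor: fitted means are the group means.
group_mean = {
    g: sum(yi for yi, gi in zip(y, groups) if gi == g) / groups.count(g)
    for g in set(groups)
}
mu_glm = [group_mean[g] for g in groups]

D0, df_null = poisson_deviance(y, mu_null), n - 1   # null deviance, 1 parameter
D, df_res = poisson_deviance(y, mu_glm), n - 2      # residual deviance, 2 parameters

# Tests 1 and 4: difference of deviances = likelihood ratio test, GLM vs null model.
lr = D0 - D
p_overall = chi2_sf(lr, df_null - df_res)
# Test 2: residual deviance as a goodness-of-fit test against the saturated model.
p_gof = chi2_sf(D, df_res)
# Test 3: null deviance alone (null model against the saturated model).
p_null = chi2_sf(D0, df_null)

print(f"null deviance     {D0:.3f} on {df_null} df, p = {p_null:.4f}")
print(f"residual deviance {D:.3f} on {df_res} df, p = {p_gof:.4f}")
print(f"difference        {lr:.3f} on {df_null - df_res} df, p = {p_overall:.4f}")
```

The difference `D0 - D` equals twice the difference in log-likelihoods between the GLM and the null model, which is why the saturated model never has to be fitted for tests 1 and 4. Keep in mind that the $\chi^2$ approximation is asymptotic; with counts this small, the goodness-of-fit p-value in particular is only a rough guide.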