GLM tests involving deviance and likelihood ratios

deviancegeneralized linear modellikelihood-ratio

I'm a little confused about the different common tests for GLMs.

There is the null deviance, which is similar to a likelihood ratio for the difference between the saturated model and the model with only an intercept.

There is the residual deviance, which is similar to a likelihood ratio for the difference between the saturated model and the current fitted model.

Given these,

It seems to me that you can use the difference in residual and null deviance, which should follow a Chi-square distribution, to give something analogous to the F-test in regression, is this correct?
You can test the residual deviance itself, as it allows you to determine goodness of fit?
I'm assuming there is no test involving the null deviance alone
Unrelated to deviances, you can do likelihood ratio tests comparing for example a model with all parameters and a model lacking one of them?

R just outputs the null deviance and the residual deviance along with their degrees of freedom, so I assume the point is that you can use them to do the tests mentioned above.

If someone can elaborate (on a basic level) I would really appreciate it, because I'm really getting confused as to what the GLM tests make use of in terms of deviance etc.. And if I missed any tests please let me know as well.

Best Answer

The confusion probably comes from the fact that there are three models involved, and the term "deviance" refers to twice the log or the likelihood ratio between two of them. The models are:

Null model (usually a model with only an intercept term, no influence of explanatory variables on response),
GLM of interest, modelling the response by a linear combination of the explanatory variables (connected by the link function), and
saturated model, in which the expected value of the response can freely depend on the values taken by the explanatory variables.

The residual deviance $D$ is defined as twice the log of the likelihood ratio between the saturated model and the GLM. The null deviance $D_0$ is twice the log of the likelihood ratio between the saturated model and the null model. From this it follows that $D-D_0$ is twice the log of the likelihood ratio between the GLM and the null model, and in fact you can compare any two models of different complexity nested in each other (i.e., where all parameters/explanatory variables of the less complex model also occur in the more complex model) by using the difference of their deviances (note that the log of the likelihood ratio is in fact a difference between log-likelihoods, and this means that if you compute a difference between deviances, the terms belonging to the saturated model cancel out).

All these statistics as logs of likelihood ratios are $\chi^2$-distributed under standard assumptions, with degrees of freedom as the difference between the numbers of parameters of the involved models.

To your questions:

Yes, this tests the null hypothesis that there is no influence of the explanatory variables at all.
Yes, this compares the fitted GLM with the saturated model (i.e., a model that is maximally flexible to fit the response from the explanatory variables).
In principle you could have such a test, but this doesn't involve the GLM to be fitted, and is therefore not normally of interest when fitting a GLM, and therefore not usually taught (I wouldn't say that this is never of interest; it may well, in exceptional situations).
The deviance is a log of the likelihood ratio, therefore tests based on deviances are in fact likelihood ratio tests, and all such likelihood ratio tests can be written as tests using the difference of deviances for different models (see above).

Related Solutions

Solved – In a GLM, is the log likelihood of the saturated model always zero

If you really meant log-likelihood, then the answer is: it's not always zero.

For example, consider Poisson data: $y_i \sim \text{Poisson}(\mu_i), i = 1, \ldots, n$. The log-likelihood for $Y = (y_1, \ldots, y_n)$ is given by: $$\ell(\mu; Y) = -\sum_{i = 1}^n \mu_i + \sum_{i = 1}^n y_i \log \mu_i - \sum_{i = 1}^n \log(y_i!). \tag{$*$}$$

Differentiate $\ell(\mu; Y)$ in $(*)$ with respect to $\mu_i$ and set it to $0$ (this is how we obtain the MLE for saturated model): $$-1 + \frac{y_i}{\mu_i} = 0.$$ Solve this for $\mu_i$ to get $\hat{\mu}_i = y_i$, substituting $\hat{\mu}_i$ back into $(*)$ for $\mu_i$ gives that the log-likelihood of the saturated model is: $$\ell(\hat{\mu}; Y) = \sum_{i = 1}^n y_i(\log y_i - 1) -\sum_{i = 1}^n \log(y_i!) \neq 0$$ unless $y_i$ take very special values.

In the help page of the R function glm, under the item deviance, the document explains this issue as follows:

deviance up to a constant, minus twice the maximized log-likelihood. Where sensible, the constant is chosen so that a saturated model has deviance zero.

Notice that it mentioned that the deviance, instead of the log-likelihood of the saturated model is chosen to be zero.

Probably, what you really wanted to confirm is that "the deviance of the saturated model is always given as zero", which is true, since the deviance, by definition (see Section 4.5.1 of Categorical Data Analysis (2nd Edition) by Alan Agresti) is the likelihood ratio statistic of a specified GLM to the saturated model. The constant aforementioned in the R documentation is actually twice the maximized log-likelihood of the saturated model.

Regarding your statement "Yet, the way the formula for deviance is given suggests that sometimes this quantity is non zero.", it is probably due to the abuse of usage of the term deviance. For instance, in R, the likelihood ratio statistic of comparing two arbitrary (nested) models $M_1$ and $M_2$ is also referred to as deviance, which would be more precisely termed as the difference between the deviance of $M_1$ and the deviance of $M_2$, if we closely followed the definition as given in Agresti's book.

Conclusion

The log-likelihood of the saturated model is in general non-zero.
The deviance (in its original definition) of the saturated model is zero.
The deviance output from softwares (such as R) is in general non-zero as it actually means something else (the difference between deviances).

The following are the derivation for the general exponential-family case and another concrete example. Suppose that data come from exponential family (see Modern Applied Statistics with S, Chapter $7$): $$f(y_i; \theta_i, \varphi) = \exp[A_i(y_i\theta_i - \gamma(\theta_i))/\varphi + \tau(y_i, \varphi/A_i)]. \tag{1}$$ where $A_i$ are known prior weights and $\varphi$ are dispersion/scale parameter (for many cases such as binomial and Poisson, this parameter is known, while for other cases such as normal and Gamma, this parameter is unknown). Then the log-likelihood is given by: $$\ell(\theta, \varphi; Y) = \sum_{i = 1}^n A_i(y_i \theta_i - \gamma(\theta_i))/\varphi + \sum_{i = 1}^n \tau(y_i, \varphi/A_i). $$ As in the Poisson example, the saturated model's parameters can be estimated by solving the following score function: $$0 = U(\theta_i) = \frac{\partial \ell(\theta, \varphi; Y)}{\partial \theta_i} = \frac{A_i(y_i - \gamma'(\theta_i))}{\varphi}$$

Denote the solution of the above equation by $\hat{\theta}_i$, then the general form of the log-likelihood of the saturated model (treat the scale parameter as constant) is: $$\ell(\hat{\theta}, \varphi; Y) = \sum_{i = 1}^n A_i(y_i \hat{\theta}_i - \gamma(\hat{\theta}_i))/\varphi + \sum_{i = 1}^n \tau(y_i, \varphi/A_i). \tag{$**$}$$

In my previous answer, I incorrectly stated that the first term on the right side of $(**)$ is always zero, the above Poisson data example proves it is wrong. For a more complicated example, consider the Gamma distribution $\Gamma(\alpha, \beta)$ given in the appendix.

Proof of the first term in the log-likelihood of saturated Gamma model is non-zero: Given $$f(y; \alpha, \beta) = \frac{\beta^\alpha}{\Gamma(\alpha)}e^{-\beta y}y^{\alpha - 1}, \quad y > 0, \alpha > 0, \beta > 0,$$ we must do reparameterization first so that $f$ has the exponential family form $(1)$. It can be verified if letting $$\varphi = \frac{1}{\alpha},\, \theta = -\frac{\beta}{\alpha},$$ then $f$ has the representation: $$f(y; \theta, \varphi) = \exp\left[\frac{\theta y - (-\log(-\theta))}{\varphi}+ \tau(y, \varphi)\right],$$ where $$\tau(y, \varphi) = -\frac{\log \varphi}{\varphi} + \left(\frac{1}{\varphi} - 1\right)\log y - \log\Gamma(\varphi^{-1}).$$ Therefore, the MLEs of the saturated model are $\hat{\theta}_i = -\frac{1}{y_i}$. Hence $$\sum_{i = 1}^n \frac{1}{\varphi}[\hat{\theta}_iy_i - (-\log(-\hat{\theta}_i))] = \sum_{i = 1}^n \frac{1}{\varphi}[-1 - \log(y_i)] \neq 0, $$ unless $y_i$ take very special values.

Solved – Comparing models using the deviance and log-likelihood ratio tests

The residual deviance is twice the difference between the likelihood in the log scale of the saturated model and that of your proposed model: $$ResidualDeviance=2\times(ll(SaturatedModel)-ll(Proposed Model)) $$ It can not be calculated simply as -2*logLik(model) in R generally, because the likelihood in the log scale of the saturated model is not always $0$. Read this post for mathematical evidence. -2*logLik(model) works for the logistic regression because in this case the likelihood in the log scale of the saturated model is $0$. To calculate the residual deviance of the negative binomial regression model manually in R, you can try this:

sum(residuals.glm(m1, "deviance")^2)

You are right about the likelihood that adding parameters will always increase the likelihood of a GLM. It is just a matter of statistical significance. It is recommended to choose a model based on the AIC and the BIC rather than the deviance only because the AIC and the BIC penalize you for adding more parameters.

I hope it will help.

Best Answer

Related Solutions

Solved – In a GLM, is the log likelihood of the saturated model always zero

Solved – Comparing models using the deviance and log-likelihood ratio tests

Related Question