Why does the glm residual deviance have a chi-squared asymptotic null distribution?

deviance, generalized linear model, maximum likelihood

For a generalized linear model, the residual deviance is often described as asymptotically having a chi-squared null distribution.
I have read that this is the case, for example here:

http://thestatsgeek.com/2014/04/26/deviance-goodness-of-fit-test-for-poisson-regression/

but I can't figure out why.
Can you help with an explanation?

Best Answer

Your original question was rather cryptic, but I will assume that you are referring to the total residual deviance that is computed when you fit a generalized linear model.

Your question alludes to a widespread misconception. Regardless of what you might have read, the residual deviance from a generalized linear model is not asymptotically chi-square distributed. Differences in deviances used to test nested hypotheses usually do follow a scaled chi-square distribution asymptotically, but the residual deviance itself does not.
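
For example, the usual analysis-of-deviance comparison of nested fits works exactly this way: the drop in residual deviance between the smaller and the larger model is referred to a chi-square distribution with degrees of freedom equal to the number of extra parameters (for a Poisson glm the dispersion is fixed at 1, so no scaling is needed). Here is a minimal sketch, using Python and statsmodels purely for illustration; the covariates, coefficients and sample size are arbitrary choices, not anything from this thread:

```python
# Sketch: the *difference* in residual deviances between nested Poisson fits
# is referred to a chi-squared distribution (df = number of extra parameters).
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(0)
n = 200
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = rng.poisson(np.exp(0.5 + 0.8 * x1))          # x2 has no true effect

X0 = sm.add_constant(x1)                          # smaller (null) model
X1 = sm.add_constant(np.column_stack([x1, x2]))   # larger model

fit0 = sm.GLM(y, X0, family=sm.families.Poisson()).fit()
fit1 = sm.GLM(y, X1, family=sm.families.Poisson()).fit()

dev_drop = fit0.deviance - fit1.deviance          # drop in residual deviance
df_extra = X1.shape[1] - X0.shape[1]              # one extra parameter here
p_value = stats.chi2.sf(dev_drop, df_extra)       # asymptotic chi-squared reference
print(dev_drop, df_extra, p_value)
```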

There are in fact conditions under which the residual deviance can be shown to be approximately chi-square, but these depend on "small dispersion" asymptotics rather than large-$n$ asymptotics. Essentially, these conditions require that each individual observation be informative, rather than merely that there be many observations.
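
To fix notation (these are standard definitions, spelled out here for completeness): writing $\ell(\mu; y)$ for the log-likelihood at fitted means $\mu$, the residual deviance is
$$
D = 2\phi\left\{\ell(y; y) - \ell(\hat\mu; y)\right\},
$$
twice the scaled log-likelihood gap between the saturated model and the fitted model, where $\phi$ is the dispersion. The small-dispersion (saddlepoint) argument shows that $D/\phi$ is approximately $\chi^2_{n-p}$ when each observation carries substantial information, for example when every Poisson count has a large mean, regardless of whether $n$ is large.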

In practice, there are two main special cases in which the glm residual deviance follows a chi-square distribution. One is Poisson regression when all the fitted values are reasonably large, say more than 2 or 3. The other is binomial regression, for which one needs all the $np$ and $n(1-p)$ values to be greater than about 2 or 3. In other words, $n$ should be reasonably large and none of the probabilities should be too close to 0 or 1. Negative binomial glms can also produce chi-square residual deviances, but in this case the NB mean and size parameters both have to be reasonably large.
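
A quick way to see the Poisson case is by simulation. The sketch below (again Python and statsmodels, with sample size, covariate and mean levels that are arbitrary choices of mine) fits a correctly specified Poisson glm repeatedly and compares the average residual deviance with the mean $n-p$ of its nominal $\chi^2_{n-p}$ reference: the agreement should be close when the fitted means are around 10 and noticeably poor when they are around 0.2.

```python
# Sketch: Poisson residual deviance vs its nominal chi-squared(n - p) reference,
# with large fitted means (good approximation) and tiny ones (poor approximation).
import numpy as np
import statsmodels.api as sm

def mean_residual_deviance(base_mean, n=100, nsim=500, seed=1):
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    X = sm.add_constant(x)
    mu = base_mean * np.exp(0.3 * x)      # true means of the correctly specified model
    devs = []
    for _ in range(nsim):
        y = rng.poisson(mu)
        fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()
        devs.append(fit.deviance)
    return np.mean(devs)

df = 100 - 2                              # n - p for intercept + slope
for base_mean in (10.0, 0.2):             # large vs small fitted means
    print(base_mean, mean_residual_deviance(base_mean), df)
```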

There are other cases in which the residual deviance follows a scaled chi-square distribution, i.e., a chi-square distribution multiplied by an unknown dispersion parameter. This applies for normal and inverse Gaussian glms, or for gamma glms when the shape parameter is not too small. In some rare cases the dispersion parameter may be known, so a chi-square residual deviance can arise after dividing out the dispersion.
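
The normal case makes the distinction concrete (standard linear-model theory, stated here just as a reasoning aid): with the identity link, the residual deviance is simply the residual sum of squares, $D = \sum_i (y_i - \hat\mu_i)^2 \sim \sigma^2\chi^2_{n-p}$, so $D$ itself is only a scaled chi-square with unknown scale $\sigma^2$. Only when $\sigma^2$ is known does $D/\sigma^2$ follow a $\chi^2_{n-p}$ distribution exactly.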

These results are derived in Section 5.4 of my recent textbook with Peter Dunn (Dunn and Smyth, 2018).

Reference

Dunn, P. K., and Smyth, G. K. (2018). Generalized Linear Models with Examples in R. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-0118-7
