Thanks to @jbaums.
```r
y <- c(2, 3, 6, 7, 8, 9, 10, 12, 15)
x <- c(-1, -1, 0, 0, 0, 0, 1, 1, 1)
```
For the no-intercept null model (with the log link, the linear predictor is $0$, so the mean is $\exp(0) = 1$) we have:
$$
Y \sim {\rm Poisson}(1)
$$
leading to the following likelihood contribution for each observation:
$$
\mathcal L(y) = \exp(-1)/y!
$$
or log-likelihood term:
$$
l(y) = -1 - \log(y!)
$$
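As a quick sanity check (a sketch assuming the `y` defined above), this per-observation term agrees with R's built-in Poisson density evaluated on the log scale:

```r
y <- c(2, 3, 6, 7, 8, 9, 10, 12, 15)

# log-likelihood contribution of each observation under Poisson(1)
ll <- -1 - log(factorial(y))

# should match dpois at rate 1 on the log scale
all.equal(ll, dpois(y, lambda = 1, log = TRUE))
# [1] TRUE
```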
The log-likelihood for the null model is then the sum of these terms:

```r
lnull <- sum(-1 - log(factorial(y)))
```
As usual, the log-likelihood for the saturated model is:

```r
lf <- sum(y * log(y) - y - log(factorial(y)))
```
so the null deviance is:

```r
2 * (lf - lnull)
# [1] 191.8602
```
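The same number comes straight out of `glm` if we fit the no-intercept null model directly (a sketch assuming the `y` above; `~ 0` gives a model with no parameters, so the fitted mean is $\exp(0) = 1$):

```r
y <- c(2, 3, 6, 7, 8, 9, 10, 12, 15)

# Poisson GLM with no parameters at all: linear predictor is 0, mean is 1
fit0 <- glm(y ~ 0, family = poisson)

# the residual deviance of this model is the null deviance computed by hand
deviance(fit0)
# [1] 191.8602
```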
Your original question was rather cryptic, but I will assume that you are referring to the total residual deviance that is computed when you fit a generalized linear model.
Your question alludes to a widespread misconception. Regardless of what you might have read, the residual deviance from a generalized linear model is not asymptotically chi-square distributed. Differences in deviances used to test nested hypotheses usually do follow a scaled chi-square distribution asymptotically, but the residual deviance itself does not.
There are in fact conditions under which the residual deviance can be shown to be chi-square, but these depend on "small dispersion" asymptotics rather than large-$n$ asymptotics. Essentially, these conditions require that each individual observation becomes informative, rather than just that there are many observations.
In practice, there are two main special cases in which the GLM residual deviance follows a chi-square distribution. One is Poisson regression when all the fitted values are reasonably large, say more than 2 or 3. The other is binomial regression, for which one needs all the $np$ and $n(1-p)$ values to be greater than about 2 or 3. In other words, $n$ should be reasonably large and none of the probabilities should be too close to 0 or 1.
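A small simulation illustrates the Poisson case (a sketch with made-up parameters; the fitted means sit around 7 to 12, well above the 2-3 threshold):

```r
set.seed(1)

# Poisson regression with comfortably large means
n <- 50
x <- runif(n)
dev <- replicate(2000, {
  y <- rpois(n, lambda = exp(2 + 0.5 * x))   # means roughly 7 to 12
  deviance(glm(y ~ x, family = poisson))
})

# the residual deviance should be approximately chi-square on n - 2 df
mean(dev)   # close to n - 2 = 48
var(dev)    # close to 2 * (n - 2) = 96
```

Repeating this with small means (say `lambda = exp(-1)`) shows the approximation breaking down: the mean of the deviances drifts well away from the residual degrees of freedom.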
Negative binomial GLMs can also produce chi-square residual deviances, but in this case the NB mean and size parameters both have to be reasonably large.
There are other cases in which the residual deviance follows a scaled chi-square distribution, i.e., a chi-square distribution multiplied by an unknown dispersion parameter.
This applies to normal and inverse Gaussian GLMs, and to gamma GLMs when the shape parameter is not too small.
In some rare cases the dispersion parameter is known, so a chi-square residual deviance can arise after dividing out the dispersion.
These results are derived in Section 5.4 of my recent textbook with Peter Dunn (Dunn and Smyth, 2018).
Reference
Dunn, PK, and Smyth, GK (2018). *Generalized Linear Models with Examples in R*. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-0118-7
Best Answer
The other answer is not correct. The test of the model's deviance against the null deviance is not the test of the model against the saturated model. It is the test of the model against the null model, which is quite a different thing (with a different null hypothesis, etc.).
The test of the fitted model against a model with only an intercept is the test of the model as a whole. This test is based on the difference between the model's deviance and the null deviance, with the degrees of freedom equal to the difference between the model's residual degrees of freedom and the null model's residual degrees of freedom (see my answer here: Test GLM model using null and model deviances). It is a test of whether the model contains any information about the response anywhere. In general, when there is only one variable in the model, this test would be equivalent to the test of the included variable. (For a GLM, there is an added complication that the types of tests used can differ, and thus yield slightly different p-values; see my answer here: Why do my p-values differ between logistic regression output, chi-squared test, and the confidence interval for the OR?)
The goodness of fit / lack of fit test for a fitted model is the test of the model against a model that has one fitted parameter for every data point (and thus always fits the data perfectly). It is based on the difference between the saturated model's deviance and the model's residual deviance, with the degrees of freedom equal to the difference between the saturated model's residual degrees of freedom and the model's residual degrees of freedom. For logistic regression models, the saturated model will always have $0$ residual deviance and $0$ residual degrees of freedom (see here). Thus, you could skip fitting such a model and just test the model's residual deviance using the model's residual degrees of freedom.
To answer this thread's explicit question: The null hypothesis of the lack of fit test is that the fitted model fits the data as well as the saturated model. That is, there is no remaining information in the data, just noise. While we usually want to reject the null hypothesis, in this case, we want to fail to reject the null hypothesis.
To explore these ideas, let's use the data from my answer to How to use boxplots to find the point where values are more likely to come from different conditions?
For convenience, I will define two functions to conduct these two tests:
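A sketch of what two such functions might look like (the names `test.model` and `test.lack.of.fit` and their exact form are my own assumptions, not the original code):

```r
# overall test of the model: model deviance vs. null deviance
test.model <- function(fit) {
  chi2 <- fit$null.deviance - fit$deviance
  df   <- fit$df.null - fit$df.residual
  p    <- pchisq(chi2, df = df, lower.tail = FALSE)
  c(chi2 = chi2, df = df, p.value = p)
}

# lack-of-fit test: residual deviance vs. the saturated model (deviance 0)
test.lack.of.fit <- function(fit) {
  p <- pchisq(fit$deviance, df = fit$df.residual, lower.tail = FALSE)
  c(chi2 = fit$deviance, df = fit$df.residual, p.value = p)
}
```

`test.model` reproduces `anova(null.fit, fit, test = "Chisq")`, and `test.lack.of.fit` performs the comparison against the saturated model directly from the fitted model's residual deviance and residual degrees of freedom.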
Let's fit several models: 1) a null model with only an intercept; 2) our primary model using `x`; 3) a saturated model with a unique variable for every datapoint; and 4) a model also including a squared function of `x`.

Now let's look at some abridged output for these models.
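The actual data come from the linked answer; to make the steps concrete, here is a sketch on simulated binary data (the data, seed, and model names are my own stand-ins):

```r
set.seed(2)
x <- rep(1:10, each = 3)                        # stand-in predictor
y <- rbinom(length(x), 1, plogis(-2 + 0.4 * x)) # stand-in binary response

null.model    <- glm(y ~ 1, family = binomial)  # intercept only
primary.model <- glm(y ~ x, family = binomial)
squared.model <- glm(y ~ x + I(x^2), family = binomial)

# one parameter per observation; warns about fitted probabilities of 0/1
saturated.model <- suppressWarnings(glm(y ~ factor(1:length(y)), family = binomial))

# the fitted model's reported null deviance equals the null model's deviance,
# and the saturated model's residual deviance is numerically 0
c(primary.model$null.deviance, deviance(null.model), deviance(saturated.model))
```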
We see that the fitted model's reported null deviance equals the reported deviance from the null model, and that the saturated model's residual deviance is $0$ (up to floating-point rounding error). Let's conduct our tests as defined above, along with nested model tests of the actual models.
We can see that the results are the same. We also see that the lack of fit test was not significant. Many people will interpret this as showing that the fitted model is correct and has extracted all the information in the data. In fact, this is a dicey assumption, and is a problem with such tests: it amounts to assuming that the null hypothesis has been confirmed. As discussed in my answer to Why do statisticians say a non-significant result means "you can't reject the null" as opposed to accepting the null hypothesis?, this assumption is invalid. We can see the problem if we explore the last model fitted and conduct its lack of fit test as well. It fits better than our initial model, even though our initial model 'passed' its lack of fit test. (In fact, one could almost argue that this model fits 'too well'; see here.)