Solved – null hypothesis in the deviance goodness of fit test for a GLM

deviance, generalized linear model, goodness of fit

To test the goodness of fit of a GLM, we use the deviance goodness of fit test, which compares the fitted model with the saturated model.

Many resources state that the null hypothesis is that "the model fits well" without specifying more precisely (with a mathematical formulation) what "the model fits well" means.

Could you please tell me the mathematical form of the null hypothesis in the deviance goodness of fit test for a GLM?

Best Answer

The other answer is not correct. The test of the model's deviance against the null deviance is not the test of the model against the saturated model. It is the test of the model against the null model, which is quite a different thing (with a different null hypothesis, etc.).

The test of the fitted model against a model with only an intercept is the test of the model as a whole. This test is based on the difference between the model's deviance and the null deviance, with the degrees of freedom equal to the difference between the model's residual degrees of freedom and the null model's residual degrees of freedom (see my answer here: Test GLM model using null and model deviances). It is a test of whether the model contains any information about the response anywhere. In general, when there is only one variable in the model, this test would be equivalent to the test of the included variable. (For a GLM, there is an added complication that the types of tests used can differ, and thus yield slightly different p-values; see my answer here: Why do my p-values differ between logistic regression output, chi-squared test, and the confidence interval for the OR?)
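
In symbols (a sketch in generic notation): writing $D_{\mathrm{null}}$ and $D_{\mathrm{model}}$ for the null and residual deviances, with $df_{\mathrm{null}}$ and $df_{\mathrm{model}}$ the corresponding residual degrees of freedom, the test statistic is $D_{\mathrm{null}} - D_{\mathrm{model}}$, referred to a $\chi^2$ distribution on $df_{\mathrm{null}} - df_{\mathrm{model}}$ degrees of freedom; the null hypothesis is that all of the slope coefficients are $0$, so that only the intercept is needed.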

The goodness of fit / lack of fit test for a fitted model is the test of the model against a model that has one fitted parameter for every data point (and thus always fits the data perfectly). It is based on the difference between the saturated model's deviance and the model's residual deviance, with the degrees of freedom equal to the difference between the saturated model's residual degrees of freedom and the model's residual degrees of freedom. For logistic regression models, the saturated model will always have $0$ residual deviance and $0$ residual degrees of freedom (see here). Thus, you could skip fitting such a model and just test the model's residual deviance using the model's residual degrees of freedom.
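
In the same notation, the lack of fit statistic is $D_{\mathrm{model}} - D_{\mathrm{sat}}$, referred to a $\chi^2$ distribution on $df_{\mathrm{model}} - df_{\mathrm{sat}}$ degrees of freedom. Since, as noted, the saturated logistic model has $D_{\mathrm{sat}} = 0$ and $df_{\mathrm{sat}} = 0$, this reduces to referring the residual deviance $D_{\mathrm{model}}$ to a $\chi^2$ distribution on $df_{\mathrm{model}}$ degrees of freedom.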

To answer this thread's explicit question: The null hypothesis of the lack of fit test is that the fitted model fits the data as well as the saturated model. That is, there is no remaining information in the data, just noise. While we usually want to reject the null hypothesis, in this case, we want to fail to reject the null hypothesis.
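
One way to write this null hypothesis formally (a sketch, assuming a GLM with link function $g$, covariate vector $x_i$, and coefficients $\beta$) is $H_0\!: \mathrm{E}[y_i] = g^{-1}(x_i^\top \beta)$ for every observation $i$, i.e., the fitted model's mean structure is adequate. The alternative is the saturated model, in which each observation is allowed its own mean $\mu_i$, unconstrained by the covariates.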

To explore these ideas, let's use the data from my answer to How to use boxplots to find the point where values are more likely to come from different conditions?

Cond.1 = c(2.9, 3.0, 3.1, 3.1, 3.1, 3.3, 3.3, 3.4, 3.4, 3.4, 3.5, 3.5, 3.6, 3.7, 3.7,
           3.8, 3.8, 3.8, 3.8, 3.9, 4.0, 4.0, 4.1, 4.1, 4.2, 4.4, 4.5, 4.5, 4.5, 4.6,
           4.6, 4.6, 4.7, 4.8, 4.9, 4.9, 5.5, 5.5, 5.7)
Cond.2 = c(2.3, 2.4, 2.6, 3.1, 3.7, 3.7, 3.8, 4.0, 4.2, 4.8, 4.9, 5.5, 5.5, 5.5, 5.7,
           5.8, 5.9, 5.9, 6.0, 6.0, 6.1, 6.1, 6.3, 6.5, 6.7, 6.8, 6.9, 7.1, 7.1, 7.1,
           7.2, 7.2, 7.4, 7.5, 7.6, 7.6, 10, 10.1, 12.5)
dat    = stack(list(cond1=Cond.1, cond2=Cond.2))
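
A quick look at the stacked data frame confirms its structure (stack() uses its default column names, values and ind, so ind is the condition label we will model):

str(dat)        # 78 rows: 'values' (numeric) and 'ind' (factor with levels cond1, cond2)
table(dat$ind)  # 39 observations in each condition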

For convenience, I will define two functions to conduct these two tests:

# Test of the fitted model against the null (intercept-only) model
lr.null.mod.test = function(model, digits=3){
  n.dev = model$null.deviance   # null deviance
  m.dev = model$deviance        # residual deviance of the fitted model
  n.df  = model$df.null         # residual df of the null model
  r.df  = model$df.residual     # residual df of the fitted model
  ch.sq = n.dev - m.dev         # drop in deviance
  df    = n.df - r.df           # difference in degrees of freedom
  p     = 1-pchisq(ch.sq, df)
  out   = c(chi.sq=formatC(ch.sq, digits=digits-2, format="f"), 
            df=formatC(df, digits=0, format="d"), 
            p.value=formatC(p, digits=digits, format="f"))
  return(noquote(out))
}
# Test of the fitted model against the saturated model (lack of fit test)
lr.sat.mod.test = function(model, digits=3){
  m.dev = model$deviance        # residual deviance of the fitted model
  ch.sq = m.dev - 0             # the saturated model's deviance is 0
  df    = length(model$y) - length(model$coefficients)  # residual df
  p     = 1-pchisq(ch.sq, df)
  out   = c(chi.sq=formatC(ch.sq, digits=digits-2, format="f"), 
            df=formatC(df, digits=0, format="d"), 
            p.value=formatC(p, digits=digits, format="f"))
  return(noquote(out))
}
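
As a sanity check on what these helpers compute, the same two p-values can be obtained directly from the components of a fitted glm object (here model is just a hypothetical placeholder for any fitted glm; the functions above only add formatting):

# 'model' is a hypothetical placeholder for a fitted glm object
# Fitted model vs. null model: drop in deviance on the difference in residual df
pchisq(model$null.deviance - model$deviance,
       model$df.null - model$df.residual, lower.tail=FALSE)
# Fitted model vs. saturated model (logistic case): residual deviance on its
# residual df, since the saturated deviance and df are both 0
pchisq(model$deviance, model$df.residual, lower.tail=FALSE)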

Let's fit several models: 1) a null model with only an intercept; 2) our primary model using values as the predictor; 3) a saturated model with a separate parameter for every data point; and 4) a model that also includes the square of values.

m.null = glm(ind~1,                  dat, family="binomial")
m.x    = glm(ind~values,             dat, family="binomial")
m.sat  = glm(ind~as.factor(1:78),    dat, family="binomial")
m.x.x2 = glm(ind~values+I(values^2), dat, family="binomial")

Now let's look at some abridged output for these models.

summary(m.null)
# ...
#     Null deviance: 108.13  on 77  degrees of freedom
# Residual deviance: 108.13  on 77  degrees of freedom
# AIC: 110.13
# ...
summary(m.sat)
# ...
# Deviance Residuals: 
#  [1]  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
# [26]  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
# [51]  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
# [76]  0  0  0
# 
# Coefficients:
#                     Estimate Std. Error z value Pr(>|z|)
# (Intercept)       -2.657e+01  3.561e+05       0        1
# as.factor(1:78)2  -1.762e-14  5.036e+05       0        1
# ...
# as.factor(1:78)78  5.313e+01  5.036e+05       0        1
# 
# ...
#     Null deviance: 1.0813e+02  on 77  degrees of freedom
# Residual deviance: 4.5252e-10  on  0  degrees of freedom
# AIC: 156
# 
# Number of Fisher Scoring iterations: 25
summary(m.x)
# ...
# Deviance Residuals: 
#     Min       1Q   Median       3Q      Max  
# -1.5683  -0.7985  -0.2407   0.7207   2.3069  
# 
# Coefficients:
#             Estimate Std. Error z value Pr(>|z|)    
# (Intercept)  -4.9375     1.1616  -4.251 2.13e-05 ***
# values        1.0213     0.2412   4.235 2.28e-05 ***
# ...
#     Null deviance: 108.131  on 77  degrees of freedom
# Residual deviance:  77.576  on 76  degrees of freedom
# AIC: 81.576
# ...

We see that the fitted model's reported null deviance equals the reported deviance from the null model, and that the saturated model's residual deviance is $0$ (up to rounding error, since computers cannot carry out infinite-precision arithmetic).
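
A quick check in R confirms these points (the exact near-zero value of the saturated deviance will vary slightly by machine):

all.equal(m.x$null.deviance, m.null$deviance)  # TRUE: the null deviances match
m.sat$deviance                                 # essentially 0 (about 4.5e-10 here)
m.sat$df.residual                              # 0 residual degrees of freedom

Now let's conduct our tests as defined above, along with nested model tests of the actual models.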

lr.null.mod.test(m.x)
#  chi.sq      df p.value 
#    30.6       1   0.000 
anova(m.null, m.x, test="LRT")
# Analysis of Deviance Table
# 
# Model 1: ind ~ 1
# Model 2: ind ~ values
#   Resid. Df Resid. Dev Df Deviance  Pr(>Chi)    
# 1        77    108.131                          
# 2        76     77.576  1   30.555 3.246e-08 ***
lr.sat.mod.test(m.x)
#  chi.sq      df p.value 
#    77.6      76   0.428 
anova(m.x, m.sat, test="LRT")
# Analysis of Deviance Table
# 
# Model 1: ind ~ values
# Model 2: ind ~ as.factor(1:78)
#   Resid. Df Resid. Dev Df Deviance Pr(>Chi)
# 1        76     77.576                     
# 2         0      0.000 76   77.576   0.4282

We can see that the results are the same, and that the lack of fit test was not significant. Many people will interpret this as showing that the fitted model is correct and has extracted all the information in the data. In fact, this is a dicey assumption, and it is a problem with such tests: it amounts to assuming that the null hypothesis has been confirmed. As discussed in my answer to Why do statisticians say a non-significant result means "you can't reject the null" as opposed to accepting the null hypothesis?, this assumption is invalid. We can see the problem if we explore the last model fitted and conduct its lack of fit test as well. It fits better than our initial model, even though our initial model 'passed' its lack of fit test. (In fact, one could almost argue that this model fits 'too well'; see here.)

summary(m.x.x2)
# ...
# Coefficients:
#             Estimate Std. Error z value Pr(>|z|)    
# (Intercept)  18.0825     6.1130   2.958 0.003096 ** 
# values       -9.8912     2.9649  -3.336 0.000850 ***
# I(values^2)   1.2220     0.3449   3.543 0.000396 ***
# ...
# 
#     Null deviance: 108.131  on 77  degrees of freedom
# Residual deviance:  56.884  on 75  degrees of freedom
# AIC: 62.884
# ...
anova(m.x, m.x.x2, test="LRT")
# Analysis of Deviance Table
# 
# Model 1: ind ~ values
# Model 2: ind ~ values + I(values^2)
#   Resid. Df Resid. Dev Df Deviance  Pr(>Chi)    
# 1        76     77.576                          
# 2        75     56.884  1   20.692 5.394e-06 ***
lr.sat.mod.test(m.x.x2)
#  chi.sq      df p.value 
#    56.9      75   0.941
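
For reference, the same lack of fit p-value for the quadratic model can be obtained directly from the chi-squared distribution, using the residual deviance and degrees of freedom printed above:

pchisq(m.x.x2$deviance, m.x.x2$df.residual, lower.tail=FALSE)
# ~0.94, matching lr.sat.mod.test(m.x.x2)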