Solved – the best measure for goodness of fit for GLM (i.e. the R^2 equivalent for GLM)

generalized linear modelgoodness of fit

R-sq is a very common measure on goodness of fit for OLM, and many people are aware of that even with very rudimentary statistical knowledge. Is there an equivalent of R-sq for GLM (in my case, binomial/logistic)?

Thank you.

Best Answer

For me the adj. R-square is only good for knowing whether or not your independent variables explain enough of your response. With a GLM, things are a bit more abstract since you don’t get an actual value to express this (or at least, I never learned how to do that).

However, there is the Likelihood-ratio test which shows you whether your model identifies with the null model (which basically is the equivalent of knowing whether your adj. R-sq. would be “too low” or not), and there is also the Deviance goodness of fit test which tells you whether your model identifies with the saturated model (which is the equivalent of knowing whether your adj. R-sq. value would be “high enough” or not).

PS: This post might be useful too.

Related Solutions

Solved – null hypothesis in the deviance goodness of fit test for a GLM model

The other answer is not correct. The test of the model's deviance against the null deviance is not the test of the model against the saturated model. It is the test of the model against the null model, which is quite a different thing (with a different null hypothesis, etc.).

The test of the fitted model against a model with only an intercept is the test of the model as a whole. This test is based on the difference between the model's deviance and the null deviance, with the degrees of freedom equal to the difference between the model's residual degrees of freedom and the null model's residual degrees of freedom (see my answer here: Test GLM model using null and model deviances). It is a test of whether the model contains any information about the response anywhere. In general, when there is only one variable in the model, this test would be equivalent to the test of the included variable. (For a GLM, there is an added complication that the types of tests used can differ, and thus yield slightly different p-values; see my answer here: Why do my p-values differ between logistic regression output, chi-squared test, and the confidence interval for the OR?)

The goodness of fit / lack of fit test for a fitted model is the test of the model against a model that has one fitted parameter for every data point (and thus always fits the data perfectly). It is based on the difference between the saturated model's deviance and the model's residual deviance, with the degrees of freedom equal to the difference between the saturated model's residual degrees of freedom and the model's residual degrees of freedom. For logistic regression models, the saturated model will always have $0$ residual deviance and $0$ residual degrees of freedom (see here). Thus, you could skip fitting such a model and just test the model's residual deviance using the model's residual degrees of freedom.

To answer this thread's explicit question: The null hypothesis of the lack of fit test is that the fitted model fits the data as well as the saturated model. That is, there is no remaining information in the data, just noise. While we usually want to reject the null hypothesis, in this case, we want to fail to reject the null hypothesis.

To explore these ideas, let's use the data from my answer to How to use boxplots to find the point where values are more likely to come from different conditions?

Cond.1 = c(2.9, 3.0, 3.1, 3.1, 3.1, 3.3, 3.3, 3.4, 3.4, 3.4, 3.5, 3.5, 3.6, 3.7, 3.7,
           3.8, 3.8, 3.8, 3.8, 3.9, 4.0, 4.0, 4.1, 4.1, 4.2, 4.4, 4.5, 4.5, 4.5, 4.6,
           4.6, 4.6, 4.7, 4.8, 4.9, 4.9, 5.5, 5.5, 5.7)
Cond.2 = c(2.3, 2.4, 2.6, 3.1, 3.7, 3.7, 3.8, 4.0, 4.2, 4.8, 4.9, 5.5, 5.5, 5.5, 5.7,
           5.8, 5.9, 5.9, 6.0, 6.0, 6.1, 6.1, 6.3, 6.5, 6.7, 6.8, 6.9, 7.1, 7.1, 7.1,
           7.2, 7.2, 7.4, 7.5, 7.6, 7.6, 10, 10.1, 12.5)
dat    = stack(list(cond1=Cond.1, cond2=Cond.2))

For convenience, I will define two functions to conduct these two tests:

lr.null.mod.test = function(model, digits=3){
  n.dev = model$null.deviance
  m.dev = model$deviance
  n.df  = model$df.null
  r.df  = model$df.residual
  ch.sq = n.dev - m.dev
  df    = n.df - r.df
  p     = 1-pchisq(ch.sq, df)
  out   = c(chi.sq=formatC(ch.sq, digits=digits-2, format="f"), 
            df=formatC(df, digits=0, format="d"), 
            p.value=formatC(p, digits=digits, format="f"))
  return(noquote(out))
}
lr.sat.mod.test = function(model, digits=3){
  m.dev = model$deviance
  n.df  = model$df.null
  r.df  = model$df.residual
  ch.sq = m.dev - 0
  df    = length(model$y) - length(model$coefficients)
  p     = 1-pchisq(ch.sq, df)
  out   = c(chi.sq=formatC(ch.sq, digits=digits-2, format="f"), 
            df=formatC(df, digits=0, format="d"), 
            p.value=formatC(p, digits=digits, format="f"))
  return(noquote(out))
}

Let's fit several models: 1) a null model with only an intercept; 2) our primary model using x; 3) a saturated model with a unique variable for every datapoint; and 4) a model also including a squared function of x.

m.null = glm(ind~1,                  dat, family="binomial")
m.x    = glm(ind~values,             dat, family="binomial")
m.sat  = glm(ind~as.factor(1:78),    dat, family="binomial")
m.x.x2 = glm(ind~values+I(values^2), dat, family="binomial")

Now let's look at some abridged output for these models.

summary(m.null)
# ...
#     Null deviance: 108.13  on 77  degrees of freedom
# Residual deviance: 108.13  on 77  degrees of freedom
# AIC: 110.13
# ...
summary(m.sat)
# ...
# Deviance Residuals: 
#  [1]  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
# [26]  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
# [51]  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
# [76]  0  0  0
# 
# Coefficients:
#                     Estimate Std. Error z value Pr(>|z|)
# (Intercept)       -2.657e+01  3.561e+05       0        1
# as.factor(1:78)2  -1.762e-14  5.036e+05       0        1
# ...
# as.factor(1:78)78  5.313e+01  5.036e+05       0        1
# 
# ...
#     Null deviance: 1.0813e+02  on 77  degrees of freedom
# Residual deviance: 4.5252e-10  on  0  degrees of freedom
# AIC: 156
# 
# Number of Fisher Scoring iterations: 25
summary(m.x)
# ...
# Deviance Residuals: 
#     Min       1Q   Median       3Q      Max  
# -1.5683  -0.7985  -0.2407   0.7207   2.3069  
# 
# Coefficients:
#             Estimate Std. Error z value Pr(>|z|)    
# (Intercept)  -4.9375     1.1616  -4.251 2.13e-05 ***
# values        1.0213     0.2412   4.235 2.28e-05 ***
# ...
#     Null deviance: 108.131  on 77  degrees of freedom
# Residual deviance:  77.576  on 76  degrees of freedom
# AIC: 81.576
# ...

We see that the fitted model's reported null deviance equals the reported deviance from the null model, and that the saturated model's residual deviance is $0$ (up to rounding error arising from the fact that computers cannot carry out infinite precision arithmetic). Let's conduct our tests as defined above, and nested model tests of the actual models.

lr.null.mod.test(m.x)
#  chi.sq      df p.value 
#    30.6       1   0.000 
anova(m.null, m.x, test="LRT")
# Analysis of Deviance Table
# 
# Model 1: ind ~ 1
# Model 2: ind ~ values
#   Resid. Df Resid. Dev Df Deviance  Pr(>Chi)    
# 1        77    108.131                          
# 2        76     77.576  1   30.555 3.246e-08 ***
lr.sat.mod.test(m.x)
#  chi.sq      df p.value 
#    77.6      76   0.428 
anova(m.x, m.sat, test="LRT")
# Analysis of Deviance Table
# 
# Model 1: ind ~ values
# Model 2: ind ~ as.factor(1:78)
#   Resid. Df Resid. Dev Df Deviance Pr(>Chi)
# 1        76     77.576                     
# 2         0      0.000 76   77.576   0.4282

We can see that the results are the same. We also see that the lack of fit test was not significant. Many people will interpret this as showing that the fitted model is correct and has extracted all the information in the data. In fact, this is a dicey assumption, and is a problem with such tests. It amounts to assuming that the null hypothesis has been confirmed. As discussed in my answer to: Why do statisticians say a non-significant result means “you can't reject the null” as opposed to accepting the null hypothesis?, this assumption is invalid. We can see the problem, if we explore the last model fitted, and conduct its lack of fit test as well. It fits better than our initial model, despite our initial model 'passed' its lack of fit test. (In fact, one could almost argue that this model fits 'too well'; see here.)

summary(m.x.x2)
# ...
# Coefficients:
#             Estimate Std. Error z value Pr(>|z|)    
# (Intercept)  18.0825     6.1130   2.958 0.003096 ** 
# values       -9.8912     2.9649  -3.336 0.000850 ***
# I(values^2)   1.2220     0.3449   3.543 0.000396 ***
# ...
# 
#     Null deviance: 108.131  on 77  degrees of freedom
# Residual deviance:  56.884  on 75  degrees of freedom
# AIC: 62.884
# ...
anova(m.x, m.x.x2, test="LRT")
# Analysis of Deviance Table
# 
# Model 1: ind ~ values
# Model 2: ind ~ values + I(values^2)
#   Resid. Df Resid. Dev Df Deviance  Pr(>Chi)    
# 1        76     77.576                          
# 2        75     56.884  1   20.692 5.394e-06 ***
lr.sat.mod.test(m.x.x2)
#  chi.sq      df p.value 
#    56.9      75   0.941

Best Answer

Related Solutions

Solved – null hypothesis in the deviance goodness of fit test for a GLM model

Related Question