Solved – How to compute Pearson’s $\chi^2$ test statistic for lack of fit on a logistic regression model in R

chi-squared-testgeneralized linear modelgoodness of fitlogisticr

The likelihood ratio (a.k.a. deviance) $G^2$ statistic and lack-of-fit (or goodness-of-fit) test is fairly straightforward to obtain for a logistic regression model (fit using the glm(..., family = binomial) function) in R. However, it can be easy to have some cell counts end up low enough that the test is unreliable. One way to verify the reliability of the likelihood ratio test for lack of fit is to compare its test statistic and P-value to those of Pearson's chi square (or $\chi^2$) lack-of-fit test.

Neither the glm object nor its summary() method report the test statistic for Pearson's chi square test for lack of fit. In my search, the only thing I came up with is the chisq.test() function (in the stats package): its documentation says "chisq.test performs chi-squared contingency table tests and goodness-of-fit tests." However, the documentation is sparse on how to perform such tests:

If x is a matrix with one row or column, or if x is a vector and y is not given, then a goodness-of-fit test is performed (x is treated as a one-dimensional contingency table). The entries of x must be non-negative integers. In this case, the hypothesis tested is whether the population probabilities equal those in p, or are all equal if p is not given.

I'd imagine that you could use the y component of the glm object for the x argument of chisq.test. However, you can't use the fitted.values component of the glm object for the p argument of chisq.test, because you'll get an error: "probabilities must sum to 1."

How can I (in R) at least calculate the Pearson $\chi^2$ test statistic for lack of fit without having to run through the steps manually?

Best Answer

The sum of the squared Pearson residuals is exactly equal to the Pearson $\chi^2$ test statistic for lack of fit. So if your fitted model (i.e., the glm object) is called logistic.fit, the following code would return the test statistic:

sum(residuals(logistic.fit, type = "pearson")^2)

See the documentation on residuals.glm for more information, including what other residuals are available. For example, the code

sum(residuals(logistic.fit, type = "deviance")^2)

will get you the $G^2$ test statistic, just the same as deviance(logistic.fit) provides.