Hosmer, D.W. and Lemeshow, S. (1980), "A goodness-of-fit test for the multiple logistic regression model", Communications in Statistics, A10, 1043-1069, show that:
If the model is a logistic regression model, the $p$ parameters are
estimated by maximum likelihood, and the $G$ groups are defined on the
estimated probabilities, then $X^2$ is asymptotically distributed as
$\chi^2(G-p-1)+\sum_{i=1}^{p+1} \lambda_i \chi_i^2(1)$
(Hosmer and Lemeshow, 1980, p. 1052, Theorem 2).
(Note: these conditions are not stated explicitly in Theorem 2 on page 1052, but they emerge from a careful reading of the paper and the proof.)
The second term $\sum_{i=1}^{p+1} \lambda_i \chi_i^2(1)$ results from the fact that the grouping is based on estimated - i.e. random - quantities (Hosmer and Lemeshow, 1980, p. 1051).
Using simulations, they showed that the second term can be approximated (in the cases used in the simulation) by a $\chi^2(p-1)$ distribution (Hosmer and Lemeshow, 1980, p. 1060).
Combining these two facts gives a sum of two $\chi^2$ variables,
one with $G-p-1$ degrees of freedom and a second with $p-1$
degrees of freedom, so that $X^2 \sim \chi^2((G-p-1)+(p-1)) = \chi^2(G-2)$.
So the answer to the question lies in the occurrence of the weighted
chi-square term, or equivalently in the fact that the groups are defined using
estimated probabilities, which are themselves random variables.
See also Hosmer Lemeshow (1980) Paper - Theorem 2
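The $\chi^2(G-2)$ result can be illustrated numerically. Below is a minimal sketch (binary case only; the variable names and simulated data are my own, not from the paper) that computes the Hosmer-Lemeshow statistic by hand on data generated from a correctly specified model and compares it with the $\chi^2(G-2)$ reference distribution:

```r
# Minimal sketch: the binary Hosmer-Lemeshow statistic computed by hand,
# referred to its chi-square(G - 2) null distribution.
set.seed(1)
n <- 2000
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.5 + x))   # data generated from the true model
fit <- glm(y ~ x, family = binomial)
phat <- fitted(fit)

G <- 10
grp <- cut(phat,
           breaks = unique(quantile(phat, seq(0, 1, 1/G))),
           include.lowest = TRUE)      # groups defined on estimated probabilities
o_g <- tapply(y, grp, sum)             # observed events per group
e_g <- tapply(phat, grp, sum)          # expected events per group
n_g <- tapply(y, grp, length)          # group sizes

X2 <- sum((o_g - e_g)^2 / (e_g * (1 - e_g / n_g)))
pval <- pchisq(X2, df = G - 2, lower.tail = FALSE)
```

Since the fitted model is correct here, repeating this over many simulated datasets gives p-values that are roughly uniform, consistent with the $\chi^2(G-2)$ approximation.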
Update 28 July: the below has now been pushed to CRAN in the generalhoslem
package along with the Lipsitz and Pulkstenis-Robinson tests.
Fagerland and Hosmer discuss a generalisation of the Hosmer-Lemeshow test and two other approaches (the Lipsitz test and the Pulkstenis-Robinson tests) in "A goodness-of-fit test for the proportional odds regression model" (2013, Statistics in Medicine) and in "Tests for goodness of fit in ordinal logistic regression models" (2016, Journal of Statistical Computation and Simulation).
I haven't checked properly yet, but as far as I know, they haven't been implemented in R. If that turns out to be the case, I plan to add them to the generalhoslem
package.
EDIT: it's probably also worth pointing out Fagerland and Hosmer's recommendation that
because the tests may detect different types of lack of fit, a thorough assessment of goodness of fit requires use of all three approaches.
ANOTHER EDIT: I didn't realise this straight away, but the only difference between the multinomial version of the Hosmer-Lemeshow test and the ordinal version is in the degrees of freedom. Strictly speaking, the test for ordinal response models requires sorting the observations by a weighted ordinal 'response score', but they show in the 2013 article above that this is equivalent to binning the observations in the same way as in the multinomial case.
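To make the degrees-of-freedom difference concrete: with $g$ groups and $k$ outcome categories, the multinomial test uses $(g-2)(k-1)$ degrees of freedom, while the ordinal test uses $(g-2)(k-1)+(k-2)$. For example, with $g=10$ and $k=3$ this gives $8 \times 2 = 16$ versus $16 + 1 = 17$ (these are exactly the values of PARAMETER computed in the function below).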
I haven't properly tested it, but the logitgof
function on my github page should now do the trick: https://github.com/matthewjay15/generalhoslem-v1.1.0/blob/1.2.0.9000/logitgof.R Note that if you examine the observed and expected tables produced by this function, the columns may not be in the right order (they will be alphabetical). This shouldn't make any difference to the test statistic, though, as they will correspond to each other.
An example using the ordinal and MASS packages:
library(reshape) # needed by logitgof for melt() and cast()

logitgof <- function(obs, exp, g = 10, ord = FALSE) {
    DNAME <- paste(deparse(substitute(obs)), deparse(substitute(exp)), sep = ", ")
    yhat <- exp
    # multinomial/ordinal if the predicted probabilities have > 1 column
    if (is.null(ncol(yhat))) {
        mult <- FALSE
    } else if (ncol(yhat) == 1) {
        mult <- FALSE
    } else {
        mult <- TRUE
    }
    n <- ncol(yhat)
    if (mult) {
        METHOD <- if (!ord) {
            "Hosmer and Lemeshow test (multinomial model)"
        } else {
            "Hosmer and Lemeshow test (ordinal model)"
        }
        # bin observations on 1 - P(first outcome category)
        qq <- unique(quantile(1 - yhat[, 1], probs = seq(0, 1, 1/g)))
        cutyhats <- cut(1 - yhat[, 1], breaks = qq, include.lowest = TRUE)
        dfobs <- data.frame(obs, cutyhats)
        dfobsmelt <- melt(dfobs, id.vars = 2)
        observed <- cast(dfobsmelt, cutyhats ~ value, length)
        observed <- observed[order(c(1, names(observed[, 2:ncol(observed)])))]
        dfexp <- data.frame(yhat, cutyhats)
        dfexpmelt <- melt(dfexp, id.vars = ncol(dfexp))
        expected <- cast(dfexpmelt, cutyhats ~ variable, sum)
        expected <- expected[order(c(1, names(expected[, 2:ncol(expected)])))]
        # validate before computing, so the user gets a helpful error message
        if (ncol(expected) != ncol(observed)) {
            stop("Observed and expected tables have different numbers of columns. Check you entered the correct data.")
        }
        stddiffs <- abs(observed[, 2:ncol(observed)] - expected[, 2:ncol(expected)]) / sqrt(expected[, 2:ncol(expected)])
        chisq <- sum((observed[, 2:ncol(observed)] - expected[, 2:ncol(expected)])^2 / expected[, 2:ncol(expected)])
        if (!ord) {
            PARAMETER <- (nrow(expected) - 2) * (ncol(yhat) - 1)
        } else {
            PARAMETER <- (nrow(expected) - 2) * (ncol(yhat) - 1) + ncol(yhat) - 2
        }
    } else {
        METHOD <- "Hosmer and Lemeshow test (binary model)"
        y <- if (is.factor(obs)) as.numeric(obs) - 1 else obs
        qq <- unique(quantile(yhat, probs = seq(0, 1, 1/g)))
        cutyhat <- cut(yhat, breaks = qq, include.lowest = TRUE)
        observed <- xtabs(cbind(y0 = 1 - y, y1 = y) ~ cutyhat)
        expected <- xtabs(cbind(yhat0 = 1 - yhat, yhat1 = yhat) ~ cutyhat)
        stddiffs <- abs(observed - expected) / sqrt(expected)
        chisq <- sum((observed - expected)^2 / expected)
        PARAMETER <- nrow(expected) - 2
    }
    if (g != nrow(expected)) {
        warning(paste("Not possible to compute", g, "rows. There might be too few observations."))
    }
    if (any(expected[, 2:ncol(expected)] < 1)) {
        warning("At least one cell in the expected frequencies table is < 1. Chi-square approximation may be incorrect.")
    }
    PVAL <- 1 - pchisq(chisq, PARAMETER)
    names(chisq) <- "X-squared"
    names(PARAMETER) <- "df"
    structure(list(statistic = chisq, parameter = PARAMETER, p.value = PVAL,
                   method = METHOD, data.name = DNAME, observed = observed,
                   expected = expected, stddiffs = stddiffs), class = "htest")
}
library(foreign) # just to download the example dataset
# with the ordinal package
library(ordinal)
ml <- read.dta("http://www.ats.ucla.edu/stat/data/hsbdemo.dta")
mod1 <- clm(ses ~ female + write + read, data = ml)
# extract predicted probs for each level of outcome
predprob <- data.frame(id = ml$id, female = ml$female, read = ml$read, write = ml$write)
fv <- predict(mod1, newdata = predprob, type = "prob")$fit
logitgof(ml$ses, fv, ord = TRUE) # set ord to TRUE to run ordinal instead of multinomial test
# with MASS
library(MASS)
mod2 <- polr(ses ~ female + write + read, data = ml)
logitgof(ml$ses, fitted(mod2), ord = TRUE)
Best Answer
The Hosmer-Lemeshow test is very sensitive to the number of groups, and changing the number of groups can sometimes completely change the conclusion. My suggestion is to use other tests for comparison as well, and to be careful with the HL test.
This issue is discussed in this paper: https://support.sas.com/resources/papers/proceedings14/1485-2014.pdf
Another published paper proposes alternatives and adjustments to the HL test, and gives recommendations for different situations: http://onlinelibrary.wiley.com/doi/10.1002/sim.5525/full
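To see the sensitivity to the number of groups in practice, here is a small sketch (binary case; the simulated data and helper function are my own) that fits a deliberately misspecified model and computes the HL p-value for several choices of $g$. The p-value, and sometimes the conclusion at a fixed significance level, can change with $g$:

```r
# Sketch: how the HL p-value varies with the number of groups g.
# The quadratic term is deliberately omitted from the fitted model.
set.seed(42)
n <- 1000
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-1 + x + 0.5 * x^2))  # true model includes x^2
fit <- glm(y ~ x, family = binomial)           # misspecified fit
phat <- fitted(fit)

hl_pval <- function(y, phat, g) {
  grp <- cut(phat,
             breaks = unique(quantile(phat, seq(0, 1, 1/g))),
             include.lowest = TRUE)
  o_g <- tapply(y, grp, sum)       # observed events per group
  e_g <- tapply(phat, grp, sum)    # expected events per group
  n_g <- tapply(y, grp, length)    # group sizes
  X2 <- sum((o_g - e_g)^2 / (e_g * (1 - e_g / n_g)))
  pchisq(X2, df = length(o_g) - 2, lower.tail = FALSE)
}

pvals <- sapply(c(5, 8, 10, 12, 15), function(g) hl_pval(y, phat, g))
```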