Logistic Regression – Why Deviance is Not Equal to -2*logLik in R

deviance, likelihood, logistic, r

Just tried to compute McFadden's $R^2$ by hand in R for a fitted logistic regression, but stumbled across the problem that the reported deviance is not equal to $-2$ times the reported log-likelihood:

> library(MASS)
> fit <- glm(cbind(Menarche, Total - Menarche) ~ Age, binomial, data=menarche)
> fit$deviance
[1] 26.70345
> -2*logLik(fit)
'log Lik.' 110.7553 (df=2)

AFAIK, the deviance is defined as $-2(\ell(m) - \ell(m_s))$, where $m$ is the model at hand and $m_s$ is the saturated model with perfect predictions. Perfect predictions in a binary logistic regression lead to a likelihood of one, and thus a log-likelihood of zero.

I would thus have thought that, for logistic regression, $D = -2\ell$. Am I missing something?

Best Answer

Sorry for answering my own question, but eventually I found my erroneous assumption.

These are grouped data; in other words, the same predictor value occurs more than once, so each row summarizes several Bernoulli trials. In this case, even a perfect ("saturated") model cannot predict each individual response correctly: its fitted probabilities are different from zero and one, and so the log-likelihood of the saturated model is different from zero.
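
For contrast, here is a minimal sketch (the expanded data frame menarche_long and its column Y are names made up for this illustration) that unrolls the grouped menarche table into one Bernoulli observation per girl. On ungrouped 0/1 data the saturated model does reproduce every response exactly, and the deviance equals $-2\ell$ as I had assumed:

library(MASS)
# expand each grouped row into n_i individual 0/1 responses
menarche_long <- data.frame(
  Age = rep(menarche$Age, menarche$Total),
  Y   = unlist(mapply(function(k, n) rep(c(1, 0), c(k, n - k)),
                      menarche$Menarche, menarche$Total,
                      SIMPLIFY = FALSE))
)
fit_long <- glm(Y ~ Age, binomial, data = menarche_long)
# same coefficients as the grouped fit, but now the two quantities agree
fit_long$deviance
-2 * as.numeric(logLik(fit_long))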

The "saturated model" is therefore the model with $P(Y=1|X=x)=k_i/n_i$, where $n_i$ is the number of samples with $X=x$ and $k_i$ is the number of $Y=1$ among these. The log-likelihood function of the saturated model is thus $$\ell_s = \sum_{i=1}^n \log\left[{n_i \choose k_i} p_i^{k_i}(1-p_i)^{n_i-k_i} \right] \quad\mbox{with}\quad p_i=\frac{k_i}{n_i}$$

Using this expression for $\ell_s$ to compute the deviance by hand yields exactly the value reported by glm:

> library(MASS)
> fit <- glm(cbind(Menarche, Total - Menarche) ~ Age, binomial, data=menarche)
> n <- menarche$Total
> k <- menarche$Menarche
> # log-likelihood of the saturated model: p_i = k_i/n_i in each group
> LL.s <- sum(dbinom(k, n, k/n, log=TRUE))
> # deviance = -2*(logLik of the model - logLik of the saturated model)
> as.numeric(-2*logLik(fit) + 2*LL.s)
[1] 26.70345
> fit$deviance
[1] 26.70345
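
Back to the original goal: McFadden's $R^2$ is conventionally $1 - \ell(m)/\ell(0)$, where $\ell(0)$ is the log-likelihood of the intercept-only model. A minimal sketch (fit0 is just a name chosen here for the null fit); note that for grouped data this is not the same as the deviance ratio $1 - D(m)/D(0)$, precisely because $\ell_s \neq 0$:

# null (intercept-only) model on the same grouped data
fit0 <- glm(cbind(Menarche, Total - Menarche) ~ 1, binomial, data = menarche)
# McFadden's R^2 from log-likelihoods
1 - as.numeric(logLik(fit)) / as.numeric(logLik(fit0))
# deviance-based analogue; differs here because the saturated log-likelihood is non-zero
1 - fit$deviance / fit$null.deviance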