Solved – Why is a GLM's residual deviance equal to minus twice its log likelihood?

deviance, generalized linear model, logistic, r

I understand that for squared loss, adding a factor of $1/2$ to the objective function simplifies many derivations, since the derivative of a square produces a constant $2$.
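For example, with a single observation $y$ and prediction $\theta$,

$$\frac{d}{d\theta}\,\frac{1}{2}(y-\theta)^2 = -(y-\theta),$$

so the $1/2$ cancels the $2$ coming from the power rule.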

Are we doing something similar with the logistic loss? If not, why is the residual deviance twice the negative log likelihood?

A few lines of code to demonstrate my question:

# logistic regression of vs on mpg, hp and wt
fit <- glm(vs ~ mpg + hp + wt, data = mtcars, family = binomial())
p <- fit$fitted.values  # fitted probabilities
y <- mtcars$vs          # observed 0/1 response

# these two values are the same
fit$deviance / 2
-sum(y * log(p) + (1 - y) * log(1 - p))  # negative log-likelihood

Best Answer

It's not just with the logistic; it's true of the deviance more generally in GLMs.
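For instance, the same relationship holds for a Poisson GLM. Here is a minimal check in R using the built-in InsectSprays data, with the standard Poisson deviance written out by hand:

# Poisson GLM: the residual deviance is twice the log-likelihood ratio
# against the saturated model,
#   D = 2 * sum(y*log(y/mu) - (y - mu)),  taking y*log(y/mu) = 0 when y = 0
fit_pois <- glm(count ~ spray, data = InsectSprays, family = poisson())
y  <- InsectSprays$count
mu <- fit_pois$fitted.values

D <- 2 * sum(ifelse(y == 0, 0, y * log(y / mu)) - (y - mu))
c(D, fit_pois$deviance)  # the two agree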

Indeed the idea of taking twice the log of a likelihood ratio arises because of Wilks' theorem relating to likelihood ratio tests, which tells us that $-2\log(\Lambda)$ for a pair of nested models has (asymptotically) a chi-square distribution with df equal to the difference in dimensionality.
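That is why differences in deviance are used to test between nested models. A sketch, reusing the model from the question (the reduced model dropping hp and wt is just for illustration):

# analysis of deviance / likelihood-ratio test between nested models:
# the drop in deviance is -2*log(Lambda), referred to a chi-square
# distribution with df equal to the number of parameters dropped
fit_full    <- glm(vs ~ mpg + hp + wt, data = mtcars, family = binomial())
fit_reduced <- glm(vs ~ mpg,           data = mtcars, family = binomial())

lr    <- fit_reduced$deviance - fit_full$deviance        # -2*log(Lambda)
df_lr <- fit_reduced$df.residual - fit_full$df.residual  # here, 2
pchisq(lr, df = df_lr, lower.tail = FALSE)               # p-value

anova(fit_reduced, fit_full, test = "LRT")               # same test via anova()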

In the case of GLMs the deviance is formed by comparing with a fully saturated model, where there are as many parameters as observations.
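In symbols, writing $\ell$ for the log-likelihood, $\hat\theta$ for the fitted model and $\tilde\theta$ for the saturated fit,

$$D = 2\left[\ell(\tilde\theta;\,y) - \ell(\hat\theta;\,y)\right] = -2\log\Lambda,$$

where $\Lambda$ is the likelihood ratio of the fitted model against the saturated one.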

Sometimes simply $-2\log\mathcal{L}$ for a given model is termed "deviance", which is (strictly speaking) a misnomer, but if it is only used to calculate differences between (nested) models, this won't lead to any difficulty (the contribution from the fully saturated model cancels out, so these between-model differences will be the same either way).
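This also explains why the two quantities in the question match exactly rather than only up to a constant: with ungrouped 0/1 responses the saturated model fits every observation perfectly, so its likelihood is $1$, its log-likelihood is $0$, and the deviance reduces to $-2\log\mathcal{L}$ of the fitted model. A quick check with the same fit as in the question:

fit <- glm(vs ~ mpg + hp + wt, data = mtcars, family = binomial())

# for ungrouped 0/1 data the saturated log-likelihood is 0,
# so the residual deviance equals -2 * logLik exactly
c(fit$deviance, -2 * as.numeric(logLik(fit)))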
