I know that standardized Pearson Residuals are obtained in a traditional probabilistic way:
$$ r_i = \frac{y_i-\hat{\pi}_i}{\sqrt{\hat{\pi}_i(1-\hat{\pi}_i)}}$$
and deviance residuals are obtained in a more statistical way (as each point's contribution to the log likelihood):
$$ d_i = s_i \sqrt{-2[y_i \log \hat{\pi}_i + (1 - y_i)\log(1-\hat{\pi}_i)]} $$
where $s_i = 1$ if $y_i = 1$ and $s_i = -1$ if $y_i = 0$.
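For concreteness, both formulas can be computed directly. Here is a small NumPy sketch, where `y` and `pi_hat` are made-up values standing in for observed outcomes and fitted probabilities from a logistic regression:

```python
import numpy as np

# Illustrative values only: y are observed 0/1 outcomes, pi_hat stands in
# for fitted probabilities from a hypothetical logistic regression fit
y = np.array([1, 0, 1, 1, 0])
pi_hat = np.array([0.8, 0.3, 0.6, 0.9, 0.2])

# Pearson residuals: raw residual scaled by the binomial standard deviation
pearson = (y - pi_hat) / np.sqrt(pi_hat * (1 - pi_hat))

# Deviance residuals: signed square root of each case's contribution
# to -2 * log likelihood; s_i = +1 when y_i = 1, -1 when y_i = 0
s = np.where(y == 1, 1.0, -1.0)
deviance = s * np.sqrt(-2 * (y * np.log(pi_hat) + (1 - y) * np.log(1 - pi_hat)))

print(pearson)
print(deviance)
```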
Can you explain to me, intuitively, how to interpret the formula for deviance residuals?
Moreover, if I want to choose one, which one is more suitable and why?
BTW, some references claim that we derive the deviance residuals based on the term
$$-\frac{1}{2}{r_i}^2$$
where $r_i$ is as defined above.
Best Answer
Logistic regression seeks to maximize the log likelihood function
$LL = \sum^k \ln(P_i) + \sum^r \ln(1-P_i)$
where $P_i$ is the predicted probability of $Y=1$ for case $i$; $k$ is the number of cases observed as $Y=1$, and $r$ is the number of the remaining cases, observed as $Y=0$.
That expression is equal to
$LL = -\tfrac{1}{2}\left(\sum^k d_i^2 + \sum^r d_i^2\right)$
because a case's deviance residual is defined as:
$d_i = \begin{cases} \sqrt{-2\ln(P_i)} &\text{if } Y_i=1\\ -\sqrt{-2\ln(1-P_i)} &\text{if } Y_i=0\\ \end{cases}$
Thus, binary logistic regression seeks directly to minimize the sum of squared deviance residuals: it is the deviance residuals that are implicit in the maximum-likelihood algorithm of the regression.
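The identity between the log likelihood and the deviance residuals is easy to check numerically; a minimal sketch, again with made-up `y` and `pi_hat` rather than values from a real fit:

```python
import numpy as np

# Made-up outcomes and fitted probabilities, for illustration only
y = np.array([1, 1, 0, 1, 0, 0])
pi_hat = np.array([0.7, 0.9, 0.4, 0.55, 0.15, 0.3])

# Log likelihood: sum of ln(P_i) over y=1 cases plus ln(1-P_i) over y=0 cases
LL = np.sum(np.where(y == 1, np.log(pi_hat), np.log(1 - pi_hat)))

# Deviance residuals with the sign convention from the definition above
s = np.where(y == 1, 1.0, -1.0)
d = s * np.sqrt(np.where(y == 1, -2 * np.log(pi_hat), -2 * np.log(1 - pi_hat)))

# LL should equal minus half the sum of squared deviance residuals
print(LL, -0.5 * np.sum(d ** 2))
```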
The chi-square statistic of the model fit is $2(LL_\text{full model} - LL_\text{reduced model})$, where the full model contains the predictors and the reduced (intercept-only) model does not.
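As a sketch of that statistic, take the reduced model to be the intercept-only model, whose fitted probability for every case is the sample proportion of 1s; `pi_full` below is a made-up stand-in for the fitted probabilities of a hypothetical full model:

```python
import numpy as np

def log_lik(y, p):
    """Binary log likelihood for outcomes y given predicted probabilities p."""
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# Made-up data: observed outcomes and fitted probabilities from a
# hypothetical full model with predictors
y = np.array([1, 1, 0, 1, 0, 0, 1, 0])
pi_full = np.array([0.85, 0.7, 0.2, 0.9, 0.35, 0.1, 0.6, 0.25])

# Reduced (intercept-only) model: predict the sample proportion for every case
pi_reduced = np.full_like(pi_full, y.mean())

chi_sq = 2 * (log_lik(y, pi_full) - log_lik(y, pi_reduced))
print(chi_sq)
```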