Logistic Regression – Regression of Logistic Regression Residuals on Regressors


With OLS regression applied to a continuous response, one can build up the multiple regression equation by sequentially regressing the residuals on each covariate. My question is: is there a way to do this with logistic regression, via logistic regression residuals?

That is, if I want to estimate $\Pr(Y = 1 | x, z)$ using the standard generalized linear modelling approach, is there a way to run a logistic regression against $x$ to obtain pseudo-residuals $R_1$, and then regress $R_1$ on $z$ to obtain an unbiased estimator of the logistic regression coefficients? References to textbooks or literature would be appreciated.

Best Answer

In standard multiple linear regression, the ability to fit ordinary-least-squares (OLS) estimates in two steps comes from the Frisch–Waugh–Lovell theorem. This theorem shows that the estimate of a coefficient for a particular predictor in a multiple linear model is equal to the estimate obtained by regressing the response residuals (residuals from a regression of the response variable against the other explanatory variables) against the predictor residuals (residuals from a regression of the predictor variable against the other explanatory variables). Evidently, you are seeking an analogue of this theorem that can be used in a logistic regression model.
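As a quick numerical illustration of the theorem, here is a minimal sketch in Python (assuming `numpy` and `statsmodels` are available; the simulated data and coefficient values are arbitrary choices of mine):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
z = 0.5 * x + rng.normal(size=n)                 # z correlated with x
y = 1.0 + 2.0 * x - 1.5 * z + rng.normal(size=n)

# One-step fit: multiple regression of y on (x, z).
full = sm.OLS(y, sm.add_constant(np.column_stack([x, z]))).fit()

# Two-step fit: residualise y and z against x, then regress residuals on residuals.
ry = sm.OLS(y, sm.add_constant(x)).fit().resid   # response residuals
rz = sm.OLS(z, sm.add_constant(x)).fit().resid   # predictor residuals
partial = sm.OLS(ry, rz).fit()                   # no intercept needed

print(full.params[2], partial.params[0])         # agree up to numerical precision
```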

For this question, it is helpful to recall the latent-variable characterisation of logistic regression:

$$Y_i = \mathbb{I}(Y_i^* > 0) \quad \quad \quad Y_i^* = \beta_0 + \beta_X x_i + \beta_Z z_i + \varepsilon_i \quad \quad \quad \varepsilon_i \sim \text{IID Logistic}(0,1).$$

In this characterisation of the model, the latent response variable $Y_i^*$ is unobservable, and instead we observe the indicator $Y_i$ which tells us whether or not the latent response is positive. This form of the model looks similar to multiple linear regression, except that we use a slightly different error distribution (the logistic distribution instead of the normal distribution), and more importantly, we only observe an indicator showing whether or not the latent response is positive.
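To make this characterisation concrete, here is a small simulation sketch (again assuming `numpy`, `scipy` and `statsmodels`; the parameter values are arbitrary) showing that thresholding a latent linear model with logistic errors produces exactly the data that logistic regression fits:

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import logistic

rng = np.random.default_rng(1)
n = 20000
x = rng.normal(size=n)
z = rng.normal(size=n)
beta0, betaX, betaZ = -0.5, 1.0, 2.0

# Latent response with IID Logistic(0, 1) errors; only its sign is observed.
y_star = beta0 + betaX * x + betaZ * z + logistic.rvs(size=n, random_state=rng)
y = (y_star > 0).astype(int)

# The logistic regression MLE recovers the latent-model coefficients.
fit = sm.Logit(y, sm.add_constant(np.column_stack([x, z]))).fit(disp=0)
print(fit.params)                                # approximately (-0.5, 1.0, 2.0)
```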

Only observing the indicator creates an issue for any attempt to construct a two-step fit of the model. The Frisch–Waugh–Lovell theorem hinges on the ability to obtain intermediate residuals for the response and the predictor of interest, taken against the other explanatory variables. In the present case, we can only obtain residuals from a "categorised" response variable. Creating a two-step fitting process for logistic regression would require you to use response residuals from this categorised response variable, without access to the underlying latent response. This seems to me to be a major hurdle; while it does not prove impossibility, it makes it seem unlikely that the model can be fitted in two steps.
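The hurdle is easy to see numerically. The sketch below (same assumptions as above; treating response residuals on the probability scale as the "pseudo-residuals" is one plausible reading of the question, not the only one) runs the naive two-step procedure and compares it with the one-step MLE:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 20000
x = rng.normal(size=n)
z = 0.5 * x + rng.normal(size=n)                 # z correlated with x
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(-0.5 + 1.0 * x + 2.0 * z))))

# One-step MLE for beta_Z from the full logistic regression.
full = sm.Logit(y, sm.add_constant(np.column_stack([x, z]))).fit(disp=0)

# Naive two-step: response residuals from logit(y ~ x), then regressed on z.
step1 = sm.Logit(y, sm.add_constant(x)).fit(disp=0)
r1 = y - step1.predict()                         # residuals of the categorised response
step2 = sm.OLS(r1, sm.add_constant(z)).fit()

print(full.params[2], step2.params[1])           # these do not agree
```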

Below I will give you an account of what would be required to find a two-step process to fit a logistic regression. I am not sure if there is a solution to this problem, or if there is a proof of impossibility, but the material here should get you some way towards understanding what is required.


What would a two-step logistic regression fit look like?

Suppose we want to construct a two-step fit for a logistic regression model where the parameters are estimated via maximum-likelihood estimation at each step. We want the process to involve an intermediate step that fits the following two models:

$$\begin{aligned} Y_i &= \mathbb{I}(Y_i^{**} > 0) & Y_i^{**} &= \alpha_0 + \alpha_X x_i + \tau_i & \tau_i &\sim \text{IID Logistic}(0,1), \\[6pt] & & Z_i &= \gamma_0 + \gamma_X x_i + \delta_i & \delta_i &\sim \text{IID } g. \end{aligned}$$

We estimate the coefficients of these models (via MLE), which yields the intermediate parameter estimates $\hat{\alpha}_0, \hat{\alpha}_X, \hat{\gamma}_0, \hat{\gamma}_X$. Then in the second step we fit the model:

$$Y_i = \text{logistic}(\hat{\alpha}_0 + \hat{\alpha}_X x_i) + \beta_Z (z_i - \hat{\gamma}_0 - \hat{\gamma}_X x_i) + \epsilon_i \quad \quad \quad \epsilon_i \sim \text{IID } f.$$

As specified, the procedure has a lot of fixed elements, but the density functions $g$ and $f$ in these steps are left unspecified (though they should be zero-mean distributions that do not depend on the data). To obtain a two-step fitting method under these constraints we need to choose $g$ and $f$ to ensure that the MLE for $\beta_Z$ in this two-step model-fit algorithm is the same as the MLE obtained from the one-step logistic regression model above.

To see if this is possible, we first write all the estimated parameters from the first step:

$$\begin{equation} \begin{aligned} \ell_{\mathbf{y}| \mathbf{x}} (\hat{\alpha}_0, \hat{\alpha}_X) &= \underset{\alpha_0, \alpha_X}{\max} \sum_{i=1}^n \ln \text{Bern}(y_i | \text{logistic}(\alpha_0 + \alpha_X x_i)), \\[10pt] \ell_{\mathbf{z}| \mathbf{x}} (\hat{\gamma}_0, \hat{\gamma}_X) &= \underset{\gamma_0, \gamma_X}{\max} \sum_{i=1}^n \ln g( z_i - \gamma_0 - \gamma_X x_i ). \end{aligned} \end{equation}$$

Let $\epsilon_i = y_i - \text{logistic}(\hat{\alpha}_0 + \hat{\alpha}_X x_i) - \beta_Z (z_i - \hat{\gamma}_0 - \hat{\gamma}_X x_i)$ so that the log-likelihood function for the second step is:

$$\ell_{\mathbf{y}|\mathbf{z}|\mathbf{x}}(\beta_Z) = \sum_{i=1}^n \ln f(y_i - \text{logistic}(\hat{\alpha}_0 + \hat{\alpha}_X x_i) - \beta_Z (z_i - \hat{\gamma}_0 - \hat{\gamma}_X x_i)).$$

We require the maximising value of this function to be the MLE for $\beta_Z$ in the multiple logistic regression model. In other words, we require:

$$\underset{\beta_Z}{\text{arg max }} \ell_{\mathbf{y}|\mathbf{z}|\mathbf{x}}(\beta_Z) = \underset{\beta_Z}{\text{arg max }} \underset{\beta_0, \beta_X}{\max} \sum_{i=1}^n \ln \text{Bern}(y_i | \text{logistic}(\beta_0 + \beta_X x_i + \beta_Z z_i)).$$
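As a concrete check, we can fill in the unspecified densities with a common guess, taking both $g$ and $f$ to be Gaussian (this choice is purely illustrative). Under that choice the first step gives OLS estimates of the $\gamma$'s, and the second step reduces to a no-intercept least-squares fit of the response residuals on the predictor residuals. The sketch below (same assumptions as the earlier blocks) shows that the two sides of the display above then disagree, though this does not rule out other choices of $g$ and $f$:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 20000
x = rng.normal(size=n)
z = 0.5 * x + rng.normal(size=n)
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(-0.5 + 1.0 * x + 2.0 * z))))

# Step 1: logit of y on x gives the alpha-hats; with g Gaussian, the MLE for
# the gamma-hats is OLS of z on x.
alpha = sm.Logit(y, sm.add_constant(x)).fit(disp=0).params
gamma = sm.OLS(z, sm.add_constant(x)).fit().params

# Step 2: with f Gaussian, the MLE for beta_Z minimises the sum of squared
# epsilon_i, i.e. a no-intercept OLS of the response residuals on the
# predictor residuals.
ry = y - 1.0 / (1.0 + np.exp(-(alpha[0] + alpha[1] * x)))
rz = z - gamma[0] - gamma[1] * x
beta_Z_two_step = sm.OLS(ry, rz).fit().params[0]

# One-step MLE for comparison.
beta_Z_one_step = sm.Logit(y, sm.add_constant(np.column_stack([x, z]))).fit(disp=0).params[2]
print(beta_Z_two_step, beta_Z_one_step)          # the two estimates differ
```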

I leave it to others to determine whether there is a solution to this problem, or a proof that no solution exists. I suspect that the "categorisation" of the latent response variable in a logistic regression makes it impossible to find a two-step process.