I think that your best bet is the thesis of Dongwen Luo from Massey University, On the geometry of generalized linear models; it is available online here. In particular you want to focus on Chapter 3, The Geometry of GLMs (and more particularly on Section 3.4). He employs two different "geometrical domains": one before and one after the canonical link transformation. Some of the basic theoretical machinery stems from Fienberg's work on The Geometry of an r × c Contingency Table. As stated in Luo's thesis:
For a sample of size $n$, $R^n$ splits into an orthogonal direct sum of the sufficiency space $S$ and the auxiliary space $A$. The MLE of the mean $\hat{\mu}$ lies in the intersection of the sufficiency affine plane $T = s + A$ and the untransformed model space $M_R$. The link transformed mean vector $g(\hat{\mu})$ lies in the transformed mean space $g(M_R)$.
Clearly both $S$ and $A$ need to be at least 2-D and $R^n = S \oplus A$. Under this theoretical framework $\hat{\mu}$ and the data vector $y$ have the same projection onto any direction in the sufficiency space.
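To make that projection property concrete: for a canonical-link GLM the score equations at the MLE read $X^\top y = X^\top \hat{\mu}$, i.e. $y$ and $\hat{\mu}$ project identically onto the column space of the design matrix. Below is a minimal numpy sketch (synthetic Poisson data and a hand-rolled Newton–Raphson fit; all names and parameter values are illustrative, not from the thesis) that checks this:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # design matrix
beta_true = np.array([0.5, 0.3])
y = rng.poisson(np.exp(X @ beta_true))

# Fit a Poisson GLM with the canonical log link by Newton-Raphson.
beta = np.zeros(2)
for _ in range(50):
    mu = np.exp(X @ beta)
    grad = X.T @ (y - mu)                    # score vector
    hess = X.T @ (X * mu[:, None])           # Fisher information (W = mu)
    beta = beta + np.linalg.solve(hess, grad)

mu_hat = np.exp(X @ beta)
# Score equations at the MLE: y and mu_hat have the same projection onto col(X)
print(np.allclose(X.T @ y, X.T @ mu_hat))
```

At convergence the printed check is `True`: the fitted mean reproduces the sufficient statistics $X^\top y$ exactly, which is the algebraic face of the geometric statement above.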
Assuming you have some differential geometry background, the book by Kass and Vos, Geometrical Foundations of Asymptotic Inference, should provide a solid foundation on this matter. Their paper The Geometry of Asymptotic Inference is freely available from the author's website.
Finally, to answer your question of whether there is "any geometric interpretation of generalized linear model (logistic regression, Poisson, survival)": yes, there is one, and it depends on the link function used. The observations themselves are viewed as a vector in that link-transformed space. It goes without saying that you will be looking at higher-dimensional manifolds as your sample size and/or the number of columns of your design matrix increases.
As per Bill Huber's comments and answer elsewhere, the trick is to remove the influence of the independent variables on each other when producing each sequential regression. In other words, instead of:
lm(lm(x ~ y1)$residuals ~ y2)
We want:
lm(lm(x ~ y1)$residuals ~ lm(y2 ~ y1)$residuals)
In this case, we do get back to the multiple regression. Moreover, we can show the coefficients are the same:
> round(coef(lm(lm(it30 ~ itpc1)$residuals ~ lm(itpc2 ~ itpc1)$residuals)), 5)
(Intercept) lm(itpc2 ~ itpc1)$residuals
0.00000 -0.21846
> round(coef(lm(lm(it30 ~ itpc2)$residuals ~ lm(itpc1 ~ itpc2)$residuals)), 5)
(Intercept) lm(itpc1 ~ itpc2)$residuals
0.00000 0.29197
> round(coef(lm(it30 ~ itpc1 + itpc2)), 5)
(Intercept) itpc1 itpc2
0.01186 0.29197 -0.21846
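The it30/itpc1/itpc2 data are not reproduced here, so as a hedged illustration the same identity can be checked on synthetic data. The R calls above translate directly; this numpy sketch verifies that the coefficient from the residual-on-residual regression equals the multiple-regression coefficient (the Frisch–Waugh–Lovell identity discussed below):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
y1 = rng.normal(size=n)
y2 = 0.6 * y1 + rng.normal(size=n)        # deliberately correlated regressors
x = 0.3 * y1 - 0.2 * y2 + rng.normal(size=n)

def ols_fit(X, y):
    """Least-squares coefficients, with an intercept column prepended."""
    X1 = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(X1, y, rcond=None)[0]

def resid(X, y):
    """Residuals from regressing y on X (plus intercept)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    return y - X1 @ np.linalg.lstsq(X1, y, rcond=None)[0]

# Coefficient on y2 from the full multiple regression ...
b_full = ols_fit(np.column_stack([y1, y2]), x)[2]
# ... equals the slope from regressing residuals on residuals
b_fwl = ols_fit(resid(y1, y2), resid(y1, x))[1]
print(b_full, b_fwl)  # identical up to floating-point error
```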
Interestingly, and as expected, if the independent variables are orthogonal, as in principal-component regression, then we do not need to remove the influence of the regressors on each other. In this case it is true that:
lm(lm(x ~ y1)$residuals ~ y2)$residuals
is perfectly correlated with:
lm(x ~ y1 + y2)$residuals
as can be verified directly.
This is because each orthogonal principal component has a zero-slope regression line against the others, so the residuals equal the dependent variable (up to a vertical translation to mean 0).
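A quick numerical check of this claim, using two exactly orthogonal, mean-zero regressors built to mimic principal-component scores (synthetic data; a sketch, not the original example). In this orthogonal case the sequential residuals are not merely perfectly correlated with the multiple-regression residuals, they coincide:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
A = rng.normal(size=(n, 2))
A = A - A.mean(axis=0)                      # mean-zero columns
y1 = A[:, 0]
# Gram-Schmidt step: y2 is orthogonal to y1 and still mean-zero
y2 = A[:, 1] - (A[:, 1] @ y1) / (y1 @ y1) * y1
x = 0.4 * y1 - 0.7 * y2 + rng.normal(size=n)

def resid(X, y):
    """Residuals from regressing y on X (plus intercept)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    return y - X1 @ np.linalg.lstsq(X1, y, rcond=None)[0]

r_seq = resid(y2, resid(y1, x))               # sequential: x ~ y1, then residuals ~ y2
r_full = resid(np.column_stack([y1, y2]), x)  # multiple-regression residuals
print(np.allclose(r_seq, r_full))
```

With correlated regressors (as in the earlier example) this check fails, which is exactly why the residual-on-residual construction is needed in general.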
Best Answer
In standard multiple linear regression, the ability to fit ordinary-least-squares (OLS) estimates in two steps comes from the Frisch–Waugh–Lovell theorem. This theorem shows that the estimate of a coefficient for a particular predictor in a multiple linear model is equal to the estimate obtained by regressing the response residuals (residuals from a regression of the response variable against the other explanatory variables) against the predictor residuals (residuals from a regression of the predictor variable against the other explanatory variables). Evidently, you are seeking an analogy to this theorem that can be used in a logistic regression model.
For this question, it is helpful to recall the latent-variable characterisation of logistic regression:
$$Y_i = \mathbb{I}(Y_i^* > 0) \quad \quad \quad Y_i^* = \beta_0 + \beta_X x_i + \beta_Z z_i + \varepsilon_i \quad \quad \quad \varepsilon_i \sim \text{IID Logistic}(0,1).$$
In this characterisation of the model, the latent response variable $Y_i^*$ is unobservable, and instead we observe the indicator $Y_i$ which tells us whether or not the latent response is positive. This form of the model looks similar to multiple linear regression, except that we use a slightly different error distribution (the logistic distribution instead of the normal distribution), and more importantly, we only observe an indicator showing whether or not the latent response is positive.
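A small simulation can make the equivalence concrete: drawing logistic errors by inverse-CDF sampling, generating $Y_i$ from the latent model, and then fitting an ordinary logistic regression recovers the latent-scale coefficients. This is a numpy sketch with illustrative parameter values (Newton–Raphson in place of a library fitter):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20000
x = rng.normal(size=n)
z = 0.5 * x + rng.normal(size=n)
b0, bX, bZ = -0.3, 0.8, -0.5                # illustrative true coefficients

# Latent-variable formulation: Logistic(0,1) errors via the inverse CDF
u = rng.uniform(size=n)
eps = np.log(u / (1 - u))
y_star = b0 + bX * x + bZ * z + eps
y = (y_star > 0).astype(float)              # we only observe the indicator

# Equivalent GLM formulation: logistic regression fitted by Newton-Raphson
X = np.column_stack([np.ones(n), x, z])
beta = np.zeros(3)
for _ in range(30):
    p = 1 / (1 + np.exp(-X @ beta))
    W = p * (1 - p)
    beta += np.linalg.solve(X.T @ (X * W[:, None]), X.T @ (y - p))

print(np.round(beta, 2))  # close to (-0.3, 0.8, -0.5)
```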
This creates an issue for any attempt to create a two-step fit of the model. The Frisch–Waugh–Lovell theorem hinges on the ability to obtain intermediate residuals for the response and predictor of interest, taken against the other explanatory variables. In the present case, we can only obtain residuals from a "categorised" response variable. Creating a two-step fitting process for logistic regression would require you to use response residuals from this categorised response variable, without access to the underlying latent response. This seems to me like a major hurdle, and while it does not prove impossibility, it makes it seem unlikely that the model can be fitted in two steps.
Below I will give you an account of what would be required to find a two-step process to fit a logistic regression. I am not sure if there is a solution to this problem, or if there is a proof of impossibility, but the material here should get you some way towards understanding what is required.
What would a two-step logistic regression fit look like? Suppose we want to construct a two-step fit for a logistic regression model where the parameters are estimated via maximum-likelihood estimation at each step. We want the process to involve an intermediate step that fits the following two models:
$$\begin{matrix} Y_i = \mathbb{I}(Y_i^{**} > 0) & & & Y_i^{**} = \alpha_0 + \alpha_X x_i + \tau_i & & & \tau_i \sim \text{IID Logistic}(0,1), \\[6pt] & & & \text{ } \text{ } Z_i = \gamma_0 + \gamma_X x_i + \delta_i & & & \delta_i \sim \text{IID } g. \quad \quad \quad \quad \quad \\ \end{matrix}$$
We estimate the coefficients of these models (via MLEs) and this yields intermediate coefficient estimates $\hat{\alpha}_0, \hat{\alpha}_X, \hat{\gamma}_0, \hat{\gamma}_X$. Then in the second step we fit the model:
$$Y_i = \text{logistic}(\hat{\alpha}_0 + \hat{\alpha}_X x_i) + \beta_Z (z_i - \hat{\gamma}_0 - \hat{\gamma}_X x_i) + \epsilon_i \quad \quad \quad \epsilon_i \sim \text{IID } f.$$
As specified, the procedure has a lot of fixed elements, but the density functions $g$ and $f$ in these steps are left unspecified (though they should be zero-mean distributions that do not depend on the data). To obtain a two-step fitting method under these constraints we need to choose $g$ and $f$ to ensure that the MLE for $\beta_Z$ in this two-step model-fit algorithm is the same as the MLE obtained from the one-step logistic regression model above.
To see if this is possible, we first write the optimisation problems defining the estimated parameters from the first step:
$$\begin{equation} \begin{aligned} \ell_{\mathbf{y}| \mathbf{x}} (\hat{\alpha}_0, \hat{\alpha}_X) &= \underset{\alpha_0, \alpha_X}{\max} \sum_{i=1}^n \ln \text{Bern}(y_i | \text{logistic}(\alpha_0 + \alpha_X x_i)), \\[10pt] \ell_{\mathbf{z}| \mathbf{x}} (\hat{\gamma}_0, \hat{\gamma}_X) &= \underset{\gamma_0, \gamma_X}{\max} \sum_{i=1}^n \ln g( z_i - \gamma_0 - \gamma_X x_i ). \end{aligned} \end{equation}$$
Let $\epsilon_i = y_i - \text{logistic}(\hat{\alpha}_0 + \hat{\alpha}_X x_i) - \beta_Z (z_i - \hat{\gamma}_0 - \hat{\gamma}_X x_i)$ so that the log-likelihood function for the second step is:
$$\ell_{\mathbf{y}|\mathbf{z}|\mathbf{x}}(\beta_Z) = \sum_{i=1}^n \ln f(y_i - \text{logistic}(\hat{\alpha}_0 + \hat{\alpha}_X x_i) - \beta_Z (z_i - \hat{\gamma}_0 - \hat{\gamma}_X x_i)).$$
We require that the maximising value of this function is the MLE of the multiple logistic regression model. In other words, we require:
$$\underset{\beta_Z}{\text{arg max }} \ell_{\mathbf{y}|\mathbf{z}|\mathbf{x}}(\beta_Z) = \underset{\beta_Z}{\text{arg max }} \underset{\beta_0, \beta_X}{\max} \sum_{i=1}^n \ln \text{Bern}(y_i | \text{logistic}(\beta_0 + \beta_X x_i + \beta_Z z_i)).$$
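As a hedged numerical illustration of the obstacle, take one particular choice, Gaussian $g$ and $f$, so that both the intermediate regression of $z$ on $x$ and the second step reduce to least squares. The resulting estimate of $\beta_Z$ then sits on the probability scale rather than the logit scale, and disagrees with the one-step MLE (illustrative synthetic data; the helper is a hand-rolled Newton–Raphson fitter):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5000
x = rng.normal(size=n)
z = 0.5 * x + rng.normal(size=n)
u = rng.uniform(size=n)
y = ((0.2 + 1.0 * x - 0.8 * z + np.log(u / (1 - u))) > 0).astype(float)

def logit_mle(X, y, iters=30):
    """Logistic-regression MLE via Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-X @ beta))
        W = p * (1 - p)
        beta += np.linalg.solve(X.T @ (X * W[:, None]), X.T @ (y - p))
    return beta

# One-step fit: the target MLE for beta_Z
beta_onestep = logit_mle(np.column_stack([np.ones(n), x, z]), y)[2]

# First step: y ~ x by logistic MLE, z ~ x by least squares (Gaussian g)
a0, aX = logit_mle(np.column_stack([np.ones(n), x]), y)
g0, gX = np.linalg.lstsq(np.column_stack([np.ones(n), x]), z, rcond=None)[0]

# Second step with Gaussian f: least squares of the "response residual"
# on the predictor residual gives the candidate beta_Z
ry = y - 1 / (1 + np.exp(-(a0 + aX * x)))
rz = z - g0 - gX * x
beta_twostep = (rz @ ry) / (rz @ rz)

print(beta_onestep, beta_twostep)  # these disagree by a wide margin
```

This of course rules out only one choice of $g$ and $f$; it does not settle the general question of whether some other choice could work.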
I leave it to others to determine if there is a solution to this problem, or a proof of no solution. I suspect that the "categorisation" of the latent response variable in a logistic regression will make it impossible to find a two-step process.