Solved – Using residualized predictors outside the linear model context

generalized linear model, logistic, poisson distribution, r, regression

Can anyone point me towards a good explanation of when a residualized variable in a regression will give you the same answer as using a non-residualized variable with controls?

For instance, say I want to know the effect of a variable $x$ on $y$ and need to control for $a$ and $b$. In a classic linear model framework I can either add $a$ and $b$ as covariates (i.e., control variables) to the model of $y$ on $x$, or I can first regress $x$ on $a$ and $b$, and then use the residuals from this regression (the residualized $x$) to predict $y$. Both will give the same coefficient for $x$.
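For reference, this linear-model equivalence is usually called the Frisch-Waugh-Lovell theorem. A short sketch in matrix form follows; the notation is introduced here and is not from the original post: $Z$ is the matrix holding a column of ones and the controls, $M_Z$ is the projection that takes residuals from a regression on $Z$, and $\tilde{x}$ is the residualized $x$.

$$
\begin{aligned}
y &= x\beta + Z\gamma + \varepsilon, \qquad M_Z = I - Z(Z'Z)^{-1}Z', \qquad \tilde{x} = M_Z x \\
\hat{\beta}_{\text{controls}} &= (x' M_Z x)^{-1} x' M_Z y \\
\hat{\beta}_{\text{resid}} &= (\tilde{x}'\tilde{x})^{-1}\tilde{x}'y = (x' M_Z' M_Z x)^{-1} x' M_Z' y = (x' M_Z x)^{-1} x' M_Z y
\end{aligned}
$$

The last step uses only that $M_Z$ is symmetric and idempotent. Because $Z$ contains the constant, $\tilde{x}$ has mean zero, so including an intercept in the second regression does not change the slope. The argument relies on least squares being a projection, which is specific to the linear model.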

This works in the linear model case, but does a residualized $x$ give the same coefficient as $x$ with controls in other types of models, e.g., logit or Poisson models? My own simple simulations suggest it does not (see R code below), but I am trying to understand why, and whether residualization can ever be used in place of adding controls outside the linear model framework.

#generate the data
n=10000
set.seed(3345)
a=rnorm(n); b=rnorm(n)
x = .4*a + .4*b*b + rnorm(n)
y = .5*x + .3*a + .3*b*b + rnorm(n)

## LINEAR MODEL ####
#a model with controls gets the right coefficient
summary(lm(y ~ x + a + I(b^2)))
residmod=lm(x ~ a + I(b^2))
x.resid=resid(residmod)
#using a residualized variable gets the same coefficient
summary(lm(y ~ x.resid))

## LOGIT MODEL ####
#latent-variable formulation: add logistic noise, then dichotomize at 0
ystar=.5*x + .3*a + .3*b*b + rlogis(n)
ydichot=ifelse(ystar >0, 1, 0)
#a model with controls gets the right coefficient
summary(glm(ydichot ~ x + a + I(b^2), family=binomial))
#using a residualized variable does NOT get the same coefficient
summary(glm(ydichot ~ x.resid, family=binomial))

## POISSON MODEL ####
mu=exp(.5*x + .3*a + .3*b*b)
ycount=rpois(n, mu)
summary(glm(ycount ~ x + a + I(b^2), family=poisson))
#using a residualized variable does NOT get the same coefficient
summary(glm(ycount ~ x.resid, family=poisson))

Best Answer

Residualization can be used outside the linear framework. For a direct application to nonlinear probability models such as logit and probit, see this paper: http://smx.sagepub.com/content/42/1/286. It explains what residualization does in those models.
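One way to see what is going on, as a minimal sketch rather than the method of the linked paper: the residualized $x$ is just $x$ minus a linear combination of the intercept, $a$, and $b^2$. If the controls stay in the model, swapping $x$ for its residual only reparameterizes the linear predictor, so the fitted coefficient on $x$ is unchanged in any GLM. What does not carry over from OLS is the step of dropping the controls: least squares is a projection, so in-sample orthogonality makes the omission harmless, whereas the logit likelihood has no such property. The code below rebuilds the question's simulated data; the object names (`ystar`, `fit.*`, `b.*`) are chosen here for illustration.

```r
# Rebuild the simulated data from the question (same seed and coefficients)
set.seed(3345)
n <- 10000
a <- rnorm(n); b <- rnorm(n)
x <- .4*a + .4*b^2 + rnorm(n)

# Residualize x on the controls
x.resid <- resid(lm(x ~ a + I(b^2)))

# Binary outcome from a latent logistic model
ystar <- .5*x + .3*a + .3*b^2 + rlogis(n)
ydichot <- as.integer(ystar > 0)

# (1) raw x with controls
fit.controls <- glm(ydichot ~ x + a + I(b^2), family = binomial)
# (2) residualized x, controls KEPT: same column space, same linear predictor
fit.resid.controls <- glm(ydichot ~ x.resid + a + I(b^2), family = binomial)
# (3) residualized x, controls DROPPED: a different model
fit.resid.only <- glm(ydichot ~ x.resid, family = binomial)

b.controls       <- unname(coef(fit.controls)["x"])
b.resid.controls <- unname(coef(fit.resid.controls)["x.resid"])
b.resid.only     <- unname(coef(fit.resid.only)["x.resid"])

# Compare the three estimates side by side
print(c(controls = b.controls,
        resid_with_controls = b.resid.controls,
        resid_only = b.resid.only))
```

Fits (1) and (2) should agree up to numerical tolerance, while (3) generally differs. The coefficients on `a` and `I(b^2)` do change between (1) and (2), since they absorb the part of $x$ that was removed.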

Residualization is also used in nonparametric regression.
