Regression – Regressing a Variable and Then Regressing Residuals on Another Variable

Tags: regression, regression coefficients, self-study

Consider observations on three variables $X_1, X_2,$ and $X_3$. Suppose that $X_1$ is regressed on $X_2$. When the residual of that regression is regressed on $X_3$, the regression coefficient of $X_3$ is $\beta_3$. When $X_1$ is regressed on $X_2$ and $X_3$ simultaneously, the regression coefficient of $X_3$ is $\beta_3^*$.

We need to show that $|\beta_3| \le |\beta_3^*|$ and determine when equality holds.

A huge part of my confusion here is that I'm unsure how to perform the regression of the residuals on $X_3$. If I could have any pointers on how to go about that, I'd be very grateful.

Best Answer

Most multiple regression problems can be reduced to simple regressions involving one response variable and one explanatory variable (without an intercept), turning them into problems of plane geometry. The strategy presented here is to find a particularly convenient way to represent these variables.

The three variables in this question generate a Euclidean space of at most three dimensions. We will solve the problem directly by selecting a simple but fully general way to express all the variables: that is, by choosing a suitable orthonormal basis for this space. One basic computational fact, the Normal equations (for the simplest possible regression), is that

the coefficient of the regression of one variable $Y$ against another nonzero variable $X$ is $$\beta_{Y;X} = \frac{X\cdot Y}{X\cdot X}$$

where $\cdot$ is the Euclidean inner product.
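As a quick sanity check (my own illustration, not part of the original argument), this inner-product formula can be compared against a generic no-intercept least-squares fit; all data and names below are invented:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=20)
Y = 2.5 * X + rng.normal(size=20)

# Coefficient from the inner-product formula above.
beta_formula = (X @ Y) / (X @ X)

# Coefficient from a no-intercept least-squares fit of Y on X.
beta_lstsq, *_ = np.linalg.lstsq(X[:, None], Y, rcond=None)

print(beta_formula, beta_lstsq[0])  # the two agree to machine precision
```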

Choosing units in which the length of $X_2$ (presumably nonzero) is $1,$ begin creating an orthonormal basis of which the first element is $X_2,$ so that in this basis $X_2 = (1,0,0).$

Because $\{X_2, X_3\}$ generate a subspace of at most two dimensions, we may select a second basis element to represent the second dimension, so that $X_3 = (u,v,0),$ say.
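Concretely, these coordinates come from Gram–Schmidt orthogonalization. Here is a minimal sketch (simulated data, names of my choosing) recovering $u$ and $v$ for a concrete $X_3$:

```python
import numpy as np

rng = np.random.default_rng(1)
X2 = rng.normal(size=50)
X3 = rng.normal(size=50)

# First basis vector: X2 rescaled to unit length.
e1 = X2 / np.linalg.norm(X2)

# Gram-Schmidt: strip the e1 component from X3, normalize the rest.
r = X3 - (X3 @ e1) * e1
e2 = r / np.linalg.norm(r)

u, v = X3 @ e1, X3 @ e2        # so X3 = u*e1 + v*e2, i.e. (u, v, 0)
assert np.allclose(u * e1 + v * e2, X3)
```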

The regression of $X_1$ on $X_2$ and $X_3$ states

$$X_1 = \beta_2^* X_2 + \beta_3^* X_3 + \text{residual} = (\beta_2^* + \beta_3^*u,\ \beta_3^*v,\ w)$$

where the third basis element is chosen to be parallel to the residual $(0,0,w).$
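Writing the residual as $(0,0,w)$ is legitimate because of the normal equations: a least-squares residual is orthogonal to every regressor, so it has no component in the plane spanned by $X_2$ and $X_3$. A short numerical check of this fact (simulated data, illustrative only):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
X2, X3 = rng.normal(size=n), rng.normal(size=n)
X1 = 1.5 * X2 - 0.5 * X3 + rng.normal(size=n)

Z = np.column_stack([X2, X3])
coefs, *_ = np.linalg.lstsq(Z, X1, rcond=None)
residual = X1 - Z @ coefs

print(residual @ X2, residual @ X3)  # both ~ 0 to machine precision
```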

Now consider the first of the two regressions in the question. Regressing $X_1$ on $X_2=(1,0,0)$ simply picks out the first coordinate of $X_1$ as the coefficient, $\beta_{X_1;X_2} = \beta_2^* + \beta_3^*u$; subtracting $\beta_{X_1;X_2}\,X_2$ leaves the residual

$$E = (0, \beta_3^*v, w).$$

Finally we come to the only step that requires any computation: regressing $E$ against $X_3$ gives the coefficient

$$\beta_3 = \frac{E \cdot X_3}{X_3\cdot X_3} = \frac{(0,\beta_3^*v, w)\cdot(u,v,0)}{(u,v,0)\cdot(u,v,0)} = \beta_3^* \frac{v^2}{u^2+v^2}.$$

(We must assume $u^2+v^2\ne 0,$ which means $X_3$ is nonzero. Otherwise $X_3$ plays no role and trivially $\beta_3 = \beta_3^*.$)
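Because this identity is exact linear algebra rather than an asymptotic statement, it holds to machine precision on any data set. A simulation sketch (all names and coefficients invented) comparing the two procedures:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
X2 = rng.normal(size=n)
X3 = 0.7 * X2 + rng.normal(size=n)           # deliberately correlated with X2
X1 = 1.0 * X2 + 2.0 * X3 + rng.normal(size=n)

def coef(y, x):
    """No-intercept simple-regression coefficient x.y / x.x."""
    return (x @ y) / (x @ x)

# Two-step procedure: residuals of X1 on X2, regressed on X3.
E = X1 - coef(X1, X2) * X2
beta3 = coef(E, X3)

# Multiple regression of X1 on X2 and X3 jointly.
beta3_star = np.linalg.lstsq(np.column_stack([X2, X3]), X1, rcond=None)[0][1]

# Shrinkage factor v^2 / (u^2 + v^2) from X3's coordinates.
e1 = X2 / np.linalg.norm(X2)
u = X3 @ e1
v = np.linalg.norm(X3 - u * e1)

print(beta3, beta3_star * v**2 / (u**2 + v**2))  # equal to machine precision
assert abs(beta3) <= abs(beta3_star)
```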

Because both $u^2$ and $v^2$ are nonnegative, taking absolute values produces

$$|\beta_3| = |\beta_3^*|\frac{v^2}{u^2 + v^2} \le |\beta_3^*|$$ with equality if and only if $u^2 = 0.$

In terms of the original variables, $u^2=0$ means $X_3 = (0,v,0)$ is orthogonal to $X_2.$
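To watch the equality case happen, one can orthogonalize $X_3$ against $X_2$ before simulating, forcing $u = 0$; the two coefficients then coincide. Again an illustrative sketch:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200
X2 = rng.normal(size=n)
X3 = rng.normal(size=n)
X3 -= (X3 @ X2) / (X2 @ X2) * X2           # make X3 orthogonal to X2 (u = 0)
X1 = 1.0 * X2 + 2.0 * X3 + rng.normal(size=n)

E = X1 - (X1 @ X2) / (X2 @ X2) * X2        # residuals of X1 on X2
beta3 = (E @ X3) / (X3 @ X3)
beta3_star = np.linalg.lstsq(np.column_stack([X2, X3]), X1, rcond=None)[0][1]

print(beta3, beta3_star)                   # equal up to rounding
assert np.isclose(beta3, beta3_star)
```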
