Solved – the relationship of long and short regression when we have an intercept

econometrics · least squares · linear model · multiple regression · regression

Consider the linear model estimated by OLS:

$$ y = X\hat{\beta} + \hat{u} = X_1 \hat{\beta}_1 + X_2 \hat{\beta}_2 + \hat{u} $$

We call the equation above the long regression.

Consider also the model where we omit the set of variables $X_2$, the short regression:

$$ y = X_1 \tilde{\beta}_1 + \tilde{u} $$

As the coefficients are estimated by OLS, we have this nice result relating the two models:

$$ \tilde{\beta}_1 = (X_1'X_1)^{-1} X_1'y = \hat{\beta}_1 + (X_1'X_1)^{-1} X_1'X_2\hat{\beta}_2, $$
where the term involving the residual vanishes because $\hat{u}$ is orthogonal to the column space of $X$, which contains $X_1$.
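As a quick numerical sanity check, here is a small numpy simulation (the data-generating process and variable names are my own, chosen only for illustration) that verifies the matrix identity on simulated data:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k1, k2 = 200, 3, 2
X1 = rng.normal(size=(n, k1))
X2 = rng.normal(size=(n, k2)) + 0.5 * X1[:, :k2]   # make X2 correlated with X1
y = X1 @ np.array([1.0, -2.0, 0.5]) + X2 @ np.array([3.0, 1.5]) + rng.normal(size=n)

# Long regression: y on (X1, X2)
X = np.hstack([X1, X2])
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
beta_hat1, beta_hat2 = beta_hat[:k1], beta_hat[k1:]

# Short regression: y on X1 alone
beta_tilde1 = np.linalg.lstsq(X1, y, rcond=None)[0]

# (X1'X1)^{-1} X1'X2: coefficients from regressing X2 on X1
Delta = np.linalg.solve(X1.T @ X1, X1.T @ X2)

# Identity: tilde_beta1 = hat_beta1 + Delta @ hat_beta2
assert np.allclose(beta_tilde1, beta_hat1 + Delta @ beta_hat2)
```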

This result is often stated in undergraduate econometrics texts (before the matrix version of OLS is introduced) as:

$$ \tilde{b}_1 = \hat{b}_1 + \delta\cdot\hat{b}_2 $$

where

$$ y = \hat{b}_0 + \hat{b}_1\cdot x_1 + \hat{b}_2\cdot x_2 + \hat{u} $$

and $\delta$ is the slope coefficient from a regression of $x_2$ on $x_1$ with an intercept.
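The scalar version can also be checked numerically. A minimal numpy sketch (with a made-up data-generating process) confirming that the short-regression slope equals $\hat{b}_1 + \delta\cdot\hat{b}_2$:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)   # x2 correlated with x1
y = 2.0 + 1.0 * x1 - 1.5 * x2 + rng.normal(size=n)

iota = np.ones(n)

# Long regression: y on constant, x1, x2
b0, b1, b2 = np.linalg.lstsq(np.column_stack([iota, x1, x2]), y, rcond=None)[0]

# Short regression: y on constant and x1
X_short = np.column_stack([iota, x1])
b0_t, b1_t = np.linalg.lstsq(X_short, y, rcond=None)[0]

# delta: slope from regressing x2 on a constant and x1
d0, delta = np.linalg.lstsq(X_short, x2, rcond=None)[0]

assert np.isclose(b1_t, b1 + delta * b2)
```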

This should be a straightforward corollary of the matrix version, but the intercepts really confused me.

I can only see that the first result would imply the second if there is no intercept at all (so the matrix $X_2$ is the column vector $x_2$ and the matrix $X_1$ is the column vector $x_1$). How exactly are the two results related?

Best Answer

As you suspect, the second version is indeed a special case of the more general first result. We obtain it when $X_2=x_2$ and $X_1=(\iota\;\;x_1)$ with $\iota$ a vector of ones for the constant.

What (maybe) confuses you is that Wooldridge's statement only focuses on the coefficient on $x_1$ and does not bother to discuss $\tilde{b}_0$, the coefficient on the constant, as it is often of secondary interest.

When we have a constant, $x_1$ and $x_2$, we get a $(2\times1)$ vector in the short regression $\tilde{b}=(\tilde{b}_0,\tilde{b}_1)'$. Likewise, the regression of $x_2$ on an intercept and $x_1$ then yields a coefficient vector, call it $\Delta$, that contains $\Delta=(\delta_0,\delta)'$.

In Goldberger's general result, $\Delta$ corresponds to $(X_1'X_1)^{-1} X_1'X_2$, the OLS estimates from a regression of $X_2$ on $X_1$. (When $X_2$ contains $k_2>1$ variables, we would actually obtain a $(k_1\times k_2)$ matrix of estimated coefficients here, with $k_1$ the number of variables in $X_1$.)

Finally, let $\hat{b}_{[0,1]}=(\hat{b}_0,\hat{b}_1)'$.

So all in all, we may write $$ \tilde{b}=\hat{b}_{[0,1]}+\Delta\cdot\hat{b}_2, $$ which is now, I hope, clearly a special case of Goldberger's formulation. Wooldridge just picks the second element of that vector.
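To make the vector version concrete, here is a short numpy check (with an illustrative data-generating process of my own) that the full $(2\times1)$ identity $\tilde{b}=\hat{b}_{[0,1]}+\Delta\cdot\hat{b}_2$ holds, intercept included:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)
y = 1.0 - 0.5 * x1 + 2.0 * x2 + rng.normal(size=n)

X1 = np.column_stack([np.ones(n), x1])   # iota and x1, i.e. X1 = (iota  x1)

# Long regression: y on (X1, x2)
b_hat = np.linalg.lstsq(np.column_stack([X1, x2]), y, rcond=None)[0]
b_hat_01, b_hat_2 = b_hat[:2], b_hat[2]   # (b0_hat, b1_hat)' and b2_hat

# Short regression: y on X1
b_tilde = np.linalg.lstsq(X1, y, rcond=None)[0]

# Delta = (delta_0, delta)': regress x2 on an intercept and x1
Delta = np.linalg.lstsq(X1, x2, rcond=None)[0]

# The whole vector identity, not just its second element
assert np.allclose(b_tilde, b_hat_01 + Delta * b_hat_2)
```

The second element of this vector identity is exactly Wooldridge's scalar formula; the first element gives the corresponding (rarely discussed) relationship between the intercepts.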