Multiple Regression – Why ANOVA/Regression Results Change When Controlling for Another Variable


This question might be very basic, but somehow I don't understand this point.

Suppose I initially used a univariate (single-predictor) regression equation such as

$$\text{GDP} = a + b \cdot \text{Income}$$

and got some coefficient value, say $b = 0.5$. Now I keep the same structure of the regression model but add another independent variable, so the new equation is

$$\text{GDP} = a + b \cdot \text{Income} + c \cdot \text{Investment}$$

and the new coefficient values are, say, $b = 0.3$ and $c = 0.4$.

My question is: why does the value of the coefficient change when we add another independent variable?

I hope I have put my question clearly.
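
For concreteness, here is a small simulated sketch of the effect being asked about (the variable names follow the question; the data and the coefficients 0.5, 0.3, 0.4 are purely illustrative). When Income and Investment are correlated, the simple-regression coefficient on Income also absorbs part of Investment's effect, so it changes once Investment enters the model:

```python
# Illustrative simulation only -- not real GDP data.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
income = rng.normal(size=n)
investment = 0.6 * income + rng.normal(scale=0.8, size=n)   # correlated with income
gdp = 0.3 * income + 0.4 * investment + rng.normal(scale=0.5, size=n)

# Simple regression: GDP on Income only (with an intercept)
X1 = np.column_stack([np.ones(n), income])
b_simple = np.linalg.lstsq(X1, gdp, rcond=None)[0]

# Multiple regression: GDP on Income and Investment
X2 = np.column_stack([np.ones(n), income, investment])
b_multiple = np.linalg.lstsq(X2, gdp, rcond=None)[0]

print("simple regression:   b =", round(b_simple[1], 3))           # about 0.54, not 0.30
print("multiple regression: b =", round(b_multiple[1], 3),
      " c =", round(b_multiple[2], 3))                              # about 0.30 and 0.40
```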

Best Answer

Linear regression can be illustrated geometrically in terms of an orthogonal projection of the predicted variable vector $\boldsymbol{y}$ onto the space defined by the predictor vectors $\boldsymbol{x}_{i}$. This approach is nicely explained in Wickens' book "The Geometry of Multivariate Statistics" (1994). Without loss of generality, assume centered variables. In the following diagrams, the length of a vector equals its standard deviation, and the cosine of the angle between two vectors equals their correlation. The simple linear regression of $\boldsymbol{y}$ onto $\boldsymbol{x}$ then looks like this:

[Figure: simple regression as the orthogonal projection of $\boldsymbol{y}$ onto $\boldsymbol{x}$]

$\hat{\boldsymbol{y}} = b \cdot \boldsymbol{x}$ is the prediction that results from the orthogonal projection of $\boldsymbol{y}$ onto the subspace defined by $\boldsymbol{x}$. $b$ is the projection of $\boldsymbol{y}$ in subspace coordinates (basis vector $\boldsymbol{x}$). This prediction minimizes the error $\boldsymbol{e} = \boldsymbol{y} - \hat{\boldsymbol{y}}$, i.e., it finds the closest point to $\boldsymbol{y}$ in the subspace defined by $\boldsymbol{x}$ (recall that minimizing the error sum of squares means minimizing the variance of the error, i.e., its squared length). With two correlated predictors $\boldsymbol{x}_{1}$ and $\boldsymbol{x}_{2}$, the situation looks like this:

[Figure: orthogonal projection of $\boldsymbol{y}$ onto the plane $U$ spanned by $\boldsymbol{x}_{1}$ and $\boldsymbol{x}_{2}$]

$\boldsymbol{y}$ is projected orthogonally onto $U$, the subspace (plane) spanned by $\boldsymbol{x}_{1}$ and $\boldsymbol{x}_{2}$. The prediction $\hat{\boldsymbol{y}} = b_{1} \cdot \boldsymbol{x}_{1} + b_{2} \cdot \boldsymbol{x}_{2}$ is this projection. $b_{1}$ and $b_{2}$ are thus the ends of the dotted lines, i.e. the coordinates of $\hat{\boldsymbol{y}}$ in subspace coordinates (basis vectors $\boldsymbol{x}_{1}$ and $\boldsymbol{x}_{2}$).
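
A minimal numeric sketch of this picture (simulated data with assumed names `x1`, `x2`, `y`; everything is centered, so no intercept is needed): the multiple-regression weights are the coordinates of the projection and can be obtained by solving the normal equations $X^{\top}X\,b = X^{\top}\boldsymbol{y}$, and the resulting error $\boldsymbol{e}$ is orthogonal to both predictors, i.e., to the whole plane $U$:

```python
# Illustrative sketch: multiple regression as an orthogonal projection onto span{x1, x2}.
import numpy as np

rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n); x1 -= x1.mean()
x2 = 0.7 * x1 + rng.normal(size=n); x2 -= x2.mean()           # correlated predictors
y = 1.0 * x1 + 0.5 * x2 + rng.normal(size=n); y -= y.mean()   # centered outcome

X = np.column_stack([x1, x2])
b = np.linalg.solve(X.T @ X, X.T @ y)   # normal equations: coordinates of y_hat in basis {x1, x2}
y_hat = X @ b                           # orthogonal projection of y onto the plane U
e = y - y_hat

print("b1, b2:", b)
print("e orthogonal to x1 and x2:", np.allclose(X.T @ e, 0.0))
```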

The next thing to realize is that the orthogonal projections of $\hat{\boldsymbol{y}}$ onto $\boldsymbol{x}_{1}$ and $\boldsymbol{x}_{2}$ are the same as the orthogonal projections of $\boldsymbol{y}$ itself onto $\boldsymbol{x}_{1}$ and $\boldsymbol{x}_{2}$.

[Figure: the projections of $\hat{\boldsymbol{y}}$ and of $\boldsymbol{y}$ onto $\boldsymbol{x}_{1}$ and $\boldsymbol{x}_{2}$ coincide]
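
Because the error $\boldsymbol{e} = \boldsymbol{y} - \hat{\boldsymbol{y}}$ is orthogonal to every vector in the subspace, $\boldsymbol{x}_{i}^{\top}\boldsymbol{y} = \boldsymbol{x}_{i}^{\top}\hat{\boldsymbol{y}}$, which is exactly the statement above. A quick numeric check (again an illustrative sketch, not part of the original answer):

```python
# Check: projecting y_hat onto each predictor gives the same coordinate as projecting y itself.
import numpy as np

rng = np.random.default_rng(2)
n = 500
x1 = rng.normal(size=n); x1 -= x1.mean()
x2 = 0.6 * x1 + rng.normal(size=n); x2 -= x2.mean()
y = x1 + 0.5 * x2 + rng.normal(size=n); y -= y.mean()

X = np.column_stack([x1, x2])
y_hat = X @ np.linalg.solve(X.T @ X, X.T @ y)   # projection of y onto span{x1, x2}

def proj(v, x):
    """Projection coordinate of v onto x (the simple regression slope for centered data)."""
    return (x @ v) / (x @ x)

print(np.isclose(proj(y, x1), proj(y_hat, x1)))  # True
print(np.isclose(proj(y, x2), proj(y_hat, x2)))  # True
```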

This allows us to directly compare the regression weights from each simple regression with the regression weights from the multiple regression:

[Figure: simple regression weights $b^{1}$, $b^{2}$ compared with the multiple regression weights $b_{1}$, $b_{2}$]

$\hat{\boldsymbol{y}}_{1}$ and $\hat{\boldsymbol{y}}_{2}$ are the predictions from the simple regressions $\boldsymbol{y}$ onto $\boldsymbol{x}_{1}$, and $\boldsymbol{y}$ onto $\boldsymbol{x}_{2}$. Their endpoints give the individual regression weights $b^{1} = \rho_{x_{1} y} \cdot \sigma_{y}$ and $b^{2} = \rho_{x_{2} y} \cdot \sigma_{y}$, where $\rho_{x_{1} y}$ is the correlation between $\boldsymbol{x}_{1}$ and $\boldsymbol{y}$, and $\sigma_{y}$ is the standard deviation of $\boldsymbol{y}$. In contrast, the endpoints of the dotted lines give the regression weights from the multiple regression of $\boldsymbol{y}$ onto $\boldsymbol{x}_{1}$ and $\boldsymbol{x}_{2}$: $b_{1} = \beta_{1} \sigma_{y}$, where $\beta_{1}$ is the standardized regression coefficient.
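
For reference, here is where these expressions come from; they hold when the predictors are scaled to unit standard deviation ($\sigma_{x_{1}} = 1$), which is what lets the projection coordinate be read off directly as the weight. In general, for centered variables:

$$
b^{1} = \frac{\operatorname{Cov}(\boldsymbol{x}_{1},\boldsymbol{y})}{\operatorname{Var}(\boldsymbol{x}_{1})}
      = \frac{\rho_{x_{1} y}\,\sigma_{x_{1}}\,\sigma_{y}}{\sigma_{x_{1}}^{2}}
      = \frac{\rho_{x_{1} y}\,\sigma_{y}}{\sigma_{x_{1}}}
      \overset{\sigma_{x_{1}}=1}{=} \rho_{x_{1} y}\,\sigma_{y},
\qquad
b_{1} = \beta_{1}\,\frac{\sigma_{y}}{\sigma_{x_{1}}}
      \overset{\sigma_{x_{1}}=1}{=} \beta_{1}\,\sigma_{y}.
$$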

Now it is easy to see that $b^{1}$ and $b^{2}$ will coincide exactly with $b_{1}$ and $b_{2}$ only if $\boldsymbol{x}_{1}$ and $\boldsymbol{x}_{2}$ are orthogonal (or if $\boldsymbol{y}$ is orthogonal to the plane spanned by $\boldsymbol{x}_{1}$ and $\boldsymbol{x}_{2}$). It is also easy to geometrically construct cases that seem puzzling at first, e.g., when a regression weight has the opposite sign from the bivariate correlation between that predictor and the predicted variable:

[Figure: highly correlated predictors $\boldsymbol{x}_{1}$ and $\boldsymbol{x}_{2}$ producing a regression weight whose sign is opposite to the bivariate correlation]

Here, $\boldsymbol{x}_{1}$ and $\boldsymbol{x}_{2}$ are highly correlated. The sign of the correlation between $\boldsymbol{y}$ and $\boldsymbol{x}_{1}$ is positive (red line: orthogonal projection of $\boldsymbol{y}$ onto $\boldsymbol{x}_{1}$), but the regression weight from the multiple regression is negative (end of the green line projected onto the subspace defined by $\boldsymbol{x}_{1}$).
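
A small simulated sketch of these last two points (my own illustration; the names `x1`, `x2`, `y` and the construction $\boldsymbol{x}_{2} \approx \boldsymbol{x}_{1} + \text{noise}$ are made up): with exactly orthogonal predictors the simple and multiple regression weights coincide, while with highly correlated predictors the multiple-regression weight on $\boldsymbol{x}_{1}$ can come out negative even though the bivariate correlation of $\boldsymbol{y}$ with $\boldsymbol{x}_{1}$ is positive:

```python
# Illustrative simulation of the orthogonal case and the sign-flip case.
import numpy as np

rng = np.random.default_rng(3)
n = 5000

# --- Orthogonal predictors: simple and multiple regression weights coincide ---
x1 = rng.normal(size=n); x1 -= x1.mean()
x2 = rng.normal(size=n); x2 -= x2.mean()
x2 -= (x1 @ x2) / (x1 @ x1) * x1                   # force exact sample orthogonality
y = 2.0 * x1 - 1.0 * x2 + rng.normal(size=n); y -= y.mean()

b_simple = np.array([(x1 @ y) / (x1 @ x1), (x2 @ y) / (x2 @ x2)])
X = np.column_stack([x1, x2])
b_multiple = np.linalg.solve(X.T @ X, X.T @ y)
print("orthogonal case, weights coincide:", np.allclose(b_simple, b_multiple))  # True

# --- Highly correlated predictors: the weight on x1 flips sign ---
x1 = rng.normal(size=n)
x2 = x1 + 0.3 * rng.normal(size=n)                 # corr(x1, x2) close to 1
y = -1.0 * x1 + 2.0 * x2 + 0.5 * rng.normal(size=n)
x1 -= x1.mean(); x2 -= x2.mean(); y -= y.mean()

X = np.column_stack([x1, x2])
b = np.linalg.solve(X.T @ X, X.T @ y)
print("corr(y, x1):", round(np.corrcoef(y, x1)[0, 1], 2))    # positive, about +0.8
print("multiple-regression weight on x1:", round(b[0], 2))   # negative, about -1.0
```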
