Multiple Regression – Why ANOVA/Regression Results Change When Controlling for Another Variable


This question might be very basic, but somehow I don't understand this point.

Suppose I initially used a univariate (single-predictor) regression equation such as

$$\text{GDP} = a + b \cdot \text{Income}$$

and got some coefficient value, say $b = 0.5$. Now I keep the same structure of the regression model but add another independent variable, so the new equation is

$$\text{GDP} = a + b \cdot \text{Income} + c \cdot \text{Investment}$$

and the new coefficient values are, say, $b = 0.3$ and $c = 0.4$.

My question is: why does the value of the coefficient change when we add another independent variable?

I hope I have put my question clearly.
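
For concreteness, here is a small simulated sketch of the effect being asked about (the variable names follow the question; the data and the coefficients 0.5, 0.3, 0.4 are purely illustrative). When Income and Investment are correlated, the simple-regression coefficient on Income also absorbs part of Investment's effect, so it changes once Investment enters the model:

```python
# Illustrative simulation only -- not real GDP data.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
income = rng.normal(size=n)
investment = 0.6 * income + rng.normal(scale=0.8, size=n)   # correlated with income
gdp = 0.3 * income + 0.4 * investment + rng.normal(scale=0.5, size=n)

# Simple regression: GDP on Income only (with an intercept)
X1 = np.column_stack([np.ones(n), income])
b_simple = np.linalg.lstsq(X1, gdp, rcond=None)[0]

# Multiple regression: GDP on Income and Investment
X2 = np.column_stack([np.ones(n), income, investment])
b_multiple = np.linalg.lstsq(X2, gdp, rcond=None)[0]

print("simple regression:   b =", round(b_simple[1], 3))           # about 0.54, not 0.30
print("multiple regression: b =", round(b_multiple[1], 3),
      " c =", round(b_multiple[2], 3))                              # about 0.30 and 0.40
```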

Best Answer

Linear regression can be illustrated geometrically in terms of an orthogonal projection of the predicted variable vector $\boldsymbol{y}$ onto the space defined by the predictor vectors $\boldsymbol{x}_{i}$. This approach is nicely explained in Wickens' book "The Geometry of Multivariate Statistics" (1994). Without loss of generality, assume centered variables. In the following diagrams, the length of a vector equals its standard deviation, and the cosine of the angle between two vectors equals their correlation. The simple linear regression of $\boldsymbol{y}$ onto $\boldsymbol{x}$ then looks like this:

[Figure: simple regression as the orthogonal projection of $\boldsymbol{y}$ onto $\boldsymbol{x}$]

$\hat{\boldsymbol{y}} = b \cdot \boldsymbol{x}$ is the prediction that results from the orthogonal projection of $\boldsymbol{y}$ onto the subspace defined by $\boldsymbol{x}$. $b$ is the projection of $\boldsymbol{y}$ in subspace coordinates (basis vector $\boldsymbol{x}$). This prediction minimizes the error $\boldsymbol{e} = \boldsymbol{y} - \hat{\boldsymbol{y}}$, i.e., it finds the closest point to $\boldsymbol{y}$ in the subspace defined by $\boldsymbol{x}$ (recall that minimizing the error sum of squares means minimizing the variance of the error, i.e., its squared length). With two correlated predictors $\boldsymbol{x}_{1}$ and $\boldsymbol{x}_{2}$, the situation looks like this:

[Figure: orthogonal projection of $\boldsymbol{y}$ onto the plane $U$ spanned by $\boldsymbol{x}_{1}$ and $\boldsymbol{x}_{2}$]

$\boldsymbol{y}$ is projected orthogonally onto $U$, the subspace (plane) spanned by $\boldsymbol{x}_{1}$ and $\boldsymbol{x}_{2}$. The prediction $\hat{\boldsymbol{y}} = b_{1} \cdot \boldsymbol{x}_{1} + b_{2} \cdot \boldsymbol{x}_{2}$ is this projection. $b_{1}$ and $b_{2}$ are thus the ends of the dotted lines, i.e. the coordinates of $\hat{\boldsymbol{y}}$ in subspace coordinates (basis vectors $\boldsymbol{x}_{1}$ and $\boldsymbol{x}_{2}$).
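
A minimal numeric sketch of this picture (simulated data with assumed names `x1`, `x2`, `y`; everything is centered, so no intercept is needed): the multiple-regression weights are the coordinates of the projection and can be obtained by solving the normal equations $X^{\top}X\,b = X^{\top}\boldsymbol{y}$, and the resulting error $\boldsymbol{e}$ is orthogonal to both predictors, i.e., to the whole plane $U$:

```python
# Illustrative sketch: multiple regression as an orthogonal projection onto span{x1, x2}.
import numpy as np

rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n); x1 -= x1.mean()
x2 = 0.7 * x1 + rng.normal(size=n); x2 -= x2.mean()           # correlated predictors
y = 1.0 * x1 + 0.5 * x2 + rng.normal(size=n); y -= y.mean()   # centered outcome

X = np.column_stack([x1, x2])
b = np.linalg.solve(X.T @ X, X.T @ y)   # normal equations: coordinates of y_hat in basis {x1, x2}
y_hat = X @ b                           # orthogonal projection of y onto the plane U
e = y - y_hat

print("b1, b2:", b)
print("e orthogonal to x1 and x2:", np.allclose(X.T @ e, 0.0))
```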

The next thing to realize is that the orthogonal projections of $\hat{\boldsymbol{y}}$ onto $\boldsymbol{x}_{1}$ and $\boldsymbol{x}_{2}$ are the same as the orthogonal projections of $\boldsymbol{y}$ itself onto $\boldsymbol{x}_{1}$ and $\boldsymbol{x}_{2}$.

[Figure: the projections of $\hat{\boldsymbol{y}}$ and of $\boldsymbol{y}$ onto $\boldsymbol{x}_{1}$ and $\boldsymbol{x}_{2}$ coincide]
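
Because the error $\boldsymbol{e} = \boldsymbol{y} - \hat{\boldsymbol{y}}$ is orthogonal to every vector in the subspace, $\boldsymbol{x}_{i}^{\top}\boldsymbol{y} = \boldsymbol{x}_{i}^{\top}\hat{\boldsymbol{y}}$, which is exactly the statement above. A quick numeric check (again an illustrative sketch, not part of the original answer):

```python
# Check: projecting y_hat onto each predictor gives the same coordinate as projecting y itself.
import numpy as np

rng = np.random.default_rng(2)
n = 500
x1 = rng.normal(size=n); x1 -= x1.mean()
x2 = 0.6 * x1 + rng.normal(size=n); x2 -= x2.mean()
y = x1 + 0.5 * x2 + rng.normal(size=n); y -= y.mean()

X = np.column_stack([x1, x2])
y_hat = X @ np.linalg.solve(X.T @ X, X.T @ y)   # projection of y onto span{x1, x2}

def proj(v, x):
    """Projection coordinate of v onto x (the simple regression slope for centered data)."""
    return (x @ v) / (x @ x)

print(np.isclose(proj(y, x1), proj(y_hat, x1)))  # True
print(np.isclose(proj(y, x2), proj(y_hat, x2)))  # True
```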

This allows us to directly compare the regression weights from each simple regression with the regression weights from the multiple regression:

[Figure: simple regression weights $b^{1}$, $b^{2}$ compared with the multiple regression weights $b_{1}$, $b_{2}$]

$\hat{\boldsymbol{y}}_{1}$ and $\hat{\boldsymbol{y}}_{2}$ are the predictions from the simple regressions $\boldsymbol{y}$ onto $\boldsymbol{x}_{1}$, and $\boldsymbol{y}$ onto $\boldsymbol{x}_{2}$. Their endpoints give the individual regression weights $b^{1} = \rho_{x_{1} y} \cdot \sigma_{y}$ and $b^{2} = \rho_{x_{2} y} \cdot \sigma_{y}$, where $\rho_{x_{1} y}$ is the correlation between $\boldsymbol{x}_{1}$ and $\boldsymbol{y}$, and $\sigma_{y}$ is the standard deviation of $\boldsymbol{y}$. In contrast, the endpoints of the dotted lines give the regression weights from the multiple regression of $\boldsymbol{y}$ onto $\boldsymbol{x}_{1}$ and $\boldsymbol{x}_{2}$: $b_{1} = \beta_{1} \sigma_{y}$, where $\beta_{1}$ is the standardized regression coefficient.
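
For reference, here is where these expressions come from; they hold when the predictors are scaled to unit standard deviation ($\sigma_{x_{1}} = 1$), which is what lets the projection coordinate be read off directly as the weight. In general, for centered variables:

$$
b^{1} = \frac{\operatorname{Cov}(\boldsymbol{x}_{1},\boldsymbol{y})}{\operatorname{Var}(\boldsymbol{x}_{1})}
      = \frac{\rho_{x_{1} y}\,\sigma_{x_{1}}\,\sigma_{y}}{\sigma_{x_{1}}^{2}}
      = \frac{\rho_{x_{1} y}\,\sigma_{y}}{\sigma_{x_{1}}}
      \overset{\sigma_{x_{1}}=1}{=} \rho_{x_{1} y}\,\sigma_{y},
\qquad
b_{1} = \beta_{1}\,\frac{\sigma_{y}}{\sigma_{x_{1}}}
      \overset{\sigma_{x_{1}}=1}{=} \beta_{1}\,\sigma_{y}.
$$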

Now it is easy to see that $b^{1}$ and $b^{2}$ will coincide exactly with $b_{1}$ and $b_{2}$ only if $\boldsymbol{x}_{1}$ and $\boldsymbol{x}_{2}$ are orthogonal (or if $\boldsymbol{y}$ is orthogonal to the plane spanned by $\boldsymbol{x}_{1}$ and $\boldsymbol{x}_{2}$). It is also easy to geometrically construct cases that seem puzzling at first, e.g., when a regression weight has the opposite sign from the bivariate correlation between that predictor and the predicted variable:

[Figure: highly correlated predictors $\boldsymbol{x}_{1}$ and $\boldsymbol{x}_{2}$ producing a regression weight whose sign is opposite to the bivariate correlation]

Here, $\boldsymbol{x}_{1}$ and $\boldsymbol{x}_{2}$ are highly correlated. The sign of the correlation between $\boldsymbol{y}$ and $\boldsymbol{x}_{1}$ is positive (red line: orthogonal projection of $\boldsymbol{y}$ onto $\boldsymbol{x}_{1}$), but the regression weight from the multiple regression is negative (end of the green line projected onto the subspace defined by $\boldsymbol{x}_{1}$).
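
A small simulated sketch of these last two points (my own illustration; the names `x1`, `x2`, `y` and the construction $\boldsymbol{x}_{2} \approx \boldsymbol{x}_{1} + \text{noise}$ are made up): with exactly orthogonal predictors the simple and multiple regression weights coincide, while with highly correlated predictors the multiple-regression weight on $\boldsymbol{x}_{1}$ can come out negative even though the bivariate correlation of $\boldsymbol{y}$ with $\boldsymbol{x}_{1}$ is positive:

```python
# Illustrative simulation of the orthogonal case and the sign-flip case.
import numpy as np

rng = np.random.default_rng(3)
n = 5000

# --- Orthogonal predictors: simple and multiple regression weights coincide ---
x1 = rng.normal(size=n); x1 -= x1.mean()
x2 = rng.normal(size=n); x2 -= x2.mean()
x2 -= (x1 @ x2) / (x1 @ x1) * x1                   # force exact sample orthogonality
y = 2.0 * x1 - 1.0 * x2 + rng.normal(size=n); y -= y.mean()

b_simple = np.array([(x1 @ y) / (x1 @ x1), (x2 @ y) / (x2 @ x2)])
X = np.column_stack([x1, x2])
b_multiple = np.linalg.solve(X.T @ X, X.T @ y)
print("orthogonal case, weights coincide:", np.allclose(b_simple, b_multiple))  # True

# --- Highly correlated predictors: the weight on x1 flips sign ---
x1 = rng.normal(size=n)
x2 = x1 + 0.3 * rng.normal(size=n)                 # corr(x1, x2) close to 1
y = -1.0 * x1 + 2.0 * x2 + 0.5 * rng.normal(size=n)
x1 -= x1.mean(); x2 -= x2.mean(); y -= y.mean()

X = np.column_stack([x1, x2])
b = np.linalg.solve(X.T @ X, X.T @ y)
print("corr(y, x1):", round(np.corrcoef(y, x1)[0, 1], 2))    # positive, about +0.8
print("multiple-regression weight on x1:", round(b[0], 2))   # negative, about -1.0
```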
