Solved – Regression $R^2$ and correlations

correlation, multiple regression, r-squared, regression

I understand that for simple linear regression, the sample correlation coefficient is (up to sign) the square root of $R^2$, i.e., $R^2 = r^2$. But that holds only for a simple (i.e., single-predictor) regression $Y=\beta_0+\beta_1X+\varepsilon$.

How about multiple regression, e.g., $Y=\beta_0+\beta_1X_1 + \beta_2X_2+\varepsilon$? Is there any relationship between the correlations $\operatorname{corr}(Y, X_1)$, $\operatorname{corr}(Y, X_2)$ and the regression $R^2$?

Best Answer

For two predictors, it is easy to write out the equation in algebraic form:

$R^2 = \frac{r^2_{x_1,y} + r^2_{x_2,y} - 2\,r_{x_1,y}\,r_{x_2,y}\,r_{x_1,x_2}}{1-r^2_{x_1,x_2}}$.

As pointed out by @gung, the two correlations with $y$ are not enough on their own: you also need to know the correlation between $x_1$ and $x_2$.
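For readers who want to see where this comes from, here is a short sketch of the derivation. It assumes an ordinary least-squares fit with an intercept and $|r_{x_1,x_2}| < 1$; the starred symbols $\hat\beta_1^*$ and $\hat\beta_2^*$ (standardized slopes) are notation introduced here, not part of the original answer. With all three variables standardized, the slopes are

$$\hat\beta_1^* = \frac{r_{x_1,y} - r_{x_2,y}\,r_{x_1,x_2}}{1-r^2_{x_1,x_2}}, \qquad \hat\beta_2^* = \frac{r_{x_2,y} - r_{x_1,y}\,r_{x_1,x_2}}{1-r^2_{x_1,x_2}},$$

and $R^2$ can be written as

$$R^2 = \hat\beta_1^*\,r_{x_1,y} + \hat\beta_2^*\,r_{x_2,y} = \frac{r^2_{x_1,y} + r^2_{x_2,y} - 2\,r_{x_1,y}\,r_{x_2,y}\,r_{x_1,x_2}}{1-r^2_{x_1,x_2}}.$$

In the special case of uncorrelated predictors, $r_{x_1,x_2} = 0$, this reduces to $R^2 = r^2_{x_1,y} + r^2_{x_2,y}$.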

EDIT: Just a quick example (in R) to illustrate this equation:

set.seed(12873)

x1 <- rnorm(20)
x2 <- .1*x1 + rnorm(20)
y  <- .8*x1 + .2*x2 + rnorm(20)

# R^2 reported by the fitted model
summary(lm(y ~ x1 + x2))$r.squared
# R^2 computed from the three pairwise correlations
(cor(x1,y)^2 + cor(x2,y)^2 - 2*cor(x1,y)*cor(x2,y)*cor(x1,x2))/(1-cor(x1,x2)^2)

Both expressions give the same value, 0.2928677.
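The same idea appears to extend to more than two predictors through the matrix form $R^2 = r_{xy}^\top R_{xx}^{-1} r_{xy}$, where $r_{xy}$ is the vector of predictor-response correlations and $R_{xx}$ is the correlation matrix of the predictors. The following is a minimal sketch of that check, not part of the original answer: the third predictor `x3`, the sample size, the coefficients, and the seed are all made up for illustration.

```r
set.seed(1)                       # arbitrary seed, only for reproducibility
n  <- 50                          # illustrative sample size (assumption)
x1 <- rnorm(n)
x2 <- 0.5 * x1 + rnorm(n)         # x2 is deliberately correlated with x1
x3 <- rnorm(n)                    # hypothetical third predictor
y  <- 0.8 * x1 + 0.2 * x2 - 0.4 * x3 + rnorm(n)

X   <- cbind(x1, x2, x3)
rxy <- cor(X, y)                  # 3 x 1 matrix of correlations with y
Rxx <- cor(X)                     # 3 x 3 correlation matrix of the predictors

# R^2 from correlations alone: t(rxy) %*% solve(Rxx) %*% rxy
r2_from_cor <- drop(t(rxy) %*% solve(Rxx, rxy))

# R^2 from the fitted model, for comparison
r2_from_lm <- summary(lm(y ~ x1 + x2 + x3))$r.squared

all.equal(r2_from_cor, r2_from_lm)
```

With only two predictors, the matrix expression reduces to the two-predictor formula given earlier, so `all.equal()` should report agreement up to floating-point tolerance.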
