Solved – the relationship between correlation coefficients and regression coefficients in multiple regression

Tags: correlation, regression

Consider the model $ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon$.

  1. What is the relationship between the correlation coefficients $r_{y,x_1}$, $r_{y,x_2}$ and the regression coefficients $\beta_1$ and $\beta_2$?

  2. In particular, how should one interpret a situation where a particular correlation coefficient is statistically significant but the corresponding regression coefficient is not?

  3. I recall reading about a concept called "partial correlation coefficients". Is that in some way relevant in the above context?

Best Answer

Let's assume that the variables $x_1$ and $x_2$ are centered; this will make things easier (nothing prevents you from centering them before running your regression). Taking the covariance of each side of the model with $x_1$ (and then with $x_2$) and dividing by the corresponding standard deviation, it is straightforward to see that:

$r_{y, x_1}\sigma_y=\beta_1\sigma_{x_1} + \beta_2r_{x_1, x_2} \sigma_{x_2}$

and

$r_{y, x_2}\sigma_y=\beta_2\sigma_{x_2} + \beta_1r_{x_1, x_2} \sigma_{x_1}$

Hence, the relation also involves standard-deviation terms and the correlation between $x_1$ and $x_2$.
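As a quick sanity check, here is a minimal NumPy sketch (the data-generating coefficients are made up for illustration) showing that the first identity holds, up to floating-point error, for an OLS fit with an intercept, since the normal equations force the residuals to be uncorrelated with each predictor in-sample:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Correlated predictors and a linear response with noise (illustrative values).
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)
y = 2.0 + 1.5 * x1 - 0.8 * x2 + rng.normal(size=n)

# OLS fit with an intercept via least squares.
X = np.column_stack([np.ones(n), x1, x2])
b0, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]

# Sample standard deviations and correlations.
s_y, s_1, s_2 = y.std(), x1.std(), x2.std()
r_y1 = np.corrcoef(y, x1)[0, 1]
r_12 = np.corrcoef(x1, x2)[0, 1]

lhs = r_y1 * s_y
rhs = b1 * s_1 + b2 * r_12 * s_2
print(lhs, rhs)  # the two sides agree up to floating-point error
```

The same check works for the second identity after swapping the roles of $x_1$ and $x_2$.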

This should answer your second question. For example, if $x_1 = x_2$ (so that $r_{x_1, x_2}=1$ and $\sigma_{x_1}=\sigma_{x_2}$), any pair $(\beta_1, \beta_2)$ satisfying $\beta_1\sigma_{x_1} + \beta_2 \sigma_{x_2} = r_{y, x_1}\sigma_y$ leads to the same linear model for $y$. As a consequence, $\beta_1$ (or $\beta_2$) taken individually can take arbitrary values (which are going to depend on the numerical implementation of the linear regression you are using).

Examining the value of $r_{x_1, x_2}$ is therefore critical before drawing any conclusion about the relation between $r_{y, x_i}$ and $\beta_1, \beta_2$.
