High Pearson correlation, but very low coefficient in multiple regression analysis

correlation, multiple regression, pearson-r, r-squared, regression coefficients

I have been running a few linear regression models to test the absolute and relative effects of several independent variables on one performance measure that I want to increase. Each independent variable measures spending/investment on a different tool. I first ranked these independent variables by their Pearson correlation with the performance measure: some correlations were high and some were low. Then I ran a multiple linear regression of the dependent variable on all of the independent variables. It turned out that most of the variables with a high Pearson correlation also have a high coefficient in the regression (usually with a high t-statistic as well). However, a small number of tools have a very high Pearson correlation but an extremely low or essentially zero coefficient, meaning their effect is almost invisible. What technical/statistical/mathematical explanation could there be for this?

Best Answer

They are correlated with other variables in the model. When you correlate one of those variables (say, $x$) with the response ($y$), you are looking at it in isolation. Thus, you are measuring its own association with $y$ plus the association it picks up from all the variables it is correlated with (e.g., $z$). It can be the case that there is no direct relationship between the variable and the response once $z$ is accounted for (partial correlation $r_{xy\cdot z}=0$), while there are high correlations between $z$ and $y$ and between $x$ and $z$. So when you compute the simple (pairwise) correlation, $x$ just acts as a proxy for $z$ and you get a high correlation; but when you include $z$ in the model, you see that $x$ was irrelevant.
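For the simplest case of two predictors, this can be made explicit. Assuming $x$, $z$ and $y$ are all standardized, the least-squares coefficients and the partial correlation can be written in terms of the pairwise correlations alone (the numbers below are illustrative, not taken from the question):

$$
\hat\beta_x = \frac{r_{xy} - r_{xz}\, r_{zy}}{1 - r_{xz}^2},
\qquad
\hat\beta_z = \frac{r_{zy} - r_{xz}\, r_{xy}}{1 - r_{xz}^2},
\qquad
r_{xy\cdot z} = \frac{r_{xy} - r_{xz}\, r_{zy}}{\sqrt{(1 - r_{xz}^2)(1 - r_{zy}^2)}}
$$

$\hat\beta_x$ and $r_{xy\cdot z}$ share the same numerator, so $\hat\beta_x = 0$ exactly when $r_{xy\cdot z} = 0$. For example, with $r_{xz} = 0.9$ and $r_{zy} = 0.8$, if $x$ is related to $y$ only through $z$ then $r_{xy} = r_{xz}\, r_{zy} = 0.72$, a fairly high simple correlation, yet

$$
\hat\beta_x = \frac{0.72 - 0.9 \times 0.8}{1 - 0.81} = 0,
\qquad
\hat\beta_z = \frac{0.8 - 0.9 \times 0.72}{1 - 0.81} = \frac{0.152}{0.19} = 0.8 .
$$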

For more information, see my answer here: Is there a difference between 'controlling for' and 'ignoring' other variables in multiple regression?
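A small simulation can illustrate the difference between ignoring and controlling for $z$. The sketch below assumes a hypothetical data-generating process (not the asker's data): $z$ drives $y$, and $x$ is merely a noisy copy of $z$ with no effect of its own. The helper name `simulate` and all parameter values are illustrative choices.

```python
import numpy as np


def simulate(n=10_000, seed=0):
    """Simulate a case where x is only a proxy for z (hypothetical setup)."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)  # the variable that actually drives y
    # x is correlated with z (about 0.9) but has no direct effect on y
    x = 0.9 * z + np.sqrt(1 - 0.81) * rng.normal(size=n)
    # y depends on z only (correlation with z about 0.8)
    y = 0.8 * z + np.sqrt(1 - 0.64) * rng.normal(size=n)

    # Simple (pairwise) Pearson correlation of x with y, ignoring z
    r_xy = np.corrcoef(x, y)[0, 1]

    # Multiple regression of y on an intercept, x and z (ordinary least squares)
    design = np.column_stack([np.ones(n), x, z])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return r_xy, coef[1], coef[2]


r_xy, b_x, b_z = simulate()
print(f"corr(x, y) = {r_xy:.3f}")
print(f"coef on x  = {b_x:.3f}")
print(f"coef on z  = {b_z:.3f}")
```

With this setup the simple correlation of $x$ with $y$ should come out near 0.72, while the coefficient on $x$ in the multiple regression should be close to 0 and the coefficient on $z$ close to 0.8: $x$ looks important in isolation only because it stands in for $z$.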
