How do we know if the correlation is significant?

Suppose that we have continuous data $(X_1,Y_1),\dots,(X_n,Y_n)$, and let $r_{x,y}$ be the Karl Pearson correlation coefficient between the $X_i$'s and the $Y_i$'s. For what range of values of $r_{x,y}$ can we really decide that there may indeed be a linear relationship between the $X_i$'s and the $Y_i$'s, and proceed to predict $Y$ using a linear regression?

I'm sure this topic has been well studied. I did a little searching here but couldn't find relevant posts. Any answers to the above question, or pointers to such a study, are greatly appreciated.

Best Answer

> For what range of values of $r_{x,y}$ can we [...] proceed to predict $Y$ by using a linear regression?

If the relationship is indeed linear, any value of the correlation can work: linear regression behaves as it should across the entire range of correlations, including 0. You don't even need to examine the correlation beforehand (it seems to add nothing that the usual regression calculations don't already provide).
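A small sketch of why the correlation adds nothing beyond the regression output: in simple linear regression with an intercept, the fitted slope is $b = r_{x,y}\, s_y / s_x$, so $r_{x,y}$ can be read back from the slope and the two sample standard deviations. The data and the helper names (`pearson_r`, `ols_slope`, `sample_sd`) are hypothetical, written from scratch for illustration.

```python
import math


def mean(v):
    return sum(v) / len(v)


def sample_sd(v):
    """Sample standard deviation (n - 1 denominator)."""
    m = mean(v)
    return math.sqrt(sum((a - m) ** 2 for a in v) / (len(v) - 1))


def pearson_r(x, y):
    """Sample Pearson correlation coefficient."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ols_slope(x, y):
    """Least-squares slope from regressing y on x with an intercept."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Hypothetical illustrative data.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.3, 2.9, 4.1, 4.0, 5.6, 6.1]

r = pearson_r(x, y)
slope = ols_slope(x, y)

# The correlation is the fitted slope rescaled by the ratio of the SDs,
# so it can be recovered from the regression output plus the two SDs.
r_from_slope = slope * sample_sd(x) / sample_sd(y)
print(f"r = {r:.6f}, slope * sd(x) / sd(y) = {r_from_slope:.6f}")
```

The two printed numbers agree up to floating-point rounding.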

However, that's a big if. You can get any correlation (except exactly 1 or -1) without the relationship being linear: a large correlation (in magnitude) doesn't necessarily imply that the relationship is actually linear, nor does a small one imply that it isn't. Correlation by itself is not a useful way to decide whether a linear regression model is suitable.
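To make this concrete, here is a minimal sketch using the exact curve $y = x^2$ (a hypothetical choice, not from the question). On $x = 1,\dots,10$ the relationship is clearly curved, yet the correlation comes out at about 0.97; on the symmetric range $x = -5,\dots,5$, $y$ is a deterministic function of $x$ and the correlation is exactly 0.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Case 1: y = x^2 on x = 1..10. Not linear, yet |r| is large.
x1 = list(range(1, 11))
y1 = [v ** 2 for v in x1]
r1 = pearson_r(x1, y1)

# Case 2: y = x^2 on x = -5..5. Perfect (nonlinear) dependence, yet r = 0.
x2 = list(range(-5, 6))
y2 = [v ** 2 for v in x2]
r2 = pearson_r(x2, y2)

print(f"r for y = x^2 on 1..10:  {r1:.4f}")
print(f"r for y = x^2 on -5..5: {r2:.4f}")
```

Plotting the data, or looking at residuals from a fitted line, reveals the curvature in both cases; the correlation value alone does not.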

In the case of multiple regression, examining bivariate correlations is even more problematic, since the marginal bivariate correlations may be quite different from the relationships estimated in a multiple regression model. (See the Wikipedia articles on Simpson's paradox and omitted variable bias, for example.)
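As a hypothetical illustration of Simpson's paradox (the data are made up), the sketch below builds two groups in which $y$ falls as $x$ rises within each group, while the group with larger $x$ also has larger $y$. The marginal correlation is strongly positive, but the coefficient on $x$ in a regression that also includes a separate intercept for each group is $-1$. That coefficient is computed as the pooled within-group slope, which by the Frisch-Waugh-Lovell theorem equals the multiple-regression coefficient; this shortcut is an implementation choice made to avoid a matrix library.

```python
import math


def centered_cross_sum(u, v):
    """Sum of (u_i - mean(u)) * (v_i - mean(v))."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


# Hypothetical data: within each group y falls as x rises,
# but group B sits higher than group A on both x and y.
groups = {
    "A": ([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]),
    "B": ([6.0, 7.0, 8.0], [10.0, 9.0, 8.0]),
}

x_all = [v for xs, _ in groups.values() for v in xs]
y_all = [v for _, ys in groups.values() for v in ys]

# Marginal (bivariate) correlation, ignoring group membership.
marginal_r = centered_cross_sum(x_all, y_all) / math.sqrt(
    centered_cross_sum(x_all, x_all) * centered_cross_sum(y_all, y_all)
)

# Coefficient on x in a regression of y on x plus one intercept per group.
# By the Frisch-Waugh-Lovell theorem this equals the pooled within-group slope.
within_slope = sum(centered_cross_sum(xs, ys) for xs, ys in groups.values()) / sum(
    centered_cross_sum(xs, xs) for xs, _ in groups.values()
)

print(f"marginal r = {marginal_r:.3f}, coefficient on x given group = {within_slope:.3f}")
```

Omitting the group variable here is exactly the omitted-variable situation: the marginal correlation reflects the between-group difference, not the within-group relationship.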

If, however, you're interested in whether the regression is doing something useful in terms of prediction, we'd need to pin down precisely what is meant by "useful". In some cases that might be expressible in terms of correlation values.
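One case where "useful" does reduce to a correlation: in simple linear regression with an intercept, the coefficient of determination $R^2 = 1 - \mathrm{SSE}/\mathrm{SST}$ equals $r_{x,y}^2$. So if usefulness is judged by the proportion of variance in $Y$ explained, a threshold on $R^2$ is equivalent to a threshold on $|r_{x,y}|$. The sketch below checks this identity on hypothetical data; the helper names are illustrative only.

```python
import math


def fit_line(x, y):
    """Intercept and slope of the least-squares line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def r_squared(x, y):
    """Coefficient of determination, 1 - SSE/SST, for the simple regression."""
    intercept, slope = fit_line(x, y)
    my = sum(y) / len(y)
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    sst = sum((b - my) ** 2 for b in y)
    return 1 - sse / sst


def pearson_r(x, y):
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical illustrative data.
x = [0.5, 1.5, 2.0, 3.5, 4.0, 5.5]
y = [1.2, 1.9, 3.1, 3.0, 4.8, 5.1]

r = pearson_r(x, y)
print(f"R^2 = {r_squared(x, y):.6f}, r^2 = {r ** 2:.6f}")
```

This identity is specific to a single predictor; with several predictors $R^2$ is the squared correlation between $Y$ and the fitted values, not any one bivariate correlation.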

On the other hand, if you're instead asking "how do we perform a hypothesis test of a Pearson correlation?", you should probably edit the question to make that explicit. Under suitable assumptions there is a "standard" test, readily available in packages or fairly easily carried out by hand. [However, you're not limited to those specific assumptions: other tests of a Pearson correlation, including nonparametric tests, are possible.]
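A sketch of both kinds of test, on hypothetical data. The standard test of $H_0: \rho = 0$ uses $t = r\sqrt{n-2}/\sqrt{1-r^2}$, which under the usual assumptions (for example bivariate normality, or normal constant-variance errors around a line) follows a $t$ distribution with $n-2$ degrees of freedom; the p-value then comes from that distribution, which is what `scipy.stats.pearsonr` reports. To stay dependency-free, the code below only computes $t$ and then shows one nonparametric alternative, a permutation test, which shuffles $Y$ to break its pairing with $X$. Strictly, the permutation test's null hypothesis is exchangeability (independence), not merely zero correlation. The function names, `n_perm`, and `seed` are illustrative choices.

```python
import math
import random


def pearson_r(x, y):
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """Test statistic for H0: rho = 0; compare with t on n - 2 df."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def permutation_pvalue(x, y, n_perm=999, seed=0):
    """Two-sided permutation p-value for the correlation between x and y."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # random re-pairing of y with x
        if abs(pearson_r(x, y_perm)) >= observed:
            hits += 1
    # Count the observed pairing too, so the p-value is never exactly 0.
    return (hits + 1) / (n_perm + 1)


# Hypothetical illustrative data.
x = [float(i) for i in range(1, 11)]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 20.2]

r = pearson_r(x, y)
t = t_statistic(r, len(x))
p = permutation_pvalue(x, y)
print(f"r = {r:.4f}, t = {t:.2f} on {len(x) - 2} df, permutation p = {p:.4f}")
```

With 999 permutations the smallest attainable p-value is 0.001; more permutations give finer resolution at proportionally more computation.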