Linear regression correlation coefficient significance vs. importance

correlation, regression, statistical-inference, statistics

I ran a simple linear regression in Excel between variables x and y.

Pearson’s r is 0.3, and the p-value for the x coefficient is 0.5 – far above my alpha of 5%. Therefore, I conclude the correlation is “insignificant.”

Here’s my confusion: the correlation coefficient for my model is 0.3, so my sample data are indeed correlated in that sense. It’s a weak correlation, but a correlation nonetheless. We deem it statistically insignificant because the p-value is too high, meaning we failed to reject the null hypothesis, which I assume is “the correlation between population x and population y is 0.”

Doesn’t the 0.3 correlation have some use? I am looking at financial data, and the 0.3 tells me my sample data are at most weakly correlated. My question is this: since the correlation is insignificant, does this mean the 0.3 correlation must be completely disregarded? Is it of no use? If it is of some use, how?
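
To make the test concrete, here is a minimal sketch of the same check outside Excel, using scipy.stats.pearsonr on synthetic placeholder data (the x and y below are stand-ins, not my actual series):

```python
# Sketch: Pearson's r and its p-value on placeholder data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 30
x = rng.normal(size=n)                 # hypothetical predictor
y = 0.3 * x + rng.normal(size=n)       # hypothetical response

r, p = stats.pearsonr(x, y)
print(f"r = {r:.2f}, p-value = {p:.3f}")
# The p-value tests H0: the population correlation is 0.
# A large p-value means we fail to reject H0, not that r itself is 0.
```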

Best Answer

My question is this: since the correlation is insignificant, does this mean the 0.3 correlation must be completely disregarded?

The problem with the p-value is that it depends on the sample size $n$: the true correlation may be nonzero, yet with noisy data and a small $n$ you may still fail to reject the null hypothesis. On the other hand, if your sample size is large enough, then even with noisy data a p-value of $0.5$ is a good indication that there is no linear correlation between your variables. A good reality check is a nonparametric bootstrap: sample $n$ pairs of data with replacement, repeat this $N$ times, and inspect how the estimated $r$ changes. If your $r$ frequently changes sign (i.e., it is sometimes positive and sometimes negative), then you may conclude that your point estimate of $0.3$ is merely due to chance and has no practical use.
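
As a minimal sketch of that bootstrap check (the x and y below are synthetic placeholders; substitute your own series):

```python
# Nonparametric bootstrap of Pearson's r: resample (x_i, y_i) pairs
# with replacement and look at how the estimated r varies.
import numpy as np

rng = np.random.default_rng(1)
n = 30
x = rng.normal(size=n)                 # placeholder data
y = 0.3 * x + rng.normal(size=n)

n_boot = 10_000
rs = np.empty(n_boot)
for b in range(n_boot):
    idx = rng.integers(0, n, size=n)              # sample n pairs with replacement
    rs[b] = np.corrcoef(x[idx], y[idx])[0, 1]     # Pearson r of the resample

lo, hi = np.percentile(rs, [2.5, 97.5])
print(f"bootstrap r: mean = {rs.mean():.2f}, 95% interval = ({lo:.2f}, {hi:.2f})")
print(f"share of resamples with r < 0: {np.mean(rs < 0):.1%}")
# If r flips sign across many resamples (the interval straddles zero),
# the observed 0.3 is consistent with chance.
```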
