Solved – Does high correlation coefficient mean anything

correlation

By correlation coefficient, I am referring to the Pearson product-moment correlation coefficient here.

We all know that correlation doesn't imply causation, but does a high correlation coefficient mean anything? The reason I ask is that if one looks hard enough, one can find all sorts of correlations between sets of stock market data ("torture the data until it confesses"), so I now think that a high correlation coefficient doesn't mean anything at all.

Apart from the situation where we can trace the causation between two things, is there any other situation where a high correlation coefficient means anything?

To take a specific example: suppose I see that stock A and stock B have a perfect correlation coefficient (after extensive data mining) and I can't find the reason why, or any causation between them. When stock A rises, should I conclude (with a high level of confidence) that stock B will also rise? As far as stock B is concerned, what inference can I draw from the rise or fall of stock A's price?

Best Answer

You seem to be conflating two things:

1) What does correlation mean?

2) Can data mining and other issues mess this up?

Correlation between two variables means that the two variables are correlated: One tends to be higher when the other is higher and lower when the other is lower. Correlation may be due to some third variable, or it may not. It may be due to an outlier, or it may not, etc.
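To illustrate the outlier point, here is a minimal sketch in Python with NumPy. The setup is my own assumption, not taken from real data: 20 points of pure noise plus one extreme point at (100, 100). The function name `outlier_demo` is hypothetical.

```python
import numpy as np

def outlier_demo(n_points=20, outlier=100.0, seed=0):
    """Pearson r for unrelated noise, with and without one extreme point."""
    rng = np.random.default_rng(seed)
    # Two unrelated noise variables (assumed i.i.d. standard normal).
    x = rng.standard_normal(n_points)
    y = rng.standard_normal(n_points)
    r_without = np.corrcoef(x, y)[0, 1]
    # Append a single point that is extreme in both variables.
    x_out = np.append(x, outlier)
    y_out = np.append(y, outlier)
    r_with = np.corrcoef(x_out, y_out)[0, 1]
    return r_without, r_with

if __name__ == "__main__":
    r_without, r_with = outlier_demo()
    print(f"r without outlier: {r_without:.3f}")
    print(f"r with outlier:    {r_with:.3f}")
```

With the outlier included, r is driven almost entirely by that one point, so it lands close to 1 even though the other 20 points are unrelated.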

Correlation of time series (like your two stocks) is often due to a 3rd variable: Time. Stock prices tend to move in sync with each other.
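A rough sketch of how time alone can produce this, again in Python with NumPy. Everything here is an assumption for illustration: two series that share nothing except a linear upward drift, each with its own independent noise. The function name `trend_demo` and the parameter values are hypothetical.

```python
import numpy as np

def trend_demo(n_obs=500, slope=0.1, seed=0):
    """Correlate two series that are related only through a common time trend."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_obs)
    # Same drift over time, but completely independent noise.
    x = slope * t + rng.standard_normal(n_obs)
    y = slope * t + rng.standard_normal(n_obs)
    # Correlation of the levels picks up the shared trend.
    r_levels = np.corrcoef(x, y)[0, 1]
    # Correlation of period-to-period changes removes the linear trend.
    r_changes = np.corrcoef(np.diff(x), np.diff(y))[0, 1]
    return r_levels, r_changes

if __name__ == "__main__":
    r_levels, r_changes = trend_demo()
    print(f"r of levels:  {r_levels:.3f}")
    print(f"r of changes: {r_changes:.3f}")
```

The levels come out very strongly correlated, while the changes are close to uncorrelated. That is one reason people usually correlate returns rather than prices.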

And if you "extensively data mine", then even random noise will produce some very strong correlations. If you look at, say, the correlations of 1000 stocks with each other, then you have 1000*999/2 = 499,500 correlations. You can work out how many would be (say) above .9, even if all the prices were utterly random, either from knowledge of the correlation coefficient's properties (its standard error) or from simulation.
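The simulation route can be sketched as follows in Python with NumPy. The assumptions are mine: each "stock" is i.i.d. Gaussian noise, and there are only 10 observations per series (short series make strong chance correlations much more likely). The function names `count_pairs` and `count_strong_noise_correlations` are hypothetical.

```python
import numpy as np

def count_pairs(n_series):
    """Number of distinct pairs among n_series variables: n*(n-1)/2."""
    return n_series * (n_series - 1) // 2

def count_strong_noise_correlations(n_series=1000, n_obs=10,
                                    threshold=0.9, seed=0):
    """Count pairs of pure-noise series whose |r| exceeds the threshold."""
    rng = np.random.default_rng(seed)
    # Each row is one unrelated series of random noise.
    data = rng.standard_normal((n_series, n_obs))
    corr = np.corrcoef(data)
    # Keep each pair once: the strict upper triangle of the matrix.
    upper = np.triu_indices(n_series, k=1)
    abs_r = np.abs(corr[upper])
    return int((abs_r > threshold).sum()), float(abs_r.max())

if __name__ == "__main__":
    n_above, max_abs = count_strong_noise_correlations()
    print(f"pairs examined: {count_pairs(1000)}")
    print(f"pairs with |r| > 0.9: {n_above}")
    print(f"largest |r|: {max_abs:.3f}")
```

Under these assumptions, some pairs clear the 0.9 bar purely by chance. Raising `n_obs` makes that much rarer, which matches what the standard error of r would suggest.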

But if you look at the roughly 500,000 correlations among real stocks, you will see that they don't behave exactly like the random ones: they tend to be positive.
