Let’s say you are trying to find out whether two stock prices are correlated, where both are likely non-stationary series. You are not concerned with whether the relationship is causal…
You run a simple correlation analysis against all the rules. Both series are autocorrelated and non-stationary. You find a 98% correlation, so you conclude they depend on each other.
This is the conversation I just had with a colleague… but I think they are 100% wrong and I’d like some confirmation.
If you find two autocorrelated and non-stationary series to be 98% correlated, then the correlation is likely spurious. What this means to me is that the correlation we observe is likely due to chance, or a result of their mutual dependence on something else outside of the two series themselves. So if our goal is to identify the extent to which these two series “depend” on each other, finding a valid correlation coefficient is necessary. Is this correct?
Best Answer
Here's a simulated example of two prices that are very highly correlated ($\rho = 0.9875$). When you try to predict the price change in each series using the lagged value of the other, very little of the variation in the price change can be explained.
Here FD is the forward first difference, i.e. the change from the current value to the next one (so $FD.p_t = p_{t+1}-p_t$).
The $R^2$ (aka R-squared) of both models is around zero, so very little of the variation in price changes tomorrow can be explained by the price today. This illustrates the intuition that knowing what you know today, you cannot act on this correlation to make money tomorrow.
You can play around with variations on this approach (using the lagged price change as a predictor, non-linear models, adding more data, more noise, or adding trends), with essentially the same results.
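If you want to try this yourself, here is a minimal sketch of that kind of simulation. The data-generating process is my assumption for illustration, not necessarily the one behind the numbers quoted above: two independent random walks that share nothing but a common drift. The names `simulate_prices` and `r_squared`, the drift, the sample size, and the seed are all hypothetical choices.

```python
import numpy as np

def simulate_prices(n=1000, drift=0.5, seed=0):
    """Two independent random walks sharing only a common drift (assumed DGP)."""
    rng = np.random.default_rng(seed)
    p1 = 100 + np.cumsum(drift + rng.normal(size=n))
    p2 = 100 + np.cumsum(drift + rng.normal(size=n))
    return p1, p2

def r_squared(x, y):
    """R^2 of an OLS regression of y on x with an intercept.

    With a single regressor this equals the squared correlation.
    """
    return np.corrcoef(x, y)[0, 1] ** 2

p1, p2 = simulate_prices()

# Correlation of the price levels
level_corr = np.corrcoef(p1, p2)[0, 1]

# Forward first differences: FD.p_t = p_{t+1} - p_t
fd_p1 = np.diff(p1)
fd_p2 = np.diff(p2)

# Regress tomorrow's change in one price on today's level of the other
r2_p1_on_p2 = r_squared(p2[:-1], fd_p1)
r2_p2_on_p1 = r_squared(p1[:-1], fd_p2)

print(f"correlation of levels:        {level_corr:.4f}")
print(f"R^2, FD.p1 on lagged p2:      {r2_p1_on_p2:.4f}")
print(f"R^2, FD.p2 on lagged p1:      {r2_p2_on_p1:.4f}")
```

Under these assumptions the levels should come out very highly correlated because both drift upward, while both $R^2$ values should sit close to zero because the increments are independent by construction.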
You might object that my toy example is flawed because the high correlation is contemporaneous, so if you knew p1 today, you could predict p2 today. I think that is wrong for the following reason. Suppose the data-generating process (DGP) is as above, but unknown to you. You are an executive at company 1, and you learn that your CEO has been falsifying earnings and pinching bottoms. The news will become public shortly and lower p1. You can’t short your own stock without a vacation at Club Fed. Should you short the stock of company 2 if you know the correlation between p1 and p2 is ~1? I think that would be a terrible idea. This is what makes the correlation spurious and why that matters.
You could also have a causal relationship, but no correlation. When a house has air-conditioning with a preset desired temperature, there will be a strong positive non-spurious correlation between the amount of electricity used by the AC and the temperature outside. But there will be no correlation between the amount of electricity consumed and the inside temperature. The outside temperature and the inside temperature will also be uncorrelated. The last two are spurious non-correlations in my mind. But all three correlations are valid (though "valid" has no formal definition in statistics) since a correlation is just a transformation of the data.
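The air-conditioning story can also be sketched as a small simulation. Everything here is an assumption made up for illustration: a setpoint of 22 °C, outside temperatures that are always above it, a thermostat that tracks the setpoint with a small control error, and a hypothetical `cooling_per_kwh` constant linking electricity to cooling.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

setpoint = 22.0         # desired inside temperature in deg C (hypothetical)
cooling_per_kwh = 1.0   # degrees of cooling per unit of electricity (hypothetical)

# Outside temperature is always above the setpoint, so the AC always runs
t_out = rng.uniform(25.0, 40.0, size=n)

# The thermostat uses just enough electricity to hit the setpoint,
# up to a small control error
control_error = rng.normal(scale=0.1, size=n)
electricity = (t_out - setpoint) / cooling_per_kwh + control_error

# Electricity causally lowers the inside temperature
t_in = t_out - cooling_per_kwh * electricity

def corr(a, b):
    """Pearson correlation between two 1-D arrays."""
    return np.corrcoef(a, b)[0, 1]

corr_elec_out = corr(electricity, t_out)
corr_elec_in = corr(electricity, t_in)
corr_out_in = corr(t_out, t_in)

print(f"electricity vs outside temp: {corr_elec_out:.3f}")
print(f"electricity vs inside temp:  {corr_elec_in:.3f}")
print(f"outside vs inside temp:      {corr_out_in:.3f}")
```

In this setup electricity and outside temperature both causally affect the inside temperature, yet the controller cancels their effects, so the last two correlations should come out close to zero while the first is close to one.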
This is all to say that a strong correlation is not necessary for a causal dependence to exist. And it is certainly not sufficient. Even the sign of the causal relationship could be different from the sign of the correlation. This matters when you use correlations to do things out in the real world (i.e., interventions). This is not just an issue with time series data; it can happen with any observational data.