Solved – “Correlation” terminology in time series analysis

correlation, time series

From basic statistics and from hearing "correlation is not causation" all the time, I tend to think it's fine to say that "X and Y are correlated" even if X and Y aren't in a causal relationship. For example, I'd normally think it's perfectly okay to say that ice cream sales and swimsuit sales are correlated, since high swimsuit sales probably mean high ice cream sales (even though increases in swimsuit sales don't cause an increase in ice cream sales).

However, when studying time series analysis, I get a little confused about this terminology. It seems like a time series analyst would not say that ice cream sales are correlated with swimsuit sales, but rather that ice cream sales are spuriously correlated with swimsuit sales. An unmodified "X is correlated with Y" seems to be reserved for the case where X actually causes Y, so it's fine to say temperature (but not ice cream) is correlated with swimsuit sales.

Is this correct? My problem is that there seem to be two meanings to spurious correlation:

  1. Regress two independent random walks against each other, and ordinary statistical tests will say that they're correlated, even though the two random walks are obviously unrelated in any fashion. (I'm fine with this meaning of spurious correlation, since there really is no relationship.)
  2. Regress ice cream sales against swimsuit sales. It confuses me that this correlation is called spurious, since there really is a relationship between ice cream sales and swimsuit sales, even though this relationship isn't causal.
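
The two meanings above can be told apart in a small simulation. The following is only a sketch under assumptions that are not part of the question: Gaussian random-walk increments, a made-up sinusoidal temperature series, and arbitrary coefficients and noise levels linking temperature to the two sales series. All function names are hypothetical. The first function estimates how often a naive 5% t-test on the regression slope declares two independent random walks related; the second compares the raw correlation of two series driven by a common factor with their correlation after the common factor has been regressed out.

```python
import numpy as np


def slope_t_stat(x, y):
    """Naive t-statistic for the slope in a simple OLS regression of y on x."""
    n = len(x)
    r = np.corrcoef(x, y)[0, 1]
    return r * np.sqrt((n - 2) / (1.0 - r**2))


def random_walk_rejection_rate(n_obs=200, n_sims=1000, seed=0):
    """Meaning 1: share of simulations in which two independent
    random walks pass a naive 5% significance test."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sims):
        # Two random walks built from independent Gaussian increments.
        x = np.cumsum(rng.standard_normal(n_obs))
        y = np.cumsum(rng.standard_normal(n_obs))
        if abs(slope_t_stat(x, y)) > 1.96:
            rejections += 1
    return rejections / n_sims


def common_driver_correlations(n_days=3650, seed=0):
    """Meaning 2: two series that share a driver (temperature) but do not
    affect each other. Returns (raw correlation, correlation after
    removing the linear effect of temperature from both series)."""
    rng = np.random.default_rng(seed)
    day = np.arange(n_days)
    # Illustrative numbers only: a yearly temperature cycle plus noise.
    temperature = 20 + 10 * np.sin(2 * np.pi * day / 365) + rng.normal(0, 2, n_days)
    ice_cream = 2.0 * temperature + rng.normal(0, 5, n_days)
    swimsuits = 1.5 * temperature + rng.normal(0, 5, n_days)

    def residual(series):
        # Residual of a straight-line fit of the series on temperature.
        slope, intercept = np.polyfit(temperature, series, 1)
        return series - (slope * temperature + intercept)

    raw = np.corrcoef(ice_cream, swimsuits)[0, 1]
    partial = np.corrcoef(residual(ice_cream), residual(swimsuits))[0, 1]
    return raw, partial


if __name__ == "__main__":
    print("random walks, naive rejection rate:", random_walk_rejection_rate())
    print("common driver, (raw, partial) correlation:", common_driver_correlations())
```

In the first case the naive test rejects far more often than its nominal 5%, although nothing links the two series. In the second case the raw correlation is real and reproducible, but it shrinks towards zero once temperature is accounted for, which is the sense in which that relationship is non-causal rather than a statistical artifact.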

So I guess my question is: do time series analysts reserve the term "(non-spurious) correlation" for causal relationships — so that for time series analysts, correlation is meant to suggest causation! — while statisticians in general are fine with using "correlation" to indicate any kind of (possibly non-causal) relationship?

Best Answer

To avoid the spurious correlation problem, you should regress two stationary time series against one another. This can (potentially) provide a causal story. It is non-stationary series that lead to spurious correlation. See the reasoning given in my answer to this question. (As a footnote, you may not need stationary series if the integrated series are cointegrated, but I'd point you to any of the applied time series books to learn more about that.)
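
As a rough illustration of why stationarity matters, the sketch below first-differences two independent random walks, which turns them into stationary white-noise series, and then applies the same naive 5% t-test on the regression slope. The Gaussian increments, sample sizes, and function names are assumptions made for this example, and differencing is only one common way to obtain stationary series.

```python
import numpy as np


def slope_t_stat(x, y):
    """Naive t-statistic for the slope in a simple OLS regression of y on x."""
    n = len(x)
    r = np.corrcoef(x, y)[0, 1]
    return r * np.sqrt((n - 2) / (1.0 - r**2))


def differenced_rejection_rate(n_obs=200, n_sims=1000, seed=0):
    """Share of simulations in which the first differences of two
    independent random walks pass a naive 5% significance test."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sims):
        # Non-stationary levels: independent Gaussian random walks.
        x = np.cumsum(rng.standard_normal(n_obs))
        y = np.cumsum(rng.standard_normal(n_obs))
        # First differences recover the stationary increments.
        dx, dy = np.diff(x), np.diff(y)
        if abs(slope_t_stat(dx, dy)) > 1.96:
            rejections += 1
    return rejections / n_sims


if __name__ == "__main__":
    print("rejection rate on differenced series:", differenced_rejection_rate())
```

On the differenced series the test should reject at roughly its nominal rate, so an apparent relationship between the levels of two unrelated random walks does not survive the move to stationary data.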