Solved – Pearson correlation coefficient for lagged time series

correlation, cross correlation, time series

I am trying to calculate the lagged Pearson correlation coefficient between two time series. I am not interested in calculating cross-correlation. I want the Pearson correlation coefficient because I want to use the correlation for prediction.

In general, when two variables are strongly linearly related, we get a correlation coefficient of large magnitude. We can fit a straight line to the scatter plot of the two variables and use it to predict $y$ from $x$. In that case, the proportion of variance explained by the fitted line equals the square of the correlation coefficient.
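To make that last claim concrete, here is a short sketch of why it holds for a least-squares line with an intercept. The symbols $a$, $b$, $r$, $s_x$, $s_y$, $\bar{x}$, $\bar{y}$ and $R^2$ are introduced here for illustration and denote the intercept, slope, sample correlation, sample standard deviations, sample means and proportion of variance explained:

$$\hat{y}_i = a + b x_i, \qquad b = r \frac{s_y}{s_x}, \qquad a = \bar{y} - b \bar{x}$$

$$R^2 = \frac{\sum_i (\hat{y}_i - \bar{y})^2}{\sum_i (y_i - \bar{y})^2} = \frac{b^2 \sum_i (x_i - \bar{x})^2}{\sum_i (y_i - \bar{y})^2} = b^2 \frac{s_x^2}{s_y^2} = r^2$$

The second equality uses $\hat{y}_i - \bar{y} = b (x_i - \bar{x})$, which follows from the expression for $a$.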

But when the two variables are offset by a certain lag and the Pearson correlation coefficient is then calculated between them, can we still say that the proportion of variance explained equals the square of the correlation coefficient? Can we use the best-fit line from the lagged scatter plot for prediction?

Best Answer

> But when the two variables are offset by a certain lag and the Pearson correlation coefficient is then calculated between them, can we still say that the proportion of variance explained equals the square of the correlation coefficient?

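Yes, provided you mean the variance of $y_t$ explained by $x_{t-h}$, where $h$ is the lag order: the squared lagged correlation is the $R^2$ of the regression of $y_t$ on $x_{t-h}$. (It does not, in general, tell you how much of the variance of $y_t$ is explained by the contemporaneous $x_t$.)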
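As a quick numerical check of this point, here is a minimal sketch using NumPy on simulated data. The helper names (`lagged_pairs`, `lagged_pearson`, `r_squared_of_lagged_fit`), the lag of 3 and the simulated series are all assumptions made for illustration; they are not part of the question.

```python
import numpy as np

def lagged_pairs(x, y, h):
    """Return the aligned pairs (x_{t-h}, y_t) for a non-negative lag h."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if not 0 <= h < len(x):
        raise ValueError("h must satisfy 0 <= h < len(x)")
    if h == 0:
        return x, y
    return x[:-h], y[h:]

def lagged_pearson(x, y, h):
    """Pearson correlation between x_{t-h} and y_t."""
    x_lag, y_now = lagged_pairs(x, y, h)
    return np.corrcoef(x_lag, y_now)[0, 1]

def r_squared_of_lagged_fit(x, y, h):
    """R^2 of the least-squares line y_t = a + b * x_{t-h}."""
    x_lag, y_now = lagged_pairs(x, y, h)
    slope, intercept = np.polyfit(x_lag, y_now, 1)
    residuals = y_now - (intercept + slope * x_lag)
    return 1.0 - residuals.var() / y_now.var()

# Simulated example (assumption): y depends on x three steps earlier.
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
y = np.empty(n)
y[:3] = rng.normal(size=3)
y[3:] = 0.8 * x[:-3] + rng.normal(scale=0.5, size=n - 3)

r = lagged_pearson(x, y, 3)
r2 = r_squared_of_lagged_fit(x, y, 3)
print(np.isclose(r ** 2, r2))  # squared lagged correlation vs. R^2 of the lagged fit
```

The two quantities agree up to floating-point error because the fit includes an intercept and both are computed on the same aligned pairs.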

> Can we use the best-fit line from the lagged scatter plot for prediction?

Yes. It is actually quite practical: the lagged values are available before the contemporaneous ones, so it is natural to use lagged, rather than contemporaneous, values for prediction.
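A minimal sketch of what such a prediction could look like, assuming a lag $h \ge 1$ and a simple least-squares line fitted with NumPy. The function names `fit_lagged_line` and `forecast_next` and the toy data are hypothetical and chosen only for illustration.

```python
import numpy as np

def fit_lagged_line(x, y, h):
    """Fit y_t = intercept + slope * x_{t-h} by least squares (h >= 1)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if h < 1 or h >= len(x) or len(x) != len(y):
        raise ValueError("need equal-length series and 1 <= h < len(x)")
    slope, intercept = np.polyfit(x[:-h], y[h:], 1)
    return slope, intercept

def forecast_next(x, slope, intercept, h):
    """Forecast the next h values of y from the last h observed values of x."""
    x = np.asarray(x, dtype=float)
    return intercept + slope * x[-h:]

# Toy data (assumption): y_t = 1 + 2 * x_{t-2} exactly, for t >= 2.
x = np.arange(10, dtype=float)
y = np.zeros(10)
y[2:] = 1.0 + 2.0 * x[:-2]

slope, intercept = fit_lagged_line(x, y, h=2)
forecast = forecast_next(x, slope, intercept, h=2)
print(forecast)
```

With a lag of $h$, the last $h$ observed values of $x$ are exactly the regressors needed for the next $h$ values of $y$, which is why the forecast horizon here matches the lag.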

> I am trying to calculate the lagged Pearson correlation coefficient between two time series. I am not interested in calculating cross-correlation.

Cross-correlation is the Pearson correlation for lagged time series, that is, the correlation computed when one series is lagged with respect to the other.
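One practical caveat, sketched below under stated assumptions: the classical sample cross-correlation estimator normalizes with full-sample means and standard deviations and divides the sum by $n$, whereas applying Pearson's formula to the overlapping pairs uses only those pairs. The two coincide at lag 0 and are typically very close at small lags relative to the sample size. The function names and the simulated data are hypothetical.

```python
import numpy as np

def sample_ccf(x, y, h):
    """Classical sample cross-correlation between x_{t-h} and y_t (h >= 0).

    Uses full-sample means and standard deviations and divides the
    cross-product sum by n, not by the number of overlapping pairs.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    xc = x - x.mean()
    yc = y - y.mean()
    cross_cov = np.sum(xc[:n - h] * yc[h:]) / n
    return cross_cov / (x.std() * y.std())

def pearson_on_overlap(x, y, h):
    """Pearson correlation computed only on the overlapping pairs (x_{t-h}, y_t)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    return np.corrcoef(x[:n - h], y[h:])[0, 1]

# Simulated example (assumption): y depends on x three steps earlier.
rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=n)
y = np.empty(n)
y[:3] = rng.normal(size=3)
y[3:] = 0.8 * x[:-3] + rng.normal(scale=0.5, size=n - 3)

for h in (0, 3):
    print(h, sample_ccf(x, y, h), pearson_on_overlap(x, y, h))
```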

Also note that correlation is a natural measure for cross-sectional data, where the observations can be assumed to be i.i.d., but it is less natural in the time series setting, where there is time dependence between observations. For example, Pearson correlation is not very useful when applied to two series that share a common deterministic or stochastic trend (see the Spurious Correlations website for some examples).
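The trend problem can be illustrated with a small simulation. This is a sketch under stated assumptions, not a result from the answer: it generates pairs of independent random walks (series with stochastic trends that are unrelated by construction) and compares the average absolute correlation of the levels with that of the first differences. Differencing is used here only as one common way of removing a stochastic trend; the function name, seed and sample sizes are arbitrary.

```python
import numpy as np

def abs_corr_levels_and_diffs(n_obs, rng):
    """Absolute correlation of two independent random walks, in levels and in first differences."""
    a = np.cumsum(rng.normal(size=n_obs))  # independent random walk 1
    b = np.cumsum(rng.normal(size=n_obs))  # independent random walk 2
    r_levels = np.corrcoef(a, b)[0, 1]
    r_diffs = np.corrcoef(np.diff(a), np.diff(b))[0, 1]
    return abs(r_levels), abs(r_diffs)

rng = np.random.default_rng(42)
results = np.array([abs_corr_levels_and_diffs(500, rng) for _ in range(200)])
mean_abs_levels, mean_abs_diffs = results.mean(axis=0)

print("mean |r|, levels:     ", mean_abs_levels)
print("mean |r|, differences:", mean_abs_diffs)
```

Although the two walks are independent in every replication, the correlation of the levels is often far from zero, while the correlation of the differences stays close to zero.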
