Solved – Finding Correlation between Time Series – is it a meaningless value

correlation, cross-correlation, time series

I want to measure the relationship between pairs of time series over different time periods. I've been looking into correlation of time series and it seems to me that there isn't much point in finding the correlation between pairs of time series due to a number of issues (such as potential trends etc.). Is finding the correlation between two time series pointless, or is it still a decent indicator for the relationship between how the two time series move?

I've tried looking into cross-correlation. However, that seems to require the two time series to be stationary. The problem is that I have more than 80 different time series per year, so looking over plots to check each one for stationarity isn't realistic, and I instead decided to settle for plain correlation.

Here's what I did to find the correlation between pairs of time series: Each time series had two columns, a date and a number which showed the inventory for an item in a shop. Different time series were for different items in the shop. Many of the time series had varying lengths: one series will have 200 days in a year while another will only have 30. I first found the days the two time series had in common and I then grouped these values/days together (for example if both items had an inventory on 02/03/2016 then this would go into the new subset) and then found the correlation between the inventory of the two different items.

Is this method correct?

EDIT: I've just been reading more about correlation between time series and I was wondering if someone could check this for me. According to this topic: https://quant.stackexchange.com/questions/489/correlation-between-prices-or-returns it looks like it would be better to compute the correlation on the changes in inventory of each item (the day-to-day variation) rather than on the raw series. Would that be better, or would it still be a bad idea?

2nd Edit: Alternatively, what if I tried to find the correlation of the cumulative sums for both the time series? Would that be okay to do instead?

Best Answer

A traditional correlation coefficient computed on the raw values of two time series will not tell you much.

As an example, let's take the issue of height across both cross-sectional and time series data.

Cross-sectional example: Measuring the correlation coefficient of height for a sample of 100 21-year-old British and Dutch males.

Time series example: Measuring the correlation coefficient of height for 100 males, recorded each year from age 4 to 21.

In the time series example, you will find that the correlation is high and highly significant, since every male keeps growing from age 4 to 18 regardless of his eventual height.

However, the correlation will be skewed upwards by the time series trend, so one cannot draw any meaningful insight from such a correlation coefficient. With cross-sectional data, the correlation coefficient is more meaningful because there is no time trend to inflate it.
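The effect of a shared time trend can be illustrated with a small simulation in base R. This is only a sketch on made-up data: the two series below share nothing except an upward trend, and the variable names are chosen for the example. It compares the correlation of the raw levels with the correlation of the period-to-period changes (the idea raised in the question's first edit).

```r
# Simulated illustration; all data here is made up for the example.
set.seed(42)

n <- 100
trend <- seq_len(n)

# Two series that share only an upward trend; their noise is independent.
x <- trend + rnorm(n)
y <- 2 * trend + rnorm(n)

# Correlation of the raw levels is dominated by the common trend.
cor_levels <- cor(x, y)

# Differencing turns the linear trend into a constant, which does not
# affect the correlation, so only the (independent) noise remains.
cor_changes <- cor(diff(x), diff(y))

cat("levels: ", round(cor_levels, 3), "\n")
cat("changes:", round(cor_changes, 3), "\n")
```

With data built this way, the level correlation should come out close to 1 while the correlation of the changes should be much smaller in magnitude, even though the two series have no relationship beyond the trend.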

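Cointegration, on the other hand, tests whether two trending (non-stationary) series move together in the long run, that is, whether some linear combination of them is stationary. This allows one to determine whether the apparent relationship is significant or simply due to chance.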
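As a rough illustration of what a cointegration test checks, the sketch below carries out a two-step Engle-Granger style procedure by hand in base R. The data is simulated so that the pair is cointegrated by construction, the variable names are assumptions made for the example, and this is not necessarily what any particular package does internally.

```r
# Simulated illustration; all data here is made up for the example.
set.seed(1)
n <- 200

# x is a random walk (non-stationary); y tracks 2 * x plus stationary noise,
# so the pair is cointegrated by construction.
x <- cumsum(rnorm(n))
y <- 2 * x + rnorm(n)

# Step 1: cointegrating regression of y on x; keep the residuals.
fit <- lm(y ~ x)
res <- unname(residuals(fit))

# Step 2: Dickey-Fuller style regression on the residuals,
# d_res[t] = rho * res[t-1] + error, with no intercept.
d_res <- diff(res)
lag_res <- res[-n]
df_fit <- lm(d_res ~ 0 + lag_res)
t_stat <- coef(summary(df_fit))["lag_res", "t value"]

cat("DF t-statistic on residuals:", round(t_stat, 2), "\n")
```

A strongly negative statistic suggests the residuals are stationary, which is evidence of cointegration. Note that the statistic has to be compared against Engle-Granger critical values (approximately -3.34 at the 5% level for two series with a constant), not against ordinary t-tables.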

To run this in R, you would use the egcm function from the egcm package as follows:

library(egcm)

# x and y: numeric vectors of equal length covering the same dates
egcm(x, y)

This will produce the relevant test statistic, which indicates whether the two time series are cointegrated or not. This would be a recommended method for analysing the relationship (or lack thereof) for your first 30 days of data. Needless to say, one cannot calculate a correlation between time series with differing numbers of observations; the two series must first be restricted to the dates they have in common.
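The question's approach of keeping only the days that both series have in common can be sketched in base R as follows. The data frames, column names and inventory values are hypothetical and exist only to make the example runnable.

```r
# Hypothetical inventory data for two items; dates and values are made up.
item_a <- data.frame(
  date = as.Date("2016-03-01") + 0:9,
  inventory = c(50, 48, 47, 45, 44, 40, 39, 37, 36, 30)
)
item_b <- data.frame(
  date = as.Date("2016-03-01") + c(1, 3, 4, 6, 8, 12),
  inventory = c(20, 19, 17, 16, 12, 9)
)

# Inner join on date keeps only the days present in both series.
common <- merge(item_a, item_b, by = "date", suffixes = c("_a", "_b"))

# Correlation over the shared days only.
r <- cor(common$inventory_a, common$inventory_b)

cat("shared days:", nrow(common), "\n")
cat("correlation:", round(r, 3), "\n")
```

The resulting vectors common$inventory_a and common$inventory_b have equal length and matching dates, so they could also be passed to egcm(x, y). The caveat about trends still applies: both made-up series here decline over time, so a high correlation on the levels is expected regardless of any real relationship.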
