A traditional correlation measurement between two time series will not tell you much on its own.
As an example, take height, measured both cross-sectionally and as a time series.
Cross-sectional example: measuring the correlation coefficient of height for a sample of 100 British and Dutch males, all aged 21.
Time series example: measuring the correlation coefficient of the heights of 100 males, recorded each year from age 4 to 21.
In the time series example, you will find that the correlation is highly significant, since growth from age 4 to 18 continues regardless of each male's eventual height.
However, the correlation is skewed upwards by the common time trend, so you cannot draw any insightful meaning from it. With cross-sectional data, the correlation coefficient is more meaningful, since there is no time trend to bias the reading to the upside.
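As a quick illustration of how a shared trend inflates correlation (a sketch in Python/NumPy rather than R; the slopes and noise levels are made up for the demo):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
t = np.arange(n)

# Two series with *independent* noise but a common upward trend,
# analogous to height measurements from age 4 to 21.
x = 0.5 * t + rng.normal(scale=2.0, size=n)
y = 0.3 * t + rng.normal(scale=2.0, size=n)

r = np.corrcoef(x, y)[0, 1]
print(f"correlation: {r:.3f}")  # close to 1, despite independent noise

# Removing the trend (here by differencing) kills the spurious association.
r_diff = np.corrcoef(np.diff(x), np.diff(y))[0, 1]
print(f"correlation of differences: {r_diff:.3f}")  # near zero
```

The point is that the high raw correlation reflects the shared trend, not any relationship between the fluctuations.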
Cointegration analysis, on the other hand, lets you determine whether the relationship between two trending series is genuine or merely spurious.
To run this in R, use the egcm() function from the egcm package:

```r
library(egcm)
egcm(x, y)
```
This will produce the relevant test statistic, which indicates whether the two time series are cointegrated. This would be a recommended method for analysing the correlations (or lack thereof) in your first 30 days of data. Needless to say, you cannot compute a correlation between two time series with different numbers of observations.
Similarity is the inverse of distance; below are some commonly used distance metrics for time series.
- Correlation: You already talk about this.
- Euclidean Distance: Self-explanatory, I assume.
- Dynamic Time Warping: DTW finds an optimal match between two sequences of feature vectors which allows for stretched and compressed sections of the sequence.
- Mutual Information: An entropy-based metric, introduced by Shannon. There are quite a few papers by now applying it to time series, for example [this one](https://arxiv.org/abs/0904.4753).
- iSAX: The final one I want to flag is the so-called "Motif Discovery" and the related [iSAX](http://www.cs.ucr.edu/~eamonn/iSAX.pdf) representation of time series (by Eamonn Keogh), which is very scalable.
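Of these, DTW is the easiest to implement from scratch. A minimal O(nm) dynamic-programming sketch in Python (absolute-difference cost, no window constraint; for production use a dedicated library will be much faster):

```python
import numpy as np

def dtw_distance(a, b):
    """Classic DTW: minimum cumulative |a_i - b_j| cost over all
    monotone alignments of the two sequences."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three predecessor alignments:
            # match both, stretch a, or stretch b.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # 0.0: the repeated 2 is absorbed
print(dtw_distance([1, 2, 3], [2, 3, 4]))     # 2.0
```

Note the first pair has zero DTW distance even though the sequences have different lengths, which is exactly the stretching behaviour described above.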
Also, I would recommend searching this site for time series distance metrics; I am sure there are a few others I am missing here.
Best Answer
The sample correlation coefficient is
$$\sum (x_i-\bar{x})(y_i-\bar{y}) / \sqrt{\sum (x_i-\bar{x})^2 \sum(y_i-\bar{y})^2}$$
so the best thing would be to save $\sum(x-\bar{x})(y-\bar{y})$, $\sum(x-\bar{x})^2$ and $\sum(y-\bar{y})^2$ for the different windows. These could easily be combined, even with different sample sizes in the different windows.
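One subtlety when combining: because each window has its own mean, merging the centred sums needs a correction term involving the window means and sizes (the pairwise update of Chan et al.'s parallel variance algorithm). A sketch, assuming you store the tuple (n, mean_x, mean_y, Sxx, Syy, Sxy) per window:

```python
import math

def merge(stats_a, stats_b):
    """Merge two windows' sufficient statistics for correlation.
    Each tuple is (n, mean_x, mean_y, Sxx, Syy, Sxy), where
    Sxx = sum((x - mean_x)^2) within the window, and so on."""
    na, mxa, mya, sxxa, syya, sxya = stats_a
    nb, mxb, myb, sxxb, syyb, sxyb = stats_b
    n = na + nb
    dx, dy = mxb - mxa, myb - mya
    f = na * nb / n  # weight of the between-window mean shift
    return (n,
            mxa + dx * nb / n,
            mya + dy * nb / n,
            sxxa + sxxb + f * dx * dx,
            syya + syyb + f * dy * dy,
            sxya + sxyb + f * dx * dy)

def window_stats(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return (n, mx, my,
            sum((x - mx) ** 2 for x in xs),
            sum((y - my) ** 2 for y in ys),
            sum((x - mx) * (y - my) for x, y in zip(xs, ys)))

# Two windows of different sizes; the merged statistics give exactly
# the correlation of the pooled data.
x = [1.0, 2.0, 4.0, 4.5, 6.0, 7.0, 9.0]
y = [1.5, 1.9, 4.2, 5.0, 5.5, 7.5, 8.0]
merged = merge(window_stats(x[:3], y[:3]), window_stats(x[3:], y[3:]))
_, _, _, sxx, syy, sxy = merged
print(sxy / math.sqrt(sxx * syy))
```

This handles the different-sample-size case mentioned above; the correction term vanishes only when the two windows happen to share the same means.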
If the denominators are approximately constant across the 90-day increments, then you could just take the average of the window correlations.
I'm surprised you say that this is computationally expensive. I wouldn't think it would take much more time than calculating the sample mean.