Time Series – Analyzing Autocorrelation from Multiple Time Series Samples

autocorrelation, time series

I have multiple samples of a time series (for example, the time series might be minutely samples from 12am to 3pm, and I have that for ten different days) and I'd like to compute the autocorrelation function $\rho(k)$ together with confidence intervals.

I can think of two "obvious" things to do:

  1. Chain all of the samples together end-to-end and compute the autocorrelation function and confidence intervals for the combined sample.
  2. Compute the autocorrelation function for each sample individually. Average the values pointwise to get the total autocorrelation function, and apply a square root rule to get the confidence intervals.

Both of these have disadvantages. Option (1) will have artifacts from where I have joined the time series together, which will become more important as I compute $\rho(k)$ for large $k$. Option (2) seems too ad hoc: I wouldn't know whether to believe my confidence intervals.

Is there a canonical or correct way to do this?

Best Answer

Yes, there is a correct way and it's simple, too.

By definition, the autocorrelation of a stationary process $X_t$ at lag $dt$ is the correlation between $X_t$ and $X_{t+dt}$. Suppose you have observations of this process $x_{t_0}, x_{t_0+dt}, x_{t_0+2dt}, \ldots, x_{t_0+k_0dt}$ spaced $dt$ apart, another set of observations $x_{t_1}, x_{t_1+dt}, x_{t_1+2dt}, \ldots, x_{t_1+k_1dt}$, also spaced $dt$ apart, in a non-overlapping time interval with $t_1 > t_0+k_0dt$, and in general contiguous runs of observations $x_{t_i}, x_{t_i+dt}, x_{t_i+2dt}, \ldots, x_{t_i+k_idt}$, $i=0, 1, \ldots$, over non-overlapping time intervals. Then the correlation coefficient of the ordered pairs

$$\{(x_{t_i+jdt}, x_{t_i+(j+1)dt})\}$$

for $i=0, 1, \ldots$ and $j=0, 1, \ldots, k_i-1$ estimates the autocorrelation of $X_t$ at lag $dt$. Compute the standard error of this correlation exactly as you would for the correlation of any bivariate data set $\{(x_k, y_k)\}$.
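A minimal Python sketch of this pooling, using NumPy. The function name `pooled_autocorr`, the `lag` parameter (which extends the same pairing to lags of several steps), and the simulated AR(1) data are illustrative assumptions, not part of the answer. The interval uses the Fisher z-transform, which is one common choice for a bivariate correlation; it treats the pairs as roughly independent, so it should be read as approximate.

```python
import numpy as np

def pooled_autocorr(segments, lag=1, z_crit=1.96):
    """Estimate the autocorrelation at `lag` steps from separate segments.

    Only pairs (x[j], x[j + lag]) from within the same segment are used;
    no pair ever spans two segments.
    Returns (r, n_pairs, (ci_low, ci_high)).
    """
    if lag < 1:
        raise ValueError("lag must be a positive integer")
    first, second = [], []
    for seg in segments:
        seg = np.asarray(seg, dtype=float)
        if len(seg) > lag:              # segments shorter than the lag give no pairs
            first.append(seg[:-lag])    # x[j]
            second.append(seg[lag:])    # x[j + lag]
    x = np.concatenate(first)
    y = np.concatenate(second)
    n = len(x)
    r = np.corrcoef(x, y)[0, 1]         # ordinary Pearson correlation of the pooled pairs
    # Assumption: Fisher z interval with standard error 1 / sqrt(n - 3).
    z = np.arctanh(r)
    half_width = z_crit / np.sqrt(n - 3)
    return r, n, (np.tanh(z - half_width), np.tanh(z + half_width))

# Hypothetical data: ten "days" of 900 samples from an AR(1) process.
rng = np.random.default_rng(0)
phi = 0.6
days = []
for _ in range(10):
    noise = rng.standard_normal(900)
    x = np.empty(900)
    x[0] = noise[0]
    for t in range(1, 900):
        x[t] = phi * x[t - 1] + noise[t]
    days.append(x)

r1, n1, ci1 = pooled_autocorr(days, lag=1)
print(f"lag 1: r = {r1:.3f} from {n1} pairs, approx. 95% CI = ({ci1[0]:.3f}, {ci1[1]:.3f})")
```

Calling the function once per lag gives the whole estimated autocorrelation function; the number of usable pairs shrinks as the lag grows, so the intervals widen accordingly.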

The difference between this approach and the first option in the question (chaining the samples end-to-end) is that pairs spanning two sequences, such as $(x_{t_i+k_idt}, x_{t_{i+1}})$, are not included in the calculation. Intuitively they should not be, because in general the time interval between the members of such a pair is not equal to $dt$, and therefore such pairs do not provide direct information about the correlation at lag $dt$.
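To get a feel for how many such spanning pairs chaining would introduce, here is a small counting sketch. The helper `count_pairs` and the figure of 900 samples per day (minutely data from 12am to 3pm, as in the question's example) are assumptions for illustration only.

```python
def count_pairs(lengths, lag=1):
    """Return (pooled, chained) numbers of pairs at the given lag.

    pooled  : pairs taken only within each segment
    chained : pairs obtained after joining all segments end-to-end
    """
    pooled = sum(max(n - lag, 0) for n in lengths)
    chained = max(sum(lengths) - lag, 0)
    return pooled, chained

lengths = [900] * 10  # ten days of 900 samples each (assumed)
for lag in (1, 60, 600):
    pooled, chained = count_pairs(lengths, lag)
    spanning = chained - pooled  # pairs whose two members come from different days
    print(f"lag {lag}: pooled = {pooled}, chained = {chained}, spanning = {spanning}")
```

At lag 1 only 9 of the 8999 chained pairs span a boundary, but at lag 600 it is 5400 of 8400, which matches the concern in the question that the joining artifacts matter more for large $k$.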