I have multiple samples of a time series (for example, the time series might be minutely samples from 12am to 3pm, and I have that for ten different days) and I'd like to compute the autocorrelation function $\rho(k)$ together with confidence intervals.
I can think of two "obvious" things to do:
1. Chain all of the samples together end-to-end and compute the autocorrelation function and confidence intervals for the combined sample.
2. Compute the autocorrelation function for each sample individually. Average the values pointwise to get the total autocorrelation function, and apply a square root rule to get the confidence intervals.
Both of these have disadvantages. Option (1) will have artifacts where I have joined the time series together, and these will become more important as I compute $\rho(k)$ for large $k$. Option (2) seems too ad hoc: I wouldn't know whether to believe my confidence intervals.
Is there a canonical or correct way to do this?
Best Answer
Yes, there is a correct way and it's simple, too.
By definition, the autocorrelation of a stationary process $X_t$ at lag $dt$ is the correlation between $X_t$ and $X_{t+dt}$. Suppose you have observations of this process spaced $dt$ apart, $x_{t_0}, x_{t_0+dt}, x_{t_0+2dt}, \ldots, x_{t_0+k_0dt}$; another set of observations in a non-overlapping time interval, $x_{t_1}, x_{t_1+dt}, x_{t_1+2dt}, \ldots, x_{t_1+k_1dt}$, with $t_1 \gt t_0+k_0dt$; and in general contiguous runs of observations $x_{t_i}, x_{t_i+dt}, x_{t_i+2dt}, \ldots, x_{t_i+k_idt}$, $i=0, 1, \ldots$, over non-overlapping time intervals. Then the correlation coefficient of the ordered pairs
$$\{(x_{t_i+jdt}, x_{t_i+(j+1)dt})\}$$
for $i=0, 1, \ldots$ and $j=0, 1, \ldots, k_i-1$ estimates the autocorrelation of $X_t$ at lag $dt$. Compute the standard errors of the correlation exactly as you would compute the standard error for the correlation of any bivariate data set $\{(x_k, y_k)\}$.
The difference between this approach and the first one proposed in the question (chaining the samples end-to-end) is that pairs spanning two sequences, $(x_{t_i+k_idt}, x_{t_{i+1}})$, are not included in the calculation. Intuitively they should not be, because in general the time interval between the two members of such a pair is not equal to $dt$, and therefore such pairs do not provide direct information about the correlation at lag $dt$.
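For concreteness, here is a minimal NumPy sketch of the pooled-pairs calculation. The function names `pooled_autocorr` and `fisher_ci` are my own, not from any library. Two things in it are assumptions rather than part of the argument above: it extends the pairing to a lag of `lag` sampling steps by pairing each observation with the one `lag` steps later in the same run, and it uses the Fisher $z$-transform as one common choice of interval for a bivariate correlation. That interval treats the pairs as roughly independent and bivariate normal, so for strongly autocorrelated data it should be read as an approximation.

```python
import numpy as np

def pooled_autocorr(segments, lag=1):
    """Correlation of within-segment pairs (x[j], x[j+lag]), pooled over segments.

    Pairs that would span two different segments are never formed.
    Returns the correlation estimate and the number of pairs used.
    """
    if lag < 1:
        raise ValueError("lag must be a positive number of sampling steps")
    first, second = [], []
    for seg in segments:
        seg = np.asarray(seg, dtype=float)
        if len(seg) > lag:              # runs shorter than the lag contribute no pairs
            first.append(seg[:-lag])    # x_{t_i + j dt}
            second.append(seg[lag:])    # x_{t_i + (j + lag) dt}
    if not first:
        raise ValueError("no segment is longer than the requested lag")
    x = np.concatenate(first)
    y = np.concatenate(second)
    r = np.corrcoef(x, y)[0, 1]
    return r, len(x)

def fisher_ci(r, n, z_crit=1.96):
    """Approximate confidence interval for a correlation via Fisher's z-transform.

    Assumption: the n pairs are treated as independent bivariate normal draws.
    """
    z = np.arctanh(r)
    se = 1.0 / np.sqrt(n - 3)
    return np.tanh(z - z_crit * se), np.tanh(z + z_crit * se)

# Illustrative usage on synthetic data: ten "days" of 900 one-minute samples.
rng = np.random.default_rng(0)
days = [rng.standard_normal(900) for _ in range(10)]
r, n = pooled_autocorr(days, lag=1)
low, high = fisher_ci(r, n)
print(f"lag 1: r = {r:.3f}, approx. 95% CI = ({low:.3f}, {high:.3f}), n = {n}")
```

Calling `pooled_autocorr` over a range of `lag` values gives the whole estimated function $\rho(k)$. The number of usable pairs shrinks as the lag grows, so the intervals widen accordingly.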