Solved – Why do we compare sample ACF and theoretical ACF in time series analysis

autocorrelation, time series

Theoretical Autocorrelation Function (ACF):

For a weakly stationary time series $\{r_t\}$, the lag-$l$ ACF is defined as follows (from Ruey Tsay's "Analysis of Financial Time Series"):

$
\rho_l=\frac{Cov(r_t,r_{t-l})}{\sqrt{Var(r_t)Var(r_{t-l})}}=\frac{Cov(r_t,r_{t-l})}{Var(r_t)}
$

It is the correlation between two random variables, $r_t$ and $r_{t-l}$.

The sample ACF, by contrast, is the correlation between an observed time series and a copy of itself lagged by $l$. These are two different objects from the random variables $r_t$ and $r_{t-l}$.
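For reference, the lag-$l$ sample ACF of an observed series $r_1, \dots, r_T$ is usually computed as shown here. This is the standard estimator; the symbols $T$ (series length) and $\bar{r}$ (sample mean) are notation assumed for this note, not taken from the definition above.

$
\hat{\rho}_l=\frac{\sum_{t=l+1}^{T}(r_t-\bar{r})(r_{t-l}-\bar{r})}{\sum_{t=1}^{T}(r_t-\bar{r})^2},\qquad \bar{r}=\frac{1}{T}\sum_{t=1}^{T}r_t
$

Every product in the numerator pairs two observations from the same realization that are $l$ time steps apart.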

So what is the point of comparing these two different quantities?

For example, suppose we have calculated the theoretical ACF between $r_1$ and $r_5$ of a time series, which is really a random process.

We want to check whether the theoretical calculation is good, so we instantiate the random process many times. For each instantiation, we record the values of $r_1$ and $r_5$, which gives us samples of the random variables $r_1$ and $r_5$. We then use these samples to calculate the sample ACF between $r_1$ and $r_5$. This, I believe, is the correct way to calculate the sample ACF, and it is the value that should be compared with the theoretical ACF.
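A minimal sketch of this procedure, assuming purely for illustration that the process is a Gaussian AR(1), $r_t = \phi r_{t-1} + a_t$ with $\phi = 0.6$. The model, the coefficient, the burn-in length, and the helper name `simulate_ar1_paths` are all assumptions and not part of the question.

```python
import numpy as np

def simulate_ar1_paths(phi, n_steps, n_paths, rng, burn_in=200):
    """Simulate n_paths independent AR(1) paths r_t = phi * r_{t-1} + a_t.

    Hypothetical helper: a burn-in is discarded so that the returned
    paths are approximately stationary.
    """
    total = n_steps + burn_in
    shocks = rng.standard_normal((n_paths, total))
    paths = np.zeros((n_paths, total))
    for t in range(1, total):
        paths[:, t] = phi * paths[:, t - 1] + shocks[:, t]
    return paths[:, burn_in:]

rng = np.random.default_rng(0)
phi = 0.6  # assumed coefficient, for illustration only
paths = simulate_ar1_paths(phi, n_steps=10, n_paths=20000, rng=rng)

# One value of r_1 and one value of r_5 per instantiation of the process.
r1 = paths[:, 0]
r5 = paths[:, 4]

# Correlation across instantiations (an ensemble estimate).
ensemble_corr = np.corrcoef(r1, r5)[0, 1]

# For a stationary AR(1), the correlation between values k steps apart is phi**k.
steps_apart = 4
theoretical = phi ** steps_apart

print(f"ensemble estimate: {ensemble_corr:.3f}")
print(f"theoretical value: {theoretical:.3f}")
```

In this sketch, columns 0 and 4 of `paths` play the roles of $r_1$ and $r_5$, which are four time steps apart.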

In short, in my opinion, the correlation between a time series and a lag 5 of it is NOT the correct way to calculate the sample ACF between $r_1$ and $r_5$, and it is meaningless to compare this value with the theoretical ACF between $r_1$ and $r_5$.

Where am I wrong?

Best Answer

When we have a "theoretical" ACF, what we mean is that it is the ACF that follows logically from some underlying hypothesised model. So when we compare the sample ACF to this theoretical ACF, we are looking at whether they are similar enough that this hypothesised model is plausible. In principle, this is no different from any other kind of comparison between sample data and a hypothesised model outcome, for example comparing a histogram of data to a superimposed bell curve. In all such cases, you are looking to see whether there is enough similarity between the empirical outcome and the theoretical outcome that the underlying model for the theoretical outcome is plausible.

In practice, we often have just one vector of time-series data (not multiple "instantiations" of it) and we may wish to know whether it is plausibly modelled by some particular time-series model (e.g., an ARMA model). The equation you have given for the ACF is just its definition, and not a particular form for it. However, different kinds of time-series models have different forms for the ACF, and so it is useful to compare the observed sample ACF to these different forms to see whether they look alike. For example, a simple AR(1) time-series model has an ACF that decays exponentially in the lag (alternating in sign if the lag-1 auto-correlation is negative), so if our sample ACF looks close to exponential decay then the data might be well modelled by an AR(1) model.
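As a rough illustration of this comparison, here is a minimal sketch using NumPy. It assumes a stationary Gaussian AR(1), $r_t = \phi r_{t-1} + a_t$ with $|\phi| < 1$, whose theoretical ACF is $\rho_l = \phi^l$. The coefficient $\phi = 0.6$, the series length, and the helper names `simulate_ar1_path` and `sample_acf` are assumptions chosen for the example. Only a single realization is generated, and its sample ACF is set against the theoretical form.

```python
import numpy as np

def simulate_ar1_path(phi, n_obs, rng, burn_in=500):
    """Simulate one AR(1) path r_t = phi * r_{t-1} + a_t (hypothetical helper)."""
    total = n_obs + burn_in
    shocks = rng.standard_normal(total)
    path = np.zeros(total)
    for t in range(1, total):
        path[t] = phi * path[t - 1] + shocks[t]
    return path[burn_in:]  # discard burn-in so the path is approximately stationary

def sample_acf(x, max_lag):
    """Standard sample ACF of a single series for lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    denom = np.sum(centered ** 2)
    n = len(centered)
    acf = np.empty(max_lag + 1)
    for lag in range(max_lag + 1):
        # Pair each observation with the one `lag` steps earlier in the same series.
        acf[lag] = np.sum(centered[lag:] * centered[: n - lag]) / denom
    return acf

rng = np.random.default_rng(1)
phi = 0.6  # assumed coefficient, for illustration only
series = simulate_ar1_path(phi, n_obs=5000, rng=rng)

max_lag = 5
estimated = sample_acf(series, max_lag)
theoretical = phi ** np.arange(max_lag + 1)

for lag in range(max_lag + 1):
    print(f"lag {lag}: sample {estimated[lag]: .3f}   theoretical {theoretical[lag]: .3f}")
```

If the two columns track each other closely, the AR(1) hypothesis is plausible for the data. A sample ACF that decays in a visibly different way would point towards a different model.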

As to your objection to the definition of the ACF, I'm afraid I don't follow your argument. This is merely the definition of a particular function used to measure auto-correlation in stationary time series. (Incidentally, the difference between $r_1$ and $r_5$ would be a lag of four, not five.)