I have been working with different time series for more than two years. I have read in many articles that the ACF is used to identify the order of the MA term, and the PACF for the AR term. The rule of thumb is that, for an MA model, the lag at which the ACF cuts off sharply is the order of the MA term, and similarly for the PACF and AR.
Here is one of the articles I followed, from PennState Eberly College of Science.
My question is: why is this so? To me, even the ACF could give the AR term. I need an explanation of the rule of thumb mentioned above. I cannot understand, intuitively or mathematically, why:
Identification of an AR model is often best done with the PACF.
Identification of an MA model is often best done with the ACF rather than the PACF.
Please note: I don't need the "how" but the "WHY". 🙂
Best Answer
The quotes are from the link in the OP:
This equation looks like a regression model, as indicated on the linked page... So what is a possible intuition?
In Chinese whispers, or the telephone game as illustrated here, the message gets distorted as it is whispered from person to person, and the sentence is completely new after passing through two people. For instance, at time $t_2$ the message, i.e. "$\color{lime}{\small\text{CC}}$'s pool", is completely different in meaning from that at $t_0,$ i.e. "CV is cool!" The "correlation" that existed with $t_1$ ("$\color{lime}{\small\text{CC}}$ is cool!") in the word "$\color{lime}{\small\text{CC}}$" is gone; there are no remaining identical words, and even the intonation ("!") has changed. This pattern repeats itself: there is a word shared at any two consecutive time stamps, which goes away if $t_k$ is compared to $t_{k-2}.$
However, in this process of introducing errors at each step, there is a similarity that spans further than one single step: although "Chrissy's pool" is different in meaning from "CC is cool!", there is no denying their phonetic similarity or the rhyming of "pool" and "cool". Therefore it wouldn't be true that the correlation stops at $t_{k-1}.$ It does decay (exponentially), but it can be traced downstream for a long time: compare $t_5$ ("Missi's cruel") to $t_0$ ("CV is cool!") - there are still similarities. This explains the correlogram (ACF) of an AR($1$) process (e.g. with coefficient $0.8$):
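Since the plot cannot be reproduced here, this decay is easy to check numerically. A minimal sketch (my own, with an assumed seed and sample size), simulating the AR($1$) process and computing its sample autocorrelations:

```python
import numpy as np

rng = np.random.default_rng(0)  # assumed seed, for reproducibility

# Simulate an AR(1) process x_t = 0.8 * x_{t-1} + e_t (coefficient as in the text)
n, phi = 100_000, 0.8
e = rng.standard_normal(n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + e[t]

def acf(x, k):
    """Sample autocorrelation at lag k."""
    xc = x - x.mean()
    return 1.0 if k == 0 else (xc[k:] * xc[:-k]).sum() / (xc * xc).sum()

# The correlation decays roughly like phi**k - it never "shuts off"
# and can be traced downstream for many lags:
for k in range(1, 6):
    print(k, round(acf(x, k), 2), round(phi ** k, 2))
```

Each sample autocorrelation lands close to the theoretical $\phi^k$, mirroring the exponentially decaying bars of the correlogram.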
Multiple, progressively offset sequences are correlated, discarding any contribution of the intermediate steps. This would be the graph of the operations involved:
In this setting the PACF is useful in showing that once the effect of $t_{k-1}$ is controlled for, older timestamps than $t_{k-1}$ do not explain any of the remaining variance: all that remains is white noise:
It is not difficult to come very close to the actual output of the R function by actually obtaining consecutive OLS regressions through the origin of farther lagged sequences, and collecting the coefficients into a vector. Schematically,
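That schematic can be made concrete. A minimal numpy sketch (my own, with an assumed seed and sample size) that builds the PACF by running successive OLS regressions of the demeaned series on increasingly many lagged copies of itself, keeping the last coefficient of each fit:

```python
import numpy as np

rng = np.random.default_rng(1)  # assumed seed

# Same AR(1) process as before, phi = 0.8
n, phi = 50_000, 0.8
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + rng.standard_normal()

def pacf_ols(x, max_lag):
    """PACF at lag k = coefficient on x_{t-k} when regressing the demeaned
    x_t (through the origin) on x_{t-1}, ..., x_{t-k}."""
    xc = x - x.mean()
    n = len(xc)
    coefs = []
    for k in range(1, max_lag + 1):
        y = xc[k:]                                              # x_t
        X = np.column_stack([xc[k - j:n - j] for j in range(1, k + 1)])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        coefs.append(beta[-1])                                  # keep last coefficient
    return np.array(coefs)

# Only the first bar is significant: once x_{t-1} is controlled for,
# older lags explain none of the remaining variance.
print(np.round(pacf_ols(x, 5), 2))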
For MA processes, it turns out that the behavior of the ACF and the PACF is flipped compared to AR processes:
In the game above, $t_{k-1}$ was enough to explain all prior errors in transmitting the message (a single significant bar in the PACF plot), absorbing all prior errors, which had shaped the final message one error at a time. An alternative view of that AR($1$) process is as the sum of a long series of correlated mistakes (a Koyck transformation), i.e. an MA($\infty$). Likewise, under some conditions, an MA($1$) process can be inverted into an AR($\infty$) process:
$$x_t = \theta x_{t-1} - \theta^2 x_{t-2} + \theta^3 x_{t-3}-\cdots +\epsilon_t$$
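As a quick numerical sanity check of that invertibility (my own sketch, with an assumed seed, using the sign convention $x_t = \epsilon_t + \theta \epsilon_{t-1}$ that matches the covariance expansion in the next paragraph): recursing $\epsilon_t = x_t - \theta \epsilon_{t-1}$ recovers the errors from the observed series alone.

```python
import numpy as np

rng = np.random.default_rng(3)  # assumed seed

# MA(1): x_t = e_t + theta * e_{t-1}, with |theta| < 1 so it is invertible
n, theta = 30, 0.8
e = rng.standard_normal(n)
x = e.copy()
x[1:] += theta * e[:-1]

# Invert via e_t = x_t - theta * e_{t-1}; expanding this recursion yields
# the AR(inf) representation with geometrically decaying weights on past x's.
e_hat = np.zeros(n)
e_hat[0] = x[0]            # here x_0 = e_0, so the recursion starts exactly
for t in range(1, n):
    e_hat[t] = x[t] - theta * e_hat[t - 1]

print(np.max(np.abs(e_hat - e)))   # recovery error is at floating-point level
```

With $|\theta| < 1$ the weights on old observations die off geometrically, which is exactly the condition for the AR($\infty$) form to make sense.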
The confusing part, then, is why the significant spikes in the ACF stop after the number of lags $q$ in MA($q$). But in an MA($1$) process the covariance is different from zero only at consecutive times, $\small \text{Cov}(X_t,X_{t-1})=\theta \sigma^2,$ because only then does the expansion $\small {\text{Cov}}(\epsilon_t + \theta \epsilon_{t-1}, \epsilon_{t-1} + \theta \epsilon_{t-2})=\theta\, \text{Cov}(\epsilon_{t-1}, \epsilon_{t-1})$ produce a match in time stamps - all other combinations are zero thanks to the iid condition on the errors.
This is the reason why the ACF plot is helpful in indicating the number of lags, as in this MA($1$) process $\epsilon_t + 0.8\, \epsilon_{t-1}$, in which only one lag shows significant correlation, while the PACF shows typical oscillating values that progressively decay:
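This cutoff is easy to verify numerically. A minimal sketch (my own, with an assumed seed and sample size) for the MA($1$) process $\epsilon_t + 0.8\,\epsilon_{t-1}$, whose theoretical lag-1 autocorrelation is $\theta/(1+\theta^2) \approx 0.49$ and zero beyond:

```python
import numpy as np

rng = np.random.default_rng(2)  # assumed seed

# MA(1): x_t = e_t + 0.8 * e_{t-1}, as in the text
n, theta = 100_000, 0.8
e = rng.standard_normal(n + 1)
x = e[1:] + theta * e[:-1]

def acf(x, k):
    """Sample autocorrelation at lag k."""
    xc = x - x.mean()
    return 1.0 if k == 0 else (xc[k:] * xc[:-k]).sum() / (xc * xc).sum()

# Theory: rho(1) = theta / (1 + theta**2) ~ 0.49, rho(k) = 0 for k >= 2,
# because beyond one lag the covariance expansion has no matching time stamps.
for k in range(1, 5):
    print(k, round(acf(x, k), 2))
```

The sample ACF shows one significant spike and then drops to noise level, exactly the "shut off" the rule of thumb relies on.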
In the game of whispers, the error at $t_2$ ("pool") is "correlated" with the value at $t_3$ ("Chrissy's pool"); however, there is no "correlation" between $t_3$ and the error at $t_1$ ("CC"). Applying a PACF to an MA process will not result in a "shut off", but rather a progressive decay: controlling for the explanatory contribution of later random variables in the process does not render more distant ones insignificant, as was the case in AR processes.