The problem is not the normalisation constant, since in the correlation formula it simply cancels out. The difference arises because ccf holds the means and variances of the series fixed when calculating the cross-correlations: they are computed once from the whole series and reused at every lag, even though the number of overlapping observations shrinks as the lag grows. This is a perfectly valid operation if the series are considered stationary, i.e. with constant mean and variance.
Here is a detailed example which recreates the behaviour of ccf:
x <- c(1,2,3,4,5,6,7,8,9,10)
y <- c(3,3,3,5,5,5,5,7,7,11)
mx <- mean(x)              # whole-series mean of x
my <- mean(y)              # whole-series mean of y
dx <- mean((x - mx)^2)     # variance of x, normed by n (as ccf does)
dy <- mean((y - my)^2)     # variance of y, normed by n
nx <- length(x)
round(cor(x, y), 3)
[1] 0.896
# cross-correlation with fixed (whole-series) means and variances
cr <- function(x, y, mux = mean(x), muy = mean(y),
               dx = var(x), dy = var(y), n = length(x)) {
  cxy <- sum((x - mux) * (y - muy)) / n
  cxy / sqrt(dx * dy)
}
round(cr(x, y, mx, my, dx, dy, nx), 3)
[1] 0.896
# Think "Lag -1"
# x[-10] = 1,2,3,4,5,6,7,8,9
# y[-1] = 3,3,5,5,5,5,7,7,11
round(cor(x[-10],y[-1]),3)
[1] 0.894
round(cr(x[-10],y[-1],mx,my,dx,dy,nx),3)
[1] 0.699
# Think "Lag -2"
# x[-10:-9] = 1,2,3,4,5,6,7,8
# y[-1:-2] = 3,5,5,5,5,7,7,11
round(cor(x[-10:-9],y[-1:-2]),3)
[1] 0.878
round(cr(x[-10:-9],y[-1:-2],mx,my,dx,dy,nx),3)
[1] 0.466
print(ccf(x, y, lag.max = 3, plot = FALSE))

Autocorrelations of series ‘X’, by lag

    -3     -2     -1      0      1      2      3
 0.197  0.466  0.699  0.896  0.436  0.221 -0.018
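The positive lags come out the same way. For example, lag +1 pairs x[t+1] with y[t], and the fixed-moment calculation reproduces the 0.436 printed above:

# Think "Lag +1": x[t+1] paired with y[t]
round(cr(x[-1], y[-10], mx, my, dx, dy, nx), 3)
[1] 0.436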
Note that the norming constant in the function cr matters only because it must be the same norming constant used in the variance calculations; when the covariance and the variances are divided by the same constant, it cancels out of the ratio.
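Mixing constants breaks this. With cr's defaults, var() divides by n-1 while the covariance divides by n, so even the lag-0 value no longer matches cor (a quick check with the vectors above):

round(cr(x, y), 3)
[1] 0.806
# = 0.896 * (n-1)/n, the effect of the mismatched norming constants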
The main reason for the "reversal" you see when you deal with AR and MA processes is that these processes generally have the property that they are invertible to the form of the other process (so long as the coefficients in the models are within the unit circle). So a finite AR process can be represented as an infinite MA process, and a finite MA process can be represented as an infinite AR process. For a general MA(q) process you have:
$$Z_t = \Bigg( 1 - \sum_{i=1}^q \theta_i B^i \Bigg) \epsilon_t = \prod_{i=1}^q (1 - \tau_i B) \epsilon_t,$$
where $B$ is the backshift operator. If $\max|\tau_i| < 1$ (so that all the coefficients are inside the unit circle) then the process is invertible and we have:
$$\epsilon_t = \prod_{i=1}^q (1 - \tau_i B)^{-1} Z_t = \prod_{i=1}^q \Bigg( \sum_{k=0}^\infty \tau_i^k B^k \Bigg) Z_t.$$
Re-arranging this expression gives the AR($\infty$) process:
$$Z_t = \Bigg[ 1 - \prod_{i=1}^q \Bigg( \sum_{k=0}^\infty \tau_i^k B^k \Bigg) \Bigg] Z_t + \epsilon_t.$$
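To see this concretely, take $q=1$ with a single coefficient $\tau$. Then $\epsilon_t = \sum_{k=0}^\infty \tau^k B^k Z_t$, and re-arranging gives the explicit AR($\infty$) form:
$$Z_t = -\sum_{k=1}^\infty \tau^k Z_{t-k} + \epsilon_t,$$
with weights on past values that decay geometrically but never cut off.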
Now, the PACF gives you the conditional correlation at a given lag, conditional on the values at the intervening times. For an AR process, this measures the autoregressive dependence in the process. Hence, for an invertible MA process, the PACF will measure the autocorrelations in the AR($\infty$) process that corresponds to it. The measured PACF values decay gradually because the AR process being measured is infinite.
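You can check this with a short simulation (a sketch with an assumed coefficient of 0.8; arima.sim and pacf are in base R's stats package):

# simulate an invertible MA(1): Z_t = e_t + 0.8 e_{t-1}, with |0.8| < 1
set.seed(123)
z <- arima.sim(model = list(ma = 0.8), n = 5000)
round(pacf(z, lag.max = 6, plot = FALSE)$acf, 2)
# the magnitudes shrink gradually and alternate in sign,
# as the geometric weights of the AR(infinity) form predict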
Best Answer
Your second paragraph, in a sense, hints at an answer to the first. In time series processes, where you are at point $t$ is partly dependent on where you were just recently, at point $t-1$. Observations in a time series are not independent in most cases. Whether your behaviour is driven or chaotic, you cannot easily escape the position you were in at the immediately preceding moment. If a second ago you were in your kitchen, you can't find yourself the next moment in any place of your home with equal probability: you are still likely to be somewhere around the kitchen. A time series has a "sliding memory". No wonder it almost always has considerable autocorrelation with itself at lag 1.
A greater lag usually relaxes the autocorrelation, since the "memory" of the past fades. But if the behaviour is cyclic to some extent, with period $p$, you find yourself at $t$ close to where you were at $t-p$. Thus the autocorrelation at lag $p$ will be relatively strong, stronger than at lag $p-1$ or lag $p+1$. Autocorrelograms (ACF) and partial autocorrelograms (PACF) are the main tools for detecting autocorrelations at various lags.
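For a quick real-data illustration, the built-in AirPassengers series (monthly, so $p=12$) shows this once the trend is removed by differencing on the log scale:

# after detrending, the ACF spikes again around lag 12 (one year)
ap <- diff(log(AirPassengers))
acf(ap, lag.max = 24)    # the plot marks lag 1.0, i.e. 12 months
pacf(ap, lag.max = 24)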
Cross-correlations are similar to autocorrelations, only here a time series is correlated not with itself (at a lag) but with some parallel time series (at a lag, which may be 0).
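As a toy sketch (hypothetical data): if one series trails the other by two steps, ccf picks the delay up as a peak at lag +2:

set.seed(7)
y2 <- rnorm(200)
x2 <- c(0, 0, head(y2, -2)) + rnorm(200, sd = 0.1)  # x2[t] = y2[t-2] + noise
ccf(x2, y2, lag.max = 5, plot = FALSE)  # largest cross-correlation at lag +2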
Auto- and cross-correlations extend from time series to any series, a series of observations being simply their sequence. Are the observations non-random in that they are tied in a chain or in rings? Examining autocorrelations may discover it.
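For instance (a contrived sketch), a non-temporal sequence whose values clump in runs shows the "chain" at short lags and the "ring" at the period:

s <- rep(c(0, 1), each = 5, times = 10)   # 0s and 1s in runs of five
round(acf(s, lag.max = 10, plot = FALSE)$acf[-1], 2)
# large positive values at short lags (the chain),
# strongly negative at lag 5 and positive again at lag 10 (the ring)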