Can somebody explain the autocorrelation function for time-series data? If I apply acf to the data, what would the application be?
Solved – autocorrelation function
Related Solutions
This ACF suggests non-stationarity, which might be remedied by incorporating a daily effect, since it appears to show structure at lag 24. The daily effect could be either autoregressive of order 24, or deterministic, in which case 23 hourly dummies might be needed. You could try either of these and assess the results. Further structure also appears to be needed: either level shifts or some form of short-term autoregressive structure, such as a differencing operator of lag 1. After identifying and estimating a useful model, the residuals might suggest further action (model augmentation) to ensure that the model has extracted all of the signal and left a noise process that is normal (Gaussian). This will then answer your vague question regarding "stability". Hope this helps!
A slight addition:
I used the word "suggests" because the ACF is not the final word on this; the actual data is. In the absence of the actual data, the ACF is sometimes useful in characterizing the process.
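To make the deterministic option concrete, here is a minimal pure-Python sketch on hypothetical synthetic data (30 days of hourly values with a fixed daily profile plus noise, not the asker's series). Subtracting each hour-of-day mean gives the same residuals as an OLS fit on an intercept plus 23 hourly dummies, so comparing the lag-24 sample autocorrelation before and after shows whether a deterministic daily effect accounts for the spike. The function names `sample_acf` and `remove_hourly_means` are illustrative only.

```python
import math
import random

def sample_acf(x, max_lag):
    """Sample autocorrelations r_0..r_max_lag (deviations from the overall mean)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
            for k in range(max_lag + 1)]

def remove_hourly_means(x, period=24):
    """Subtract each hour-of-day mean. This gives the same residuals as an OLS
    fit on an intercept plus (period - 1) hourly dummy variables."""
    sums = [0.0] * period
    counts = [0] * period
    for t, v in enumerate(x):
        sums[t % period] += v
        counts[t % period] += 1
    means = [s / c for s, c in zip(sums, counts)]
    return [v - means[t % period] for t, v in enumerate(x)]

# Hypothetical data: 30 days of hourly values with a fixed daily profile plus noise.
random.seed(0)
x = [10 + 3 * math.sin(2 * math.pi * (t % 24) / 24) + random.gauss(0, 1)
     for t in range(24 * 30)]

before = sample_acf(x, 24)
after = sample_acf(remove_hourly_means(x), 24)
print(f"lag-24 ACF before: {before[24]:.2f}, after: {after[24]:.2f}")
```

On data built like this the lag-24 spike largely disappears after the adjustment; on real data, the residual ACF is what you would then inspect for the further structure (level shifts, lag-1 differencing) mentioned above.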
The main reason for the "reversal" you are seeing between AR and MA processes is that these processes can generally be inverted into the form of the other process (so long as the relevant coefficients in the models lie inside the unit circle). So a finite AR process can be represented as an infinite MA process, and a finite MA process can be represented as an infinite AR process. For a general MA(q) process you have:
$$Z_t = \Bigg( 1 - \sum_{i=1}^q \theta_i B^i \Bigg) \epsilon_t = \prod_{i=1}^q (1 - \tau_i B) \epsilon_t,$$
where $B$ is the backshift operator. If $\max|\tau_i| < 1$ (so that all the coefficients are inside the unit circle) then the process is invertible and we have:
$$\epsilon_t = \prod_{i=1}^q (1 - \tau_i B)^{-1} Z_t = \prod_{i=1}^q \Bigg( \sum_{k=0}^\infty \tau_i^k B^k \Bigg) Z_t.$$
Re-arranging this expression gives the AR($\infty$) process:
$$Z_t = \Bigg[ 1 - \prod_{i=1}^q \Bigg( \sum_{k=0}^\infty \tau_i^k B^k \Bigg) \Bigg] Z_t + \epsilon_t.$$
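The AR($\infty$) weights can be computed numerically by multiplying out the geometric series and truncating. Below is a minimal pure-Python sketch (the function name `ar_inf_coefficients` is hypothetical). For MA(1) it reproduces the hand calculation $\epsilon_t = Z_t + \tau Z_{t-1} + \tau^2 Z_{t-2} + \cdots$, i.e. $Z_t = -\tau Z_{t-1} - \tau^2 Z_{t-2} - \cdots + \epsilon_t$, whose weights shrink but never become exactly zero.

```python
def ar_inf_coefficients(taus, n_terms):
    """AR weights phi_1..phi_n in Z_t = sum_k phi_k Z_{t-k} + eps_t for an
    invertible MA(q) written as prod_i (1 - tau_i B) eps_t."""
    if any(abs(t) >= 1 for t in taus):
        raise ValueError("need max|tau_i| < 1 for invertibility")
    # Power-series coefficients of prod_i sum_k tau_i^k B^k, truncated at B^n_terms.
    series = [1.0] + [0.0] * n_terms
    for tau in taus:
        geometric = [tau ** k for k in range(n_terms + 1)]
        series = [sum(series[j] * geometric[m - j] for j in range(m + 1))
                  for m in range(n_terms + 1)]
    # eps_t = series(B) Z_t and series[0] == 1, so moving the lag terms across
    # gives phi_k = -series[k] for k >= 1.
    return [-c for c in series[1:]]

# MA(1) with tau = 0.5: phi_k = -(0.5)^k.
print(ar_inf_coefficients([0.5], 4))  # [-0.5, -0.25, -0.125, -0.0625]
```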
Now, the PACF gives you the correlation at a given lag, conditional on knowledge of the values at the intervening times. For an AR(p) process this is zero beyond lag p, since once the p most recent values are known, earlier values carry no further information. Hence, for an invertible MA process, the PACF reflects the AR($\infty$) process that corresponds to that process. The measured PACF values decay gradually, rather than cutting off, because the AR process being measured is infinite.
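The contrast can be checked with theoretical autocorrelations. This minimal sketch uses the Durbin-Levinson recursion to turn an ACF into a PACF (the function name `pacf_from_acf` and the parameter value 0.5 are illustrative assumptions). Under the sign convention above, an MA(1) process $Z_t = \epsilon_t - \theta \epsilon_{t-1}$ has $\rho_1 = -\theta/(1+\theta^2)$ and $\rho_k = 0$ for $k \ge 2$, yet its PACF is nonzero at every lag; an AR(1) process with $\rho_k = 0.5^k$ has a PACF that cuts off after lag 1.

```python
def pacf_from_acf(rho, max_lag):
    """Durbin-Levinson recursion: partial autocorrelations phi_kk for
    k = 1..max_lag from autocorrelations rho[0..max_lag] (rho[0] == 1)."""
    pacf = []
    prev = []  # prev[j - 1] holds phi_{k-1, j} for j = 1..k-1
    for k in range(1, max_lag + 1):
        num = rho[k] - sum(prev[j - 1] * rho[k - j] for j in range(1, k))
        den = 1.0 - sum(prev[j - 1] * rho[j] for j in range(1, k))
        phi_kk = num / den
        # Update phi_{k, j} = phi_{k-1, j} - phi_kk * phi_{k-1, k-j}, then append phi_kk.
        prev = [prev[j - 1] - phi_kk * prev[k - j - 1] for j in range(1, k)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf

theta, max_lag = 0.5, 6
# MA(1): rho_1 = -theta / (1 + theta^2), rho_k = 0 for k >= 2.
ma1_rho = [1.0, -theta / (1 + theta ** 2)] + [0.0] * (max_lag - 1)
# AR(1) with coefficient 0.5: rho_k = 0.5^k.
ar1_rho = [0.5 ** k for k in range(max_lag + 1)]
print([round(v, 3) for v in pacf_from_acf(ma1_rho, max_lag)])  # shrinks toward 0, never exactly 0
print([round(v, 3) for v in pacf_from_acf(ar1_rho, max_lag)])  # [0.5, 0.0, 0.0, 0.0, 0.0, 0.0]
```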
Best Answer
Unlike ordinary sample data, time-series data are ordered, so there is extra information in your sample that you can take advantage of if there are useful temporal patterns. The autocorrelation function is one of the tools used to find such patterns. Specifically, it tells you the correlation between points separated by various time lags. As an example, here are some possible ACF values for a series with discrete time periods:
The notation is ACF(n) = correlation between points separated by n time periods. I'll give examples for the first few values of n:
ACF(0) = 1 (all data are perfectly correlated with themselves), ACF(1) = 0.9 (the correlation between a point and the next point is 0.9), ACF(2) = 0.4 (the correlation between a point and a point two time steps ahead is 0.4), and so on.
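Here is a minimal pure-Python sketch of one common sample ACF estimator (the function name `acf` and the toy series are hypothetical, chosen only for illustration): for each lag, the cross-products of deviations from the mean are summed and divided by the total sum of squares, so ACF(0) is always exactly 1.

```python
def acf(x, max_lag):
    """Sample ACF: r_n = sum_t (x_t - m)(x_{t+n} - m) / sum_t (x_t - m)^2."""
    n = len(x)
    m = sum(x) / n
    d = [v - m for v in x]
    c0 = sum(v * v for v in d)
    return [sum(d[t] * d[t + lag] for t in range(n - lag)) / c0
            for lag in range(max_lag + 1)]

# Toy series that repeats every 4 time steps.
series = [2.0, 4.0, 6.0, 4.0, 2.0, 4.0, 6.0, 4.0]
print([round(r, 3) for r in acf(series, 4)])  # [1.0, 0.0, -0.75, 0.0, 0.5]
```

Points 2 steps apart sit on opposite sides of the mean, hence the negative ACF(2). ACF(4) is only 0.5 rather than 1 because fewer pairs are available at longer lags while this estimator still divides by the full sum of squares.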
So the ACF tells you how correlated points are with each other, based on how many time steps separate them. That is the gist of autocorrelation: how correlated past data points are with future data points, for different time separations. Typically, you'd expect the ACF to fall towards 0 as points become more separated (i.e. as n becomes large in the above notation), because it's generally harder to forecast further into the future from a given set of data. This is not a rule, but it is typical.
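As a worked illustration of that typical decay (an assumed model, not the process behind the example values above), take a zero-mean stationary AR(1) process. Multiplying both sides by $X_{t-n}$ ($n \ge 1$), taking expectations, and using the fact that $\epsilon_t$ is uncorrelated with past values gives $\gamma(n) = \phi\,\gamma(n-1)$, so:

$$X_t = \phi X_{t-1} + \epsilon_t,\ |\phi| < 1 \quad\Longrightarrow\quad \text{ACF}(n) = \phi^{n}.$$

With $\phi = 0.9$, for instance, ACF(1) = 0.9, ACF(2) = 0.81, ACF(5) is about 0.59 and ACF(10) is about 0.35: the correlation shrinks geometrically but never reaches exactly 0.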
Now, to the second part: why do we care? The ACF and its sister function, the partial autocorrelation function (more on this in a bit), are used in the Box-Jenkins/ARIMA modeling approach to determine how past and future data points are related in a time series. The partial autocorrelation function (PACF) can be thought of as the correlation between two points that are separated by some number of periods n, BUT with the effect of the intervening correlations removed. This matters because of the following situation. Suppose that in reality each data point is directly correlated only with the NEXT data point, and with no other. It will nevertheless APPEAR as if the current point is correlated with points further into the future, but only due to a "chain reaction" effect: T1 is directly correlated with T2, which is directly correlated with T3, so it LOOKS like T1 is directly correlated with T3. The PACF removes the intervening correlation with T2 so you can better discern patterns. A nice intro to this is here.
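To see the chain-reaction effect numerically, here is a minimal pure-Python sketch on simulated data. It assumes an AR(1) process with a hypothetical $\phi = 0.8$, so each point depends directly only on the previous one. The lag-2 sample autocorrelation comes out near $\phi^2 = 0.64$, while the lag-2 partial autocorrelation, estimated as the coefficient on $x_{t-2}$ when $x_t$ is regressed on $x_{t-1}$ and $x_{t-2}$, comes out near 0. The helper names `corr` and `ols_two_regressors` are illustrative only; in practice a library routine (for example statsmodels' `pacf`) would do this.

```python
import random

def corr(u, v):
    """Pearson correlation of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return cov / (su * sv)

def ols_two_regressors(y, x1, x2):
    """OLS slopes of y on x1 and x2 (with intercept), via centered normal equations."""
    n = len(y)
    a = [v - sum(x1) / n for v in x1]
    b = [v - sum(x2) / n for v in x2]
    c = [v - sum(y) / n for v in y]
    s11 = sum(p * p for p in a)
    s22 = sum(q * q for q in b)
    s12 = sum(p * q for p, q in zip(a, b))
    s1y = sum(p * r for p, r in zip(a, c))
    s2y = sum(q * r for q, r in zip(b, c))
    det = s11 * s22 - s12 * s12
    # Cramer's rule for the 2x2 system of normal equations.
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

# Simulated AR(1): x_t = phi * x_{t-1} + noise (a "T1 -> T2 -> T3" chain).
random.seed(1)
phi, n = 0.8, 5000
x = [0.0]
for _ in range(n - 1):
    x.append(phi * x[-1] + random.gauss(0, 1))

y, lag1, lag2 = x[2:], x[1:-1], x[:-2]
acf2 = corr(y, lag2)                          # plain lag-2 correlation
_, pacf2 = ols_two_regressors(y, lag1, lag2)  # lag-2 effect with lag 1 held fixed
print(f"ACF(2) = {acf2:.2f} (phi^2 = {phi ** 2:.2f}), PACF(2) = {pacf2:.2f}")
```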
The NIST Engineering Statistics Handbook (available online) also has a chapter on this, with an example time-series analysis using autocorrelation and partial autocorrelation. I won't reproduce it here, but go through it and you should have a much better understanding of autocorrelation.