Solved – autocorrelation function

autocorrelation

Can somebody explain the autocorrelation function for time-series data? When the ACF is applied to data, what are its applications?

Best Answer

Unlike data from simple random sampling, time-series data are ordered. Therefore, there is extra information in your sample that you can take advantage of, if there are useful temporal patterns. The autocorrelation function (ACF) is one of the tools used to find such patterns. Specifically, it tells you the correlation between points separated by various time lags. As an example, here are some possible ACF values for a series with discrete time periods:

The notation is ACF(n) = correlation between points separated by n time periods, where n is the number of time periods between the points. I'll give examples for the first few values of n.

- ACF(0) = 1 (all data are perfectly correlated with themselves)
- ACF(1) = 0.9 (the correlation between a point and the next point is 0.9)
- ACF(2) = 0.4 (the correlation between a point and a point two time steps ahead is 0.4)
- ...and so on.
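If you want to see where numbers like these come from, here is a minimal sketch of computing the lag-k sample autocorrelation by hand with NumPy. The toy series and the `acf_at_lag` helper are illustrative assumptions, not part of any particular library.

```python
import numpy as np

def acf_at_lag(x, k):
    """Sample autocorrelation at lag k: the series correlated with itself
    shifted k time steps, normalized by the lag-0 variance."""
    x = np.asarray(x, dtype=float)
    xbar = x.mean()
    # Covariance between x_t and x_{t+k} ...
    num = np.sum((x[:len(x) - k] - xbar) * (x[k:] - xbar))
    # ... divided by the total variation of the series.
    den = np.sum((x - xbar) ** 2)
    return num / den

series = [2.0, 2.4, 2.1, 2.7, 3.1, 2.9, 3.4, 3.8, 3.5, 4.0]
for k in range(3):
    print(f"ACF({k}) = {acf_at_lag(series, k):.3f}")
```

By construction, ACF(0) always comes out to exactly 1, matching the first value above.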

So, the ACF tells you how correlated points are with each other, based on how many time steps separate them. That is the gist of autocorrelation: it measures how correlated past data points are with future data points, for different values of the time separation. Typically, you would expect the autocorrelation function to fall toward 0 as points become more separated (i.e., as n grows in the notation above), because it is generally harder to forecast further into the future from a given set of data. This is not a rule, but it is typical.
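To illustrate that decay, here is a short sketch, assuming NumPy and statsmodels are available. It simulates an AR(1) process, x_t = 0.9 * x_{t-1} + noise, and prints its sample ACF; the coefficient 0.9 and the series length are arbitrary choices for the demonstration.

```python
import numpy as np
from statsmodels.tsa.stattools import acf

rng = np.random.default_rng(0)
n = 2000
x = np.zeros(n)
for t in range(1, n):
    # Each point depends directly only on the previous point.
    x[t] = 0.9 * x[t - 1] + rng.normal()

for k, r in enumerate(acf(x, nlags=8)):
    print(f"ACF({k}) = {r:5.2f}")  # roughly 0.9**k, falling toward 0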

Now, to the second part: why do we care? The ACF and its sister function, the partial autocorrelation function (PACF), are used in the Box-Jenkins/ARIMA modeling approach to determine how past and future data points are related in a time series. The PACF can be thought of as the correlation between two points separated by some number of periods n, BUT with the effect of the intervening correlations removed. This matters because a data point may in reality be directly correlated only with the NEXT data point and no other, yet it will APPEAR to be correlated with points further into the future through a "chain reaction" type effect: T1 is directly correlated with T2, which is directly correlated with T3, so it LOOKS as if T1 is directly correlated with T3. The PACF removes the intervening correlation with T2 so you can better discern the true pattern.
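To make the "chain reaction" concrete, here is a sketch that compares the ACF and PACF of the same kind of simulated AR(1) series, again assuming statsmodels is installed. Since each point directly depends only on its immediate predecessor, the PACF should show a single spike at lag 1 while the ACF decays gradually.

```python
import numpy as np
from statsmodels.tsa.stattools import acf, pacf

rng = np.random.default_rng(1)
n = 2000
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.9 * x[t - 1] + rng.normal()

print("lag   ACF    PACF")
for k, (a, p) in enumerate(zip(acf(x, nlags=5), pacf(x, nlags=5))):
    print(f"{k:3d}  {a:5.2f}  {p:6.2f}")
# Expect the ACF to decay like 0.9**k, while the PACF is near zero
# for every lag of 2 or more: the apparent long-range correlation
# is entirely inherited through the intervening points.
```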

The online NIST Engineering Statistics Handbook has a chapter on this, including an example time-series analysis using autocorrelation and partial autocorrelation. I won't reproduce it here, but working through it should give you a much better understanding of autocorrelation.