Solved – Interpreting autocorrelation in time series residuals

arimaautocorrelationresidualstime series

I am trying to fit an ARIMA model to the following data minus the last 12 datapoints: http://users.stat.umn.edu/~kb/classes/5932/data/beer.txt

The first thing I did was take the log-difference to get a stationary process:

beer_ts <- bshort %>%pull(Amount) %>%ts(.,frequency=1)
beer_log <- log(beer_ts)
beer_adj <- diff(beer_log, differences=1)

.
.

Both the ADF & KPSS tests indicates that beer_adj is indeed stationary, which matches my visual inspection:
enter image description here
.
.

Fitting a model yields the following:

beer_arima <- auto.arima(beer_adj, seasonal=FALSE, stepwise=FALSE, approximation=FALSE)
summary(beer_arima)

Series: beer_adj 
ARIMA(4,0,1) with non-zero mean 

Coefficients:
         ar1     ar2     ar3      ar4      ma1    mean
      0.4655  0.0537  0.0512  -0.3486  -0.9360  0.0017
s.e.  0.0470  0.0517  0.0516   0.0469   0.0126  0.0005

sigma^2 estimated as 0.01282:  log likelihood=312.39
AIC=-610.77   AICc=-610.5   BIC=-582.68

Training set error measures:
                        ME      RMSE        MAE      MPE     MAPE     MASE        ACF1
Training set -0.0004487926 0.1124111 0.08945575 134.5103 234.3433 0.529803 -0.06818126

.
.

When inspecting the residuals, however, they seem to be serially autocorrelated at lag 12:
enter image description here

I am unsure how I am supposed to interpret this last plot – at this stage I would have expected white noise residuals. I am new to time series analysis, but it seems to me that I am missing some fundamental step in my modelling here. Any pointers would be greatly appreciated!

Best Answer

If I am not mistaken, the observations in this beer time series were collected 12 times a year for each year represented in the data, so the frequency of the time series is 12.

As explained at https://www.statmethods.net/advstats/timeseries.html, for instance, frequency is the number of observations per unit time. This means that frequency = 1 for data collected once a year, frequency = 4 for data collected 4 times a year (i.e., quarterly data) and frequency = 12 for data collected 12 times a year (i.e., monthly data). Not sure why you would use frequency = 1 instead of frequency = 12 for this time series?

For time series data where frequency = 4 or 12, you should be concerned about the possibility of seasonality (see https://anomaly.io/seasonal-trend-decomposition-in-r/).

If seasonality is present, you should incorporate this into your time series modelling and forecasting, in which case you would use seasonal = TRUE in your auto.arima() function call.

Before you construct the ACF and PACF plots, you can diagnose the presence of seasonality of a quarterly or monthly time series using functions such as ggseasonplot() and ggsubseriesplot() in the forecast package in R, as seen here: https://otexts.org/fpp2/seasonal-plots.html.

Related Question