Solved – Seasonality not taken account of in `auto.arima()`

arima, forecasting, r, seasonality, time series

I am having basically the same issue as in this thread, except for one thing:

The difference, in my case, is that my data is measured weekly and not daily, so the argument that the seasonal period is too high (> 350) does not hold for my data: the seasonal period in my case is 52 (52 weeks in a year).

And yet, when I use auto.arima(), R returns the ARIMA model (p,d,q) = (2,1,1) and (P,D,Q) = (0,0,0), while the seasonal pattern in my data is blatant… How could you explain that R completely dismisses the seasonality in my data?

Since I'm still in a learning phase, I am using the data set cmort available in the astsa library, so everyone here can use the same data as me.

And I have done cmort <- ts(cmort,frequency=52) to be sure that the seasonality in my data is taken into account, but it didn't change anything.

Best Answer

(First off, cmort is already a ts object with frequency 52, so you don't need to coerce it.)

I'd say seasonality is visible, not that it is blatant:

library(forecast)
library(astsa)
seasonplot(cmort)

Per the help page (?auto.arima), auto.arima() decides whether or not to take seasonal differences by using an OCSB test. It's quite possible that this test simply got it wrong in this instance; it's a statistical test, after all. You can force a seasonal model by setting D=1, although auto.arima() runs for quite some time with forced seasonality. (Note that the information criteria are not comparable between the original and the differenced series.)
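To see what the seasonal test decides on its own, one option is to call nsdiffs() from the forecast package directly. This is only a minimal sketch: which tests the test argument accepts depends on the installed forecast version, so check the values against ?nsdiffs before relying on them.

```r
library(forecast)
library(astsa)

# Number of seasonal differences suggested by the OCSB test (0 or 1)
D_ocsb <- nsdiffs(cmort, test = "ocsb")
D_ocsb

# For comparison: number of first differences suggested by ndiffs()
d_first <- ndiffs(cmort)
d_first
```

If the first call returns 0, the test did not find evidence for a seasonal unit root, which would be consistent with auto.arima() choosing a non-seasonal model unless D is forced.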

Auto-fitted model:

> auto.arima(cmort)
Series: cmort 
ARIMA(2,1,1)                    

Coefficients:
         ar1     ar2      ma1
      0.0957  0.2515  -0.6435
s.e.  0.4302  0.2444   0.4155

sigma^2 estimated as 33.72:  log likelihood=-1609.89
AIC=3227.77   AICc=3227.85   BIC=3244.68

Model with forced seasonality:

> auto.arima(cmort,D=1)
Series: cmort 
ARIMA(0,0,0)(1,1,0)[52] with drift         

Coefficients:
         sar1    drift
      -0.5737  -0.0257
s.e.   0.0378   0.0041

sigma^2 estimated as 47.7:  log likelihood=-1537.6
AIC=3081.21   AICc=3081.26   BIC=3093.57

Related Solutions

Solved – Auto.arima with daily data: how to capture seasonality/periodicity

If there is weekly seasonality, set the seasonal period to 7.

salests <- ts(data,start=2010,frequency=7) 
modArima <- auto.arima(salests)

Note that the selection of seasonal differencing was not very good in auto.arima() until very recently. If you are using v2.xx of the forecast package, set D=1 in the call to auto.arima() to force seasonal differencing. If you are using v3.xx of the forecast package, the automatic selection of D works much better (using an OCSB test instead of a CH test).

Don't try to compare the AIC for models with different levels of differencing. They are not directly comparable. You can only reliably compare the AIC with models having the same orders of differencing.

You don't need to re-fit the model after calling auto.arima(). It will return an Arima object, just as if you had called arima() with the selected model order.
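As a small illustration of that last point, here is a sketch using USAccDeaths, a monthly series that ships with base R (my choice of example data, not the asker's series). The object returned by auto.arima() goes straight into forecast().

```r
library(forecast)

# USAccDeaths is a monthly series that ships with base R
fit <- auto.arima(USAccDeaths)

# The result is already a fitted model object ...
inherits(fit, "Arima")

# ... so it can be passed to forecast() directly, without re-fitting
fc <- forecast(fit, h = 12)
plot(fc)
```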

Solved – Daily Time Series Analysis

Your ACF and PACF indicate that you at least have weekly seasonality, which is shown by the peaks at lags 7, 14, 21 and so forth.
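To show what such peaks look like numerically, here is a sketch on simulated data (the series below is made up for illustration and is not the asker's data). The autocorrelations at multiples of 7 come out large and positive, while those in between do not.

```r
# Simulated daily data with a weekly pattern (illustration only)
set.seed(42)
t <- 1:365
x <- 20 + 5 * sin(2 * pi * t / 7) + rnorm(365)

# Autocorrelations up to four weeks back, without plotting
r <- acf(x, lag.max = 28, plot = FALSE)

# acf() stores lag 0 first, so lag k sits at position k + 1
round(r$acf[c(7, 14, 21) + 1], 2)   # lags at multiples of the period
round(r$acf[c(3, 10, 17) + 1], 2)   # lags in between, for contrast
```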

You may also have yearly seasonality, although it's not obvious from your time series.

Your best bet, given potentially multiple seasonalities, may be a tbats model, which explicitly models multiple types of seasonality. Load the forecast package:

library(forecast)

Your output from str(x) indicates that x does not yet carry information about potentially having multiple seasonalities. Look at ?tbats, and compare the output of str(taylor). Assign the seasonalities:

x.msts <- msts(x,seasonal.periods=c(7,365.25))

Now you can fit a tbats model. (Be patient, this may take a while.)

model <- tbats(x.msts)

Finally, you can forecast and plot:

plot(forecast(model,h=100))

You should not use arima() or auto.arima(), since these can only handle a single type of seasonality: either weekly or yearly. Don't ask me what auto.arima() would do on your data. It may pick one of the seasonalities, or it may disregard them altogether.

EDIT to answer additional questions from a comment:

How can I check whether the data has a yearly seasonality or not? Can I create another series of total number of events per month and use its ACF to decide this?

Calculating a model on monthly data might be a possibility. Then you could, e.g., compare AICs between models with and without seasonality.

However, I'd rather use a holdout sample to assess forecasting models. Hold out the last 100 data points. Fit a model with yearly and weekly seasonality to the rest of the data (like above), then fit one with only weekly seasonality, e.g., using auto.arima() on a ts with frequency=7. Forecast using both models into the holdout period. Check which one has a lower error, using MAE, MSE or whatever is most relevant to your loss function. If there is little difference between errors, go with the simpler model; otherwise, use the one with the lower error.
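A rough sketch of that holdout comparison follows. The data are simulated for illustration, and the tbats() options that switch off the Box-Cox transformation and trend are only there to shorten the run; they are my assumptions, not something your data require.

```r
library(forecast)

# Simulated daily series (illustration only): weekly and yearly patterns
set.seed(1)
n <- 3 * 365
t <- seq_len(n)
x <- 100 + 10 * sin(2 * pi * t / 7) + 20 * sin(2 * pi * t / 365.25) +
  rnorm(n, sd = 5)

h <- 100                            # size of the holdout sample
train <- x[1:(n - h)]
test  <- x[(n - h + 1):n]

# Model 1: weekly and yearly seasonality via tbats (may take a while)
fit_both <- tbats(msts(train, seasonal.periods = c(7, 365.25)),
                  use.box.cox = FALSE, use.trend = FALSE)
fc_both  <- forecast(fit_both, h = h)

# Model 2: weekly seasonality only via auto.arima
fit_week <- auto.arima(ts(train, frequency = 7))
fc_week  <- forecast(fit_week, h = h)

# Mean absolute error on the holdout period
mae <- function(fc) mean(abs(test - as.numeric(fc$mean)))
errors <- c(both = mae(fc_both), weekly = mae(fc_week))
errors
```

Swap mae() for whatever error measure matches your loss function.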

The proof of the pudding is in the eating, and the proof of the time series model is in the forecasting.

To improve matters, don't use a single holdout sample (which may be misleading, given the uptick at the end of your series), but use rolling origin forecasts, also known as "time series cross-validation". (I very much recommend that entire free online forecasting textbook.)
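In the forecast package, rolling origin errors can be computed with tsCV(). The sketch below uses simulated data and two deliberately simple benchmark methods, naive() and snaive(), so that it runs quickly; in practice you would pass your own forecasting function instead.

```r
library(forecast)

# Simulated daily data with a weekly pattern (illustration only)
set.seed(1)
t <- 1:200
y <- ts(50 + 10 * sin(2 * pi * t / 7) + rnorm(200, sd = 3), frequency = 7)

# One-step-ahead errors from every forecast origin
e_naive  <- tsCV(y, naive,  h = 1)   # ignores seasonality
e_snaive <- tsCV(y, snaive, h = 1)   # repeats the value from one week ago

# Mean absolute error; early origins without enough history give NA
cv_mae <- c(naive  = mean(abs(e_naive),  na.rm = TRUE),
            snaive = mean(abs(e_snaive), na.rm = TRUE))
cv_mae
```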

So Seasonal ARIMA models cannot usually handle multiple seasonalities? Is it a property of the model itself or is it just the way the functions in R are written?

Standard ARIMA models handle seasonality by seasonal differencing. For seasonal monthly data, you would not model the raw time series, but the time series of differences between March 2015 and March 2014, between February 2015 and February 2014 and so forth. (To get forecasts on the original scale, you'd of course need to undifference again.)
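Seasonal differencing and undifferencing can be sketched in base R as follows, using the built-in monthly AirPassengers series as an example (my choice of data, purely for illustration):

```r
# AirPassengers is a monthly series that ships with base R
y <- AirPassengers

# Seasonal differences: y_t - y_{t-12}, e.g. March 1950 minus March 1949
d <- diff(y, lag = 12)

# Undifference: the first 12 observations serve as starting values
y_back <- diffinv(d, lag = 12, xi = y[1:12])

# Check that the original series is recovered
all.equal(as.numeric(y_back), as.numeric(y))
```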

There is no immediately obvious way to extend this idea to multiple seasonalities.

Of course, you can do something using ARIMAX, e.g., by including monthly dummies to model the yearly seasonality, then model residuals using weekly seasonal ARIMA. If you want to do this in R, use ts(x,frequency=7), create a matrix of monthly dummies and feed that into the xreg parameter of auto.arima().
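A rough sketch of that idea follows. The data are simulated, and month_dummies() is a hypothetical helper name I made up for this example, not a function from any package.

```r
library(forecast)

# Simulated daily data (illustration only): weekly pattern plus month effects
set.seed(1)
dates <- seq(as.Date("2012-01-01"), by = "day", length.out = 3 * 365)
mon   <- as.integer(format(dates, "%m"))
x <- 100 + 10 * cos(2 * pi * mon / 12) +
  5 * sin(2 * pi * seq_along(dates) / 7) +
  rnorm(length(dates), sd = 2)

# Hypothetical helper: 11 monthly dummies, with January as the baseline
month_dummies <- function(d) {
  m <- factor(format(d, "%m"), levels = sprintf("%02d", 1:12))
  model.matrix(~ m)[, -1]           # drop the intercept column
}

# Weekly seasonal ARIMA errors, yearly pattern through the dummies
fit <- auto.arima(ts(x, frequency = 7), xreg = month_dummies(dates))

# Forecasting needs the dummies for the future dates as well
future <- seq(max(dates) + 1, by = "day", length.out = 28)
fc <- forecast(fit, xreg = month_dummies(future))
```

Fixing the factor levels to all twelve months keeps the dummy columns identical between the training and forecast periods, even when the forecast window covers only one or two months.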

I don't recall any publication that specifically extends ARIMA to multiple seasonalities, although I'm sure somebody has done something along the lines of my previous paragraph.
