Solved – Auto.arima with daily data: how to capture seasonality/periodicity

arima, r, seasonality, time series

I am fitting an ARIMA model on a daily time series.
The data were collected daily from 02-01-2010 to 30-07-2011 and record newspaper sales.
Since there is a weekly pattern in sales (the average number of copies sold per day is usually about the same from Monday to Friday, then increases on Saturday and Sunday), I am trying to capture this "seasonality".
Given the sales data "data", I create the time series as follows:

salests<-ts(data,start=c(2010,1),frequency=365)

and then I use the auto.arima() function to select the best ARIMA model by AIC. The result is always a non-seasonal ARIMA model, but if I try a SARIMA model, for example with the following syntax:

sarima1<-arima(salests, order = c(2,1,2), seasonal = list(order = c(1, 0, 1), period = 7))

I obtain better results.
Is there anything wrong in the ts command or the arima specification? The weekly pattern is very strong, so I would not expect so much difficulty in capturing it.
Any help would be very useful.
Thank you,
Giulia Deppieri

Update:

I have already changed some arguments. More precisely, the procedure selects ARIMA(4,1,3) as the best model when I set D=7, but the AIC and the other goodness-of-fit indexes (and the forecasts as well) do not improve at all. I guess there are some mistakes due to confusion between seasonality and periodicity?

Auto.arima call used and output obtained:

modArima<-auto.arima(salests,D=7,max.P = 5, max.Q = 5)



 ARIMA(2,1,2) with drift         : 1e+20
 ARIMA(0,1,0) with drift         : 5265.543
 ARIMA(1,1,0) with drift         : 5182.772
 ARIMA(0,1,1) with drift         : 1e+20
 ARIMA(2,1,0) with drift         : 5137.279
 ARIMA(2,1,1) with drift         : 1e+20
 ARIMA(3,1,1) with drift         : 1e+20
 ARIMA(2,1,0)                    : 5135.382
 ARIMA(1,1,0)                    : 5180.817
 ARIMA(3,1,0)                    : 5117.714
 ARIMA(3,1,1)                    : 1e+20
 ARIMA(4,1,1)                    : 5045.236
 ARIMA(4,1,1) with drift         : 5040.53
 ARIMA(5,1,1) with drift         : 1e+20
 ARIMA(4,1,0) with drift         : 5112.614
 ARIMA(4,1,2) with drift         : 4953.417
 ARIMA(5,1,3) with drift         : 1e+20
 ARIMA(4,1,2)                    : 4960.516
 ARIMA(3,1,2) with drift         : 1e+20
 ARIMA(5,1,2) with drift         : 1e+20
 ARIMA(4,1,3) with drift         : 4868.669
 ARIMA(5,1,4) with drift         : 1e+20
 ARIMA(4,1,3)                    : 4870.92
 ARIMA(3,1,3) with drift         : 1e+20
 ARIMA(4,1,4) with drift         : 4874.095

 Best model: ARIMA(4,1,3) with drift        

So I assume the arima function should be used as:

# arma is c(p, q, P, Q, period, d, D), so d is element 6 (element 5 is the period)
bestOrder <- c(modArima$arma[1], modArima$arma[6], modArima$arma[2])
sarima1<-arima(salests, order = c(4,1,3))

with no seasonal component parameters and period specifications.
The data and exploratory analysis show that approximately the same weekly pattern holds for each week, with the only exception of August 2010 (when a substantial increase in sales is recorded). Unfortunately I have no expertise in time series modeling at all; in fact, I am trying this approach in order to find an alternative to the other parametric and non-parametric models I have tried to fit to these problematic data.
I also have many numeric explanatory variables, but they have shown low power in explaining the response variable: undoubtedly, the most difficult part to model is the time component. Moreover, constructing dummy variables to represent months and weekdays turned out not to be a robust solution.

Best Answer

If there is weekly seasonality, set the seasonal period to 7.

salests <- ts(data,start=2010,frequency=7) 
modArima <- auto.arima(salests)

Note that the selection of seasonal differencing was not very good in auto.arima() until very recently. If you are using v2.xx of the forecast package, set D=1 in the call to auto.arima() to force seasonal differencing. If you are using v3.xx of the forecast package, the automatic selection of D works much better (using an OCSB (Osborn-Chui-Smith-Birchenhall) test instead of a CH (Canova-Hansen) test).
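
As a minimal sketch of the advice above, the following uses simulated daily sales, since the real `data` is not available here. The names `sim_sales`, `sim_ts` and `weekday`, and the weekend uplift of 200 copies, are made-up stand-ins, and the forecast package is assumed to be installed. It forces one seasonal difference with `D = 1`. Note that `D` is the number of seasonal differences, not the seasonal period; the period of 7 comes from `frequency`.

```r
library(forecast)

set.seed(42)
n_days  <- 7 * 60                          # 60 full weeks of simulated daily data
weekday <- rep(1:7, length.out = n_days)   # 1 = Monday, ..., 7 = Sunday (assumed)

# Hypothetical pattern: flat Monday to Friday, higher on Saturday and Sunday
sim_sales <- 1000 + 200 * (weekday >= 6) + rnorm(n_days, sd = 30)

# frequency = 7 tells ts() and auto.arima() that the seasonal period is one week
sim_ts <- ts(sim_sales, frequency = 7)

# D = 1 forces a single seasonal difference at lag 7
fit <- auto.arima(sim_ts, D = 1)
print(fit)

# Compact specification: c(p, q, P, Q, period, d, D)
fit$arma
```

In the printed `arma` vector, the fifth element should be the period (7) and the last element the forced seasonal differencing order (1).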

Don't try to compare the AIC for models with different levels of differencing. They are not directly comparable. You can only reliably compare the AIC across models that have the same orders of differencing.
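
To illustrate the point about comparability, here is a small sketch using base R's arima() on simulated data (the series `sim_ts` and the candidate orders are arbitrary choices for illustration, not the models from the question). The first two fits share d = 0 and D = 1, so their likelihoods are computed on the same differenced series and their AIC values can be ranked. The third fit uses different differencing and is shown only to mark what should not be ranked against them.

```r
set.seed(1)
weekday <- rep(1:7, length.out = 7 * 60)
sim_ts  <- ts(1000 + 200 * (weekday >= 6) + rnorm(7 * 60, sd = 30),
              frequency = 7)

# Both candidates use d = 0 and D = 1, so their AIC values are comparable
fit_a <- arima(sim_ts, order = c(1, 0, 0),
               seasonal = list(order = c(1, 1, 0), period = 7))
fit_b <- arima(sim_ts, order = c(0, 0, 1),
               seasonal = list(order = c(1, 1, 0), period = 7))
c(fit_a = AIC(fit_a), fit_b = AIC(fit_b))

# This model uses d = 1 and D = 0, so it is fitted to a differently
# transformed series; its AIC should not be ranked against the two above
fit_c <- arima(sim_ts, order = c(1, 1, 0))
fit_c$arma[6:7]   # d and D for this fit
```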

You don't need to re-fit the model after calling auto.arima(). It will return an Arima object, just as if you had called arima() with the selected model order.
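
A short sketch of using the returned object directly, again on simulated data (`sim_ts` is a hypothetical stand-in for the real sales series, and the forecast package is assumed to be installed):

```r
library(forecast)

set.seed(7)
weekday <- rep(1:7, length.out = 7 * 60)
sim_ts  <- ts(1000 + 200 * (weekday >= 6) + rnorm(7 * 60, sd = 30),
              frequency = 7)

fit <- auto.arima(sim_ts, D = 1)

# The returned object inherits from "Arima", so no second call to arima() is needed
class(fit)

# Forecast two weeks ahead straight from the auto.arima() result
fc <- forecast(fit, h = 14)
head(fc$mean, 7)
```

The same object can also be passed to residuals() or plotted via plot(fc) for a quick check of the fit.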