If there is weekly seasonality, set the seasonal period to 7.
library(forecast)
salests <- ts(data, start = 2010, frequency = 7)
modArima <- auto.arima(salests)
Note that the selection of seasonal differencing was not very good in auto.arima() until very recently. If you are using v2.xx of the forecast package, set D=1 in the call to auto.arima() to force seasonal differencing. If you are using v3.xx of the forecast package, the automatic selection of D works much better (using an OCSB test instead of a CH test).
Don't try to compare the AIC for models with different levels of differencing. They are not directly comparable. You can only reliably compare the AIC with models having the same orders of differencing.
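One way to see why: differencing changes the data that enter the likelihood, so the two fits are scored on different samples. A minimal sketch using base R's arima() on a simulated series (the data and the model orders are illustrative assumptions, not taken from the question):

```r
set.seed(1)
# Simulated stationary AR(1) series; illustrative data only
x <- arima.sim(list(ar = 0.5), n = 200)

fit_d0 <- arima(x, order = c(1, 0, 0))  # fitted to the levels
fit_d1 <- arima(x, order = c(1, 1, 0))  # fitted to the first differences

# Number of observations actually used in each likelihood
c(d0 = fit_d0$nobs, d1 = fit_d1$nobs)

# Both AIC values exist, but they score different data, so their
# difference should not be read as evidence for one model over the other
c(d0 = AIC(fit_d0), d1 = AIC(fit_d1))
```

The differenced fit loses one observation per order of differencing and models the changes rather than the levels, which is why only fits with the same differencing orders are on a common scale.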
You don't need to re-fit the model after calling auto.arima(). It will return an Arima object, just as if you had called arima() with the selected model order.
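As a rough sketch of these two points (forcing seasonal differencing, then using the returned fit directly), assuming the forecast package is installed. The simulated `sales` series is illustrative and stands in for real data:

```r
library(forecast)

set.seed(1)
# Simulated daily series with a weekly pattern (illustrative data only)
sales <- ts(100 + 10 * sin(2 * pi * (1:140) / 7) + rnorm(140), frequency = 7)

# D = 1 forces one seasonal difference; the remaining orders are still searched
fit <- auto.arima(sales, D = 1)

# The result is already a fitted model, so it can be forecast without re-fitting
fc <- forecast(fit, h = 14)
arimaorder(fit)
```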
- How do I select the best ARIMA model (by trying all different orders and checking the best MASE/MAPE/MSE? The choice of performance measure could be a discussion of its own.)
Out of sample risk estimates are the gold standard for performance evaluation, and therefore for model selection. Ideally, you cross-validate so that your risk estimates are averaged over more data. FPP explains one cross-validation method for time series. See Tashman for a review of other methods:
Tashman, L. J. (2000). Out-of-sample tests of forecasting accuracy: an analysis and review. International Journal of Forecasting, 16(4), 437–450. doi:10.1016/S0169-2070(00)00065-0
Of course, cross-validation is time-consuming, so people often resort to in-sample criteria such as AIC to select a model, which is how auto.arima selects the best model. This approach is perfectly valid, though it may not pick the model with the best out-of-sample accuracy.
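A minimal sketch of one such out-of-sample scheme, a one-step-ahead rolling-origin evaluation, written with base R only. The helper name `rolling_mae`, the simulated series, the two candidate models, and the minimum training length are all assumptions made for illustration:

```r
set.seed(42)
# Simulated series with a weekly cycle; illustrative data only
n <- 140
y <- ts(10 + 3 * sin(2 * pi * (1:n) / 7) + rnorm(n), frequency = 7)

# Hypothetical helper: mean absolute one-step-ahead error over rolling origins
rolling_mae <- function(y, order, seasonal, min_train = 70) {
  n <- length(y)
  errs <- rep(NA_real_, n - min_train)
  for (i in seq_along(errs)) {
    end <- min_train + i - 1
    train <- ts(y[1:end], frequency = frequency(y))
    # Skip origins where the optimiser fails rather than aborting the loop
    fit <- tryCatch(
      arima(train, order = order, seasonal = list(order = seasonal)),
      error = function(e) NULL
    )
    if (!is.null(fit)) {
      errs[i] <- y[end + 1] - predict(fit, n.ahead = 1)$pred[1]
    }
  }
  mean(abs(errs), na.rm = TRUE)
}

# Compare a non-seasonal candidate with a seasonal one on held-out points
mae_plain    <- rolling_mae(y, order = c(1, 0, 0), seasonal = c(0, 0, 0))
mae_seasonal <- rolling_mae(y, order = c(1, 0, 0), seasonal = c(0, 1, 1))
c(plain = mae_plain, seasonal = mae_seasonal)
```

Each forecast is scored only on an observation the model has not seen, and the model is re-estimated at every origin, which is what makes this slower than comparing AIC values from a single fit.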
- If I generate a new model and forecast for every new day (as in online forecasting), do I need to take the yearly trend into account, and how? (With such a small subset, my guess would be that the trend is negligible.)
I'm not sure what you mean by yearly trend. Assuming you mean yearly seasonality, there's not really any way to take it into account with less than a year's worth of data.
- Would you expect that the model order stays the same throughout the dataset, i.e. when taking another subset will that give me the same model?
I would expect that barring some change to how the data are generated, the most correct underlying model will be the same throughout the dataset. However, that's not the same as saying that the model selected by any procedure (such as the procedure used by auto.arima) will be the same if that procedure is applied to different subsets of the data. This is because the variability due to sampling will result in variability in the results of the model selection procedure.
- What is a good way, within this method to cope with holidays? Or is ARIMAX with external holiday dummies needed for this?
External holiday dummies are the best approach.
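A minimal sketch of the dummy-variable approach with base R's arima(), assuming a single 0/1 holiday indicator. The holiday positions, the simulated series, and the AR(1) order are made up for illustration:

```r
set.seed(7)
n <- 70
# Hypothetical holiday positions; in practice these come from a calendar
holiday <- as.numeric(seq_len(n) %in% c(15, 40, 62))

# Simulated series: AR(1) noise around a level, with a lift on holidays
noise <- as.numeric(arima.sim(list(ar = 0.5), n = n))
y <- ts(20 + 5 * holiday + noise, frequency = 7)

# The dummy enters as an external regressor alongside the ARIMA errors
fit <- arima(y, order = c(1, 0, 0), xreg = cbind(holiday = holiday))
coef(fit)

# Forecasting requires the dummy's future values over the horizon
future_holiday <- cbind(holiday = c(0, 0, 1, 0, 0, 0, 0))
fc <- predict(fit, n.ahead = 7, newxreg = future_holiday)
fc$pred
```

The same pattern extends to several dummies (one column per holiday type), as long as the future regressor matrix has the same columns as the one used for fitting.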
- Do I need to use the Fourier series approach to try models with seasonality=672, as discussed in Long seasonal periods?
You need to do something, because as mentioned in that article, the arima function in R does not support seasonal periods greater than 350. I've had reasonable success with the Fourier approach. Other options include forecasting after seasonal decomposition (also covered in FPP), and exponential smoothing models such as bats and tbats.
- If so would this be like
fit <- Arima(timeseries, order=c(0,1,4), xreg=fourier(1:n,4,672))
(where the function fourier is as defined in Hyndman's blog post)
That looks correct. You should experiment with different numbers of terms. Note that there is now a fourier function in the forecast package with a slightly different specification that I assume supersedes the one on Hyndman's blog. See the help file for syntax.
- Are initial P and Q components included with the Fourier series?
I'm not sure what you're asking here. P and Q usually refer to the orders of the seasonal AR and MA components. With the Fourier approach, there are no seasonal ARIMA components; instead, Fourier terms tied to the seasonal period enter as covariates. It's no longer seasonal ARIMA, it's ARIMAX where the covariates approximate the seasonal pattern.
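To make the ARIMAX structure concrete, here is a sketch that builds the sine/cosine covariates by hand and passes them to base R's arima(). The helper `fourier_terms` is hypothetical (it is neither the blog's function nor the one in the forecast package), and the simulated series, K = 2, and the AR(1) order are assumptions chosen for illustration:

```r
# Hypothetical helper: K sine/cosine pairs for a given seasonal period
fourier_terms <- function(t, K, period) {
  X <- do.call(cbind, lapply(seq_len(K), function(k) {
    cbind(sin(2 * pi * k * t / period), cos(2 * pi * k * t / period))
  }))
  colnames(X) <- paste0(c("S", "C"), rep(seq_len(K), each = 2))
  X
}

set.seed(3)
period <- 672            # e.g. 15-minute data with a weekly cycle
n <- 3 * period          # three full cycles of simulated data
t <- seq_len(n)
y <- 50 + 10 * sin(2 * pi * t / period) +
  as.numeric(arima.sim(list(ar = 0.6), n = n))

K <- 2
# Non-seasonal ARIMA errors; the seasonality lives entirely in xreg
fit <- arima(y, order = c(1, 0, 0), xreg = fourier_terms(t, K, period))
coef(fit)

# Future covariates continue the time index past the end of the sample
h <- 96
fc <- predict(fit, n.ahead = h, newxreg = fourier_terms(n + seq_len(h), K, period))
```

Note that the model call has no seasonal argument at all: the only seasonal information is in the covariate matrix, so there are no P and Q to choose, only the number of Fourier pairs K.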
Best Answer
You might review How to model timeseries with unequally-spaced seasonality interval, noting that the data analyzed there covered only 365 days, so no holiday, weekly, or monthly effects could be identified. The problem you are facing is not amenable to either of your two approaches. What you need is a hybrid model that incorporates known events (including possible price/promotion effects) and memory as needed (ARIMA structure), while also dealing with anomalies and possible level shifts and trends. This data needs to be handled holistically; piecemeal approaches like the ones you were trying will usually be insufficient.
The ARIMA model you tried uses only memory, while the fixed-days approach is purely deterministic. You need to combine both kinds of components optimally, while also incorporating any user-suggested "causal/predictor/exogenous/helping/supporting" series.