Solved – How to choose automatically between Auto.ARIMA, ETS and STL in R

arima, exponential-smoothing, forecasting, r, time series

I'm working on a sales forecasting package that should be easy for the end user to use. Given a time series of historical sales data, I would like to automatically select one of three forecasting methods: auto.arima, ETS and STLF.
The idea is to split the historical data into an 80% training set and a 20% test (holdout) set, then run auto.arima, ETS and STLF and choose the one with the best (lowest) MAPE on the test set.
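
A minimal sketch of this holdout comparison, assuming the forecast package is installed. The built-in `AirPassengers` series stands in for the sales data, and names such as `n_train`, `forecasts` and `best` are only illustrative.

```r
library(forecast)

y <- AirPassengers                       # stand-in for the sales series (assumption)
n <- length(y)
n_train <- floor(0.8 * n)

train <- subset(y, end = n_train)        # first 80% of the observations
test  <- subset(y, start = n_train + 1)  # last 20% (holdout)
h <- length(test)

# Fit each candidate on the training set and forecast the holdout period.
forecasts <- list(
  arima = forecast(auto.arima(train), h = h),
  ets   = forecast(ets(train), h = h),
  stlf  = stlf(train, h = h)
)

# accuracy() returns a matrix; the "Test set" row holds the holdout errors.
mape <- sapply(forecasts, function(fc) accuracy(fc, test)["Test set", "MAPE"])
best <- names(which.min(mape))           # candidate with the lowest holdout MAPE
```

Note that this compares the candidates from a single forecast origin only.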

Now comes the part that is not entirely clear to me. Once I have figured out that, e.g., ETS gives me the best result, should I:

  1. Retrain ETS on the entire set of historical data and generate the
    forecast using this new model? My reservation here is that when I
    run ETS again it may select a different model form as well as
    different fit parameters, which would render the MAPE I got on the
    test set irrelevant.
  2. Just generate the forecast using the model that was trained on the
    80% train set? My problem with this approach is that we are ignoring
    the last 20% of data which is probably the most important
    information for the forecast.
  3. Keep the model form and fit parameters that we got after training
    on the 80% train set, but apply that model to the entire set of
    data for forecasting. This seems like a reasonable approach, but I
    cannot figure out how to do it for ETS and STL (for ARIMA we can do
    it by supplying the original fit as the model argument of the
    Arima function in the forecast package).
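
For reference, option 3 might be sketched as follows. This assumes the `model` argument that the forecast package documents for `Arima()`, `ets()` and `stlm()`, which applies a previously estimated model to new data without re-estimating its parameters. `AirPassengers` and the horizon `h = 12` are placeholders, not part of the original problem.

```r
library(forecast)

y <- AirPassengers                        # stand-in for the sales series (assumption)
train <- subset(y, end = floor(0.8 * length(y)))

# Estimate each model on the 80% training set only.
fit_arima <- auto.arima(train)
fit_ets   <- ets(train)
fit_stlm  <- stlm(train)                  # STL + model on the seasonally adjusted series

# Re-apply the estimated models to the full series. Passing a previous fit
# as `model` keeps its structure and coefficients instead of re-estimating.
full_arima <- Arima(y, model = fit_arima)
# By default ets() still re-estimates the initial states; set
# use.initial.values = TRUE to keep those fixed as well.
full_ets   <- ets(y, model = fit_ets)
# The STL decomposition itself is recomputed on the full data; only the
# model for the seasonally adjusted series is carried over.
full_stlm  <- stlm(y, model = fit_stlm)

h <- 12                                   # forecast horizon (assumption)
fc <- list(
  arima = forecast(full_arima, h = h),
  ets   = forecast(full_ets, h = h),
  stlm  = forecast(full_stlm, h = h)
)
```

With this approach the holdout MAPE still describes the model that is actually used, while the forecast starts from the most recent observations.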

Could you please let me know what is the right way to approach this problem?

Best Answer

By training on 80% and holding out 20%, the tail is wagging the dog, and, even more importantly, you are using a sample of one forecast origin rather than multiple origins to determine "best". A better procedure is to first decide how many periods ahead you typically want to forecast (it could be 1 period or k periods; say 3 periods for this example). Then take your N historical observations and build models and assess forecasts from, say, 6 origins: N-3, N-6, ..., N-18. Construct the 6 models, project each one three periods ahead, and compute the MAPEs across the 6 origins.

In terms of model selection, make sure that you also consider hybrid models that combine ARIMA (memory) with deterministic structure such as level shifts, time trends, seasonal pulses and pulses. These models are called Robust Transfer Functions or Dynamic Regression with Intervention Detection.
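
The multiple-origin evaluation described above can be sketched roughly as follows, assuming the forecast package. `rolling_mape` is a hypothetical helper written for this illustration, `AirPassengers` stands in for the sales data, and only the three candidates from the question are compared (the hybrid intervention models are not covered here).

```r
library(forecast)

# Hypothetical helper: average MAPE of h-step forecasts over several origins.
# `fit_fun` takes a ts and returns a fitted model that forecast() accepts.
rolling_mape <- function(y, fit_fun, h = 3, n_origins = 6) {
  n <- length(y)
  mapes <- numeric(n_origins)
  for (i in seq_len(n_origins)) {
    origin <- n - i * h                   # N-3, N-6, ..., N-18 for h = 3
    train  <- subset(y, end = origin)
    test   <- subset(y, start = origin + 1, end = origin + h)
    fc     <- forecast(fit_fun(train), h = h)
    actual <- as.numeric(test)
    mapes[i] <- mean(abs((actual - as.numeric(fc$mean)) / actual)) * 100
  }
  mean(mapes)
}

candidates <- list(arima = auto.arima, ets = ets, stl = stlm)

y <- AirPassengers                        # stand-in for the sales series (assumption)
scores <- sapply(candidates, function(f) rolling_mape(y, f))
best <- names(which.min(scores))          # lowest average MAPE over the 6 origins
```

Each candidate is re-estimated at every origin, so the score reflects the whole selection procedure rather than one particular set of fitted parameters.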