R – Is STL a Good Technique for Forecasting Instead of ARIMA?

arima, forecasting, multiple-seasonalities, r, time series

I have a long time series (hourly data covering 6 years). The data shows hourly, weekly, monthly, and yearly seasonal patterns. For this data, should I try STL (Seasonal and Trend decomposition using Loess) or ARIMA?

I am using R for the analysis and have used ARIMA in the past, but I am not sure which technique to use in this case, given that the data has multiple levels of seasonality.

My second question is more general. Do people who work in time series forecasting actually use STL for forecasting? My impression is that STL is a good technique for understanding the data, and ARIMA is a better technique for forecasting. Is this understanding correct?

Best Answer

The book Forecasting: Principles and Practice by Rob J. Hyndman and George Athanasopoulos answers your question:

STL has several advantages over the classical decomposition method and X-12-ARIMA: Unlike X-12-ARIMA, STL will handle any type of seasonality, not only monthly and quarterly data. The seasonal component is allowed to change over time, and the rate of change can be controlled by the user. The smoothness of the trend-cycle can also be controlled by the user. It can be robust to outliers (i.e., the user can specify a robust decomposition). So occasional unusual observations will not affect the estimates of the trend-cycle and seasonal components. They will, however, affect the remainder component. On the other hand, STL has some disadvantages. In particular, it does not automatically handle trading day or calendar variation, and it only provides facilities for additive decompositions.
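The user-controlled settings mentioned in the quote map onto arguments of stl() in base R. A minimal sketch, assuming the built-in nottem dataset as a stand-in for real data; the window values are arbitrary illustrative choices, not recommendations:

```r
# Monthly air temperatures at Nottingham Castle, 1920-1939 (ships with base R).
y <- nottem

# s.window: span of the loess window for the seasonal component (odd);
#           smaller values let the seasonal pattern change faster over time.
# t.window: span of the loess window for the trend-cycle (odd);
#           larger values give a smoother trend.
# robust:   TRUE down-weights unusual observations in the loess fits.
fit <- stl(y, s.window = 13, t.window = 41, robust = TRUE)

components <- fit$time.series   # columns: seasonal, trend, remainder
print(head(components))

# The decomposition is additive: the three components sum back to the data.
recon <- rowSums(components)
print(max(abs(recon - as.numeric(y))))
```

With robust = TRUE, unusual observations should show up mostly in the remainder column rather than in the trend or seasonal estimates.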

So STL can deal with phenomena such as multiple seasonalities, high-frequency seasonalities (e.g. a period of 365 for daily data) and cycles. However, if you have daily data you can also use a model designed for multiple seasonalities, e.g. TBATS, instead of STL. If you have many seasonalities and millions or billions of observations, you can consider data-hungry complex models such as recurrent neural nets. STL might be a useful approach for modeling business cycles. In a business cycle, not every cycle has exactly the same length; cycles are an irregularly recurring phenomenon. Sometimes a recession lasts 2 years and sometimes it lasts 5 or 6. STL is better able to capture this kind of irregularity than ARIMA. Also, if you cannot make your data stationary, STL will be more useful than ARIMA: ARIMA requires the data to be stationary, or at least stationary after differencing.
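One caveat on multiple seasonalities: base R's stl() extracts one seasonal period per call, so several periods have to be handled by applying it more than once. A rough sketch on synthetic hourly data with made-up daily and weekly patterns (all numbers here are assumptions for illustration). As far as I know, the forecast package's mstl() automates an iterated version of this idea.

```r
set.seed(1)

# Eight weeks of synthetic hourly data with a daily and a weekly cycle.
n      <- 24 * 7 * 8
t      <- seq_len(n)
daily  <- 10 * sin(2 * pi * t / 24)     # period 24 hours
weekly <-  5 * sin(2 * pi * t / 168)    # period 168 hours
y      <- 100 + 0.01 * t + daily + weekly + rnorm(n)

# Step 1: extract the daily pattern.
fit_day   <- stl(ts(y, frequency = 24), s.window = "periodic")
daily_hat <- as.numeric(fit_day$time.series[, "seasonal"])

# Step 2: remove it, then extract the weekly pattern from what is left.
rest       <- ts(y - daily_hat, frequency = 168)
fit_week   <- stl(rest, s.window = "periodic")
weekly_hat <- as.numeric(fit_week$time.series[, "seasonal"])

# How well the two estimated patterns track the ones used to build the data.
print(c(daily = cor(daily_hat, daily), weekly = cor(weekly_hat, weekly)))
```

For the six years of hourly data in the question, a yearly period (about 8766 hours) could be added as a third pass in the same way.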

You can also hold out part of your data as a test set and check whether STL works better than ARIMA on your particular dataset.

STL is usually used for understanding and describing the data, but you can combine it with simple forecasting methods as described in section 6.6 of the book mentioned above. These simple forecasting methods are not always "worse" than complex methods such as autoregressive models, neural nets or Bayesian models such as the Kalman filter. You can compare them, for instance, by using scale-dependent error measures (e.g. MAE and RMSE) on the training and the test set of your data.
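A minimal sketch of such a comparison in base R, assuming the built-in nottem dataset and a two-year hold-out. The STL forecast here is the simple combination of a naive forecast for the seasonally adjusted series and a seasonal naive forecast for the seasonal component; the ARIMA orders are an arbitrary illustrative choice, not a tuned model.

```r
y     <- nottem
train <- window(y, end = c(1937, 12))
test  <- window(y, start = c(1938, 1))
h     <- length(test)                        # 24 months held out

# STL on the training data only.
fit      <- stl(train, s.window = "periodic")
seasonal <- fit$time.series[, "seasonal"]
adjusted <- train - seasonal                 # seasonally adjusted series

# Naive forecast of the adjusted series: repeat its last value.
adj_fc  <- rep(as.numeric(tail(adjusted, 1)), h)
# Seasonal naive forecast of the seasonal part: repeat the last full year.
seas_fc <- rep(as.numeric(tail(seasonal, 12)), length.out = h)
stl_fc  <- adj_fc + seas_fc

# Seasonal ARIMA for comparison (orders chosen for illustration only).
arima_fit <- arima(train, order = c(1, 0, 0),
                   seasonal = list(order = c(0, 1, 1), period = 12))
arima_fc  <- as.numeric(predict(arima_fit, n.ahead = h)$pred)

mae  <- function(actual, fc) mean(abs(actual - fc))
rmse <- function(actual, fc) sqrt(mean((actual - fc)^2))

actual  <- as.numeric(test)
results <- rbind(
  stl_naive = c(MAE = mae(actual, stl_fc),   RMSE = rmse(actual, stl_fc)),
  arima     = c(MAE = mae(actual, arima_fc), RMSE = rmse(actual, arima_fc))
)
print(round(results, 2))
```

Whichever row has the lower test-set errors is the better choice for this particular series and hold-out; a different split or dataset may reverse the ranking.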

Related Solutions

Solved – Time Series Forecasting with Daily Data: ARIMA with regressor

You should evaluate models and forecasts from different origins and across different horizons, not on a single number, in order to gauge an approach.
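A small sketch of what evaluating from several origins and horizons can look like, assuming the built-in monthly nottem series and a seasonal naive forecast as a placeholder for whatever model is being judged (the origins and horizons are arbitrary choices):

```r
y        <- as.numeric(nottem)
horizons <- 1:6
# Forecast origins: every 6th month from observation 180 onwards.
origins  <- seq(from = 180, to = length(y) - max(horizons), by = 6)

# One row per forecast origin, one column per horizon.
abs_err <- matrix(NA_real_, nrow = length(origins), ncol = length(horizons),
                  dimnames = list(origin = origins, horizon = horizons))

for (i in seq_along(origins)) {
  n_train <- origins[i]
  train   <- y[seq_len(n_train)]
  # Seasonal naive forecast: the value from the same month one year earlier.
  fc      <- train[n_train - 12 + horizons]
  actual  <- y[n_train + horizons]
  abs_err[i, ] <- abs(actual - fc)
}

# Mean absolute error for each horizon, averaged over all origins.
mae_by_horizon <- colMeans(abs_err)
print(round(mae_by_horizon, 2))
```

Looking at the whole table, rather than one pooled number, shows whether an approach is consistently good or just lucky at one origin.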

I assume that your data is from the US. I prefer 3+ years of daily data, as you can have two holidays landing on a weekend and get no weekday read. It looks like your Thanksgiving impact is a day off in 2012, or there was a recording error of some kind that caused the model to miss the Thanksgiving Day effect.

Januarys are typically low in the dataset if you look at them as a % of the year. Weekends are high. The dummies reflect this behavior: MONTH_EFF01, FIXED_EFF_N10507, FIXED_EFF_N10607.

I have found that using an AR component with daily data assumes that the day-of-week pattern of the last two weeks is how the pattern is in general, which is a big assumption. We started with 11 monthly dummies and 6 daily dummies. Some dropped out of the model. B**1 means that there is a lag impact the day after a holiday. There were 6 special days of the month (days 2, 3, 5, 21, 29, 30; 21 might be spurious?), 3 time trends, 2 seasonal pulses (where a day of the week started deviating from the typical: a 0 before this date and a 1 every 7th day after) and 2 outliers (note the Thanksgiving!). This took just under 7 minutes to run. Download all results here: www.autobox.com/se/dd/daily.zip

It includes a quick and dirty XLS sheet to check to see if the model makes sense. Of course, the XLS % are in fact bad as they are crude benchmarks.

Try estimating this model:

Y(T) =  .53169E+06                                                                                        
       +[X1(T)][(+  .13482E+06B** 1)]                                       M_HALLOWEEN
       +[X2(T)][(+  .17378E+06B**-3)]                                       M_JULY4TH
       +[X3(T)][(-  .11556E+06)]                                            M_MEMORIALDAY
       +[X4(T)][(-  .16706E+06B**-4+  .13960E+06B**-3-  .15636E+06B**-2                                                 
       -  .19886E+06B**-1)]                                                 M_NEWYEARS
       +[X5(T)][(+  .17023E+06B**-2-  .26854E+06B**-1-  .14257E+06B** 1)]   M_THANKSGIVI
       +[X6(T)][(-  71726.    )]                                            MONTH_EFF01
       +[X7(T)][(+  55617.    )]                                            MONTH_EFF02
       +[X8(T)][(+  27827.    )]                                            MONTH_EFF03
       +[X9(T)][(-  37945.    )]                                            MONTH_EFF09
       +[X10(T)[(-  23652.    )]                                            MONTH_EFF10
       +[X11(T)[(-  33488.    )]                                            MONTH_EFF11
       +[X12(T)[(+  39389.    )]                                            FIXED_EFF_N10107
       +[X13(T)[(+  63399.    )]                                            FIXED_EFF_N10207
       +[X14(T)[(+  .13727E+06)]                                            FIXED_EFF_N10307
       +[X15(T)[(+  .25144E+06)]                                            FIXED_EFF_N10407
       +[X16(T)[(+  .32004E+06)]                                            FIXED_EFF_N10507
       +[X17(T)[(+  .29156E+06)]                                            FIXED_EFF_N10607
       +[X18(T)[(+  74960.    )]                                            FIXED_DAY02
       +[X19(T)[(+  39299.    )]                                            FIXED_DAY03
       +[X20(T)[(+  27660.    )]                                            FIXED_DAY05
       +[X21(T)[(-  33451.    )]                                            FIXED_DAY21
       +[X22(T)[(+  43602.    )]                                            FIXED_DAY29
       +[X23(T)[(+  68016.    )]                                            FIXED_DAY30
       +[X24(T)[(+  226.98    )]                                            :TIME TREND        1                   1/  1   1/ 3/2011   I~T00001__010311stack
       +[X25(T)[(-  133.25    )]                                            :TIME TREND      423                  61/  3   2/29/2012   I~T00423__010311stack
       +[X26(T)[(+  164.56    )]                                            :TIME TREND      631                  91/  1   9/24/2012   I~T00631__010311stack
       +[X27(T)[(-  .42528E+06)]                                            :SEASONAL PULSE  733                 105/  5   1/ 4/2013   I~S00733__010311stack
       +[X28(T)[(-  .33108E+06)]                                            :SEASONAL PULSE  370                  53/  6   1/ 7/2012   I~S00370__010311stack
       +[X29(T)[(-  .82083E+06)]                                            :PULSE           326                  47/  4  11/24/2011   I~P00326__010311stack
       +[X30(T)[(+  .17502E+06)]                                            :PULSE           394                  57/  2   1/31/2012   I~P00394__010311stack
      +                    +   [A(T)]
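The model above is essentially a regression on a constant, holiday, month, and day-of-week dummies, time trends and a noise term. A minimal sketch of that structure in base R, on synthetic data (not the data from the question); the holiday dates, effect sizes, and variable names are all made up for illustration, and the automatic detection of pulses and trend breaks is not reproduced here.

```r
set.seed(42)

# Two years of synthetic daily data, starting on a Monday.
dates <- seq(as.Date("2011-01-03"), by = "day", length.out = 730)
lt    <- as.POSIXlt(dates)

# Day-of-week and month factors; lm() expands them into 6 and 11 dummies.
dow   <- factor(lt$wday, levels = c(1:6, 0),
                labels = c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"))
month <- factor(lt$mon + 1, levels = 1:12)
trend <- seq_along(dates)

# Hypothetical fixed-date holiday flag and its one-day lag (the B**1 idea).
holiday      <- as.numeric(format(dates, "%m-%d") %in% c("07-04", "12-25"))
holiday_lag1 <- c(0, head(holiday, -1))

# Made-up data-generating process: high weekends, low Januarys, holiday dips.
y <- 500 + 0.2 * trend +
  80 * (dow %in% c("Sat", "Sun")) -
  60 * (month == "1") -
  150 * holiday + 40 * holiday_lag1 +
  rnorm(length(dates), sd = 20)

# Monday and January are the baselines, so the low January shows up as
# positive coefficients on the other months.
fit <- lm(y ~ dow + month + holiday + holiday_lag1 + trend)
print(round(coef(fit)[c("dowSat", "dowSun", "holiday", "holiday_lag1")], 1))
```

If the residuals of such a regression are still autocorrelated, the same dummy matrix can be passed to arima() through its xreg argument instead of using lm().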

Solved – Should I use a seasonal arima or stl decomposition and model residuals only

Outliers

Outliers can usually be detected by plotting a box plot. "In order to be an outlier, the data value must be larger than Q3 by at least 1.5 times the interquartile range (IQR), or smaller than Q1 by at least 1.5 times the IQR." For a more detailed way of detecting outliers, please refer to: https://stackoverflow.com/questions/24750819/outlier-detection-of-time-series-data-in-r
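The 1.5 x IQR rule quoted above can be written out directly in base R. A small sketch on a made-up vector:

```r
# Toy data with one unusually high and one unusually low value.
x <- c(10, 12, 11, 13, 12, 11, 10, 45, 12, 11, 13, -20, 12)

q     <- quantile(x, c(0.25, 0.75), names = FALSE)   # Q1 and Q3
iqr   <- q[2] - q[1]
lower <- q[1] - 1.5 * iqr
upper <- q[2] + 1.5 * iqr

is_outlier <- x < lower | x > upper
print(which(is_outlier))   # positions 8 and 12 (the values 45 and -20)

# boxplot.stats() applies the same idea and returns the flagged values.
print(boxplot.stats(x)$out)
```

For a seasonal time series it usually makes more sense to apply this rule to the remainder of a decomposition than to the raw values, since a normal seasonal peak would otherwise be flagged.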

Anomalies

To detect anomalies, see this RPubs post, which makes it look quite simple: https://www.rpubs.com/vmez/409672

STL vs seasonal adjustment of arima

From what I know, which is not a lot, the differencing (d) term of SARIMA simply computes the difference between consecutive observations, which accounts for the trend. The D component, or seasonal differencing, is the difference between an observation and the previous observation from the same season (for monthly data it is y_t - y_{t-12}). These differencing techniques are relatively simple compared to the computations behind stl(). This thread will answer your question better: "Is STL a good technique for forecasting, instead of ARIMA?" To sum it up: "STL can deal with phenomena such as multiple seasonalities, high-frequency seasonalities better than arima", so it basically depends on your data.
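The two kinds of differencing can be reproduced by hand with diff() in base R. A small sketch, assuming the built-in monthly nottem series:

```r
y <- nottem                       # monthly series, frequency 12

d1   <- diff(y)                   # first difference:    y_t - y_{t-1}
D12  <- diff(y, lag = 12)         # seasonal difference: y_t - y_{t-12}
both <- diff(diff(y, lag = 12))   # seasonal difference, then first difference

# Each differencing step drops as many observations as its lag.
print(c(original = length(y), d1 = length(d1),
        D12 = length(D12), both = length(both)))
```

In stats::arima() the same operations are applied internally when d and D are set, e.g. order = c(p, 1, q) with seasonal = list(order = c(P, 1, Q), period = 12).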
