Solved – Daily forecasting using ARIMA in R

arima, forecasting, r, time series

I am new to time series modeling in R. I have only one year and three months of sales data. I am trying to forecast sales at the daily level, or at most at the weekly level. These are the steps I intend to follow:

1. Convert the data into a time series object using ts(data$qty, frequency = ??). I am very confused about the frequency here. I can see some seasonality in the data: sales pick up in May, June, and July, and then again in the festival seasons. I guess I cannot use 365 because I have only one year of data. Please suggest what the frequency should be.
2. Decompose the time series. Subtract the seasonality and trend from the actual series.
3. Fit ARIMA to get a prediction.
4. Add the seasonality and trend back to produce the final forecast.

Please let me know whether this is a correct approach, or whether there is a better way to handle it.

Best Answer

No need to subtract any seasonality.

I would recommend using a regression model with no ARIMA component. Bring in 6 day-of-the-week dummy variables, 11 month-of-the-year dummy variables, and holiday variables, while searching for outliers, level shifts, and trends. Remove variables that are not necessary ("step-down") and bring in dummies for the outliers, level shifts, and trends that the search finds ("step-up").
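As a rough illustration of the regression described above, here is a minimal base R sketch. The data are simulated, and every name in it (sales, dow, mon, holiday, trend) is an assumption made for the example, not something from the original post. Note that step() drops whole factors rather than individual dummies, and the residual check is only a crude stand-in for a proper outlier and level-shift search.

```r
# Simulated daily sales for three years (illustrative data only).
set.seed(1)
dates <- seq(as.Date("2011-01-03"), by = "day", length.out = 3 * 365)
lt    <- as.POSIXlt(dates)

weekend <- lt$wday %in% c(0, 6)          # 0 = Sunday, 6 = Saturday
dow     <- factor(lt$wday)               # 7 levels -> 6 dummies
mon     <- factor(lt$mon + 1)            # 12 levels -> 11 dummies
trend   <- seq_along(dates)

# Hypothetical holiday flag: 25 December.
holiday <- as.integer(format(dates, "%m-%d") == "12-25")

sales <- 500 + 0.05 * trend + 40 * weekend - 30 * (lt$mon == 0) -
  200 * holiday + rnorm(length(dates), sd = 10)

# Regression with day-of-week, month, holiday, and trend terms; no ARIMA part.
fit <- lm(sales ~ dow + mon + holiday + trend)

# Step-down: remove terms that do not improve AIC (whole terms, not single dummies).
fit_small <- step(fit, direction = "backward", trace = 0)

# Crude outlier screen: dates with very large standardized residuals
# are candidates for their own pulse dummies (the "step-up" part).
res          <- rstandard(fit_small)
outlier_days <- dates[abs(res) > 4]
```

In practice the flagged dates would be added as 0/1 dummies and the model refitted, repeating until no large residuals remain.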

We recommend using 3 years of data so that holidays fall on both weekends and weekdays, which gives you an overall read on the lead and lag relationships. You can also search for day-of-the-month dummy variables, but this is typically only necessary for cash-demand type problems. You may also need to specify a week-of-the-month dummy in special cases.

You can build a pivot table in Excel to compare the coefficients in the model to the % of the total for the day of the week and month of the year to get a "poor man's model" to confirm that the coefficients make sense.
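The same sanity check can be sketched in R instead of Excel. The example below uses simulated data and illustrative names (sales, dow, mon); it computes each weekday's and month's share of total sales so they can be eyeballed against the signs and ordering of the fitted dummy coefficients.

```r
# Simulated one year of daily sales with a weekend lift (illustrative only).
set.seed(3)
dates <- seq(as.Date("2012-01-01"), by = "day", length.out = 366)
lt    <- as.POSIXlt(dates)
dow   <- factor(lt$wday, levels = 0:6,
                labels = c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"))
mon   <- factor(lt$mon + 1, levels = 1:12)
sales <- 100 + 30 * (dow %in% c("Sat", "Sun")) + rnorm(366, sd = 5)

# "Pivot table": share of total sales by weekday and by month.
share_dow <- tapply(sales, dow, sum) / sum(sales)
share_mon <- tapply(sales, mon, sum) / sum(sales)

# Regression coefficients to compare against the shares.
fit <- lm(sales ~ dow + mon)

print(round(100 * share_dow, 1))
print(round(coef(fit)[grep("^dow", names(coef(fit)))], 1))
```

Days with a larger share of the total should generally have larger coefficients; a mismatch is a prompt to look more closely, not proof of an error.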

Related Solutions

Solved – Time Series Forecasting with Daily Data: ARIMA with regressor

To gauge an approach, you should evaluate models and forecasts from different origins across different horizons, not rely on a single number.
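One way to read this advice is as a rolling-origin evaluation. The sketch below is an assumption about how that might look in base R: the series is simulated, and snaive_forecast is a hypothetical stand-in forecaster (repeat the last week) that you would replace with the model under test.

```r
# Simulated daily series with a weekly pattern (illustrative only).
set.seed(42)
n <- 200
y <- 100 + 10 * sin(2 * pi * seq_len(n) / 7) + rnorm(n, sd = 2)

# Hypothetical forecaster: repeat the last observed week h steps ahead.
snaive_forecast <- function(train, h) {
  rep(tail(train, 7), length.out = h)
}

origins  <- seq(100, 180, by = 10)   # last training index for each origin
horizons <- 1:14                     # forecast 1 to 14 days ahead

# Absolute errors: rows are horizons, columns are origins.
abs_err <- sapply(origins, function(o) {
  fc <- snaive_forecast(y[1:o], max(horizons))
  abs(y[o + horizons] - fc)
})

# Mean absolute error per horizon, averaged over all origins.
mae_by_horizon <- rowMeans(abs_err)
```

Looking at the whole error matrix, rather than one pooled number, shows whether a method degrades at longer horizons or fails only from particular origins.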

I assume that your data is from the US. I prefer 3+ years of daily data, as you can have two holidays landing on a weekend and get no weekday read. It looks like your Thanksgiving impact is a day off in 2012, or there was a recording error of some kind that caused the model to miss the Thanksgiving day effect.

Januarys are typically low in the dataset if you look at them as a % of the year. Weekends are high. The dummies reflect this behavior: MONTH_EFF01, FIXED_EFF_N10507, FIXED_EFF_N10607.

I have found that using an AR component with daily data assumes that the day-of-the-week pattern of the last two weeks is how the pattern is in general, which is a big assumption. We started with 11 monthly dummies and 6 daily dummies. Some dropped out of the model. B**1 means that there is a lag impact the day after a holiday. There were 6 special days of the month (days 2, 3, 5, 21, 29, 30; 21 might be spurious?), 3 time trends, 2 seasonal pulses (where a day of the week started deviating from the typical, a 0 before this date and a 1 every 7th day after), and 2 outliers (note the Thanksgiving!). This took just under 7 minutes to run. Download all results here www.autobox.com/se/dd/daily.zip

It includes a quick and dirty XLS sheet to check to see if the model makes sense. Of course, the XLS % are in fact bad as they are crude benchmarks.
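To make the terms described above concrete, here is a small base R sketch of how such regressors could be built by hand. The observation numbers 326, 423, and 733 and the start date 1/3/2011 are taken from the model output below; the series length, the variable names, and the exact form of the time trend (0 before the break, then counting up) are assumptions for illustration and may not match how the software defines them.

```r
n     <- 760                                # illustrative series length
idx   <- seq_len(n)
dates <- as.Date("2011-01-03") + (idx - 1)  # observation 1 = 1/3/2011

# Pulse: a one-time outlier at observation 326 (11/24/2011).
pulse <- as.integer(idx == 326)

# Seasonal pulse: 0 before observation 733, then 1 every 7th day after.
seasonal_pulse <- as.integer(idx >= 733 & (idx - 733) %% 7 == 0)

# Time trend starting at observation 423 (assumed form: 0, then 1, 2, 3, ...).
time_trend <- pmax(0, idx - 423 + 1)

# Holiday dummy and its one-day lag (the B**1 term: effect the day after).
halloween      <- as.integer(format(dates, "%m-%d") == "10-31")
halloween_lag1 <- c(0L, halloween[-n])
```

These columns could then be passed to lm() alongside the day-of-week and month dummies.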

Try estimating this model:

Y(T) =  .53169E+06                                                                                        
       +[X1(T)][(+  .13482E+06B** 1)]                                       M_HALLOWEEN
       +[X2(T)][(+  .17378E+06B**-3)]                                       M_JULY4TH
       +[X3(T)][(-  .11556E+06)]                                            M_MEMORIALDAY
       +[X4(T)][(-  .16706E+06B**-4+  .13960E+06B**-3-  .15636E+06B**-2                                                 
       -  .19886E+06B**-1)]                                                 M_NEWYEARS
       +[X5(T)][(+  .17023E+06B**-2-  .26854E+06B**-1-  .14257E+06B** 1)]   M_THANKSGIVI
       +[X6(T)][(-  71726.    )]                                            MONTH_EFF01
       +[X7(T)][(+  55617.    )]                                            MONTH_EFF02
       +[X8(T)][(+  27827.    )]                                            MONTH_EFF03
       +[X9(T)][(-  37945.    )]                                            MONTH_EFF09
       +[X10(T)[(-  23652.    )]                                            MONTH_EFF10
       +[X11(T)[(-  33488.    )]                                            MONTH_EFF11
       +[X12(T)[(+  39389.    )]                                            FIXED_EFF_N10107
       +[X13(T)[(+  63399.    )]                                            FIXED_EFF_N10207
       +[X14(T)[(+  .13727E+06)]                                            FIXED_EFF_N10307
       +[X15(T)[(+  .25144E+06)]                                            FIXED_EFF_N10407
       +[X16(T)[(+  .32004E+06)]                                            FIXED_EFF_N10507
       +[X17(T)[(+  .29156E+06)]                                            FIXED_EFF_N10607
       +[X18(T)[(+  74960.    )]                                            FIXED_DAY02
       +[X19(T)[(+  39299.    )]                                            FIXED_DAY03
       +[X20(T)[(+  27660.    )]                                            FIXED_DAY05
       +[X21(T)[(-  33451.    )]                                            FIXED_DAY21
       +[X22(T)[(+  43602.    )]                                            FIXED_DAY29
       +[X23(T)[(+  68016.    )]                                            FIXED_DAY30
       +[X24(T)[(+  226.98    )]                                            :TIME TREND        1                   1/  1   1/ 3/2011   I~T00001__010311stack
       +[X25(T)[(-  133.25    )]                                            :TIME TREND      423                  61/  3   2/29/2012   I~T00423__010311stack
       +[X26(T)[(+  164.56    )]                                            :TIME TREND      631                  91/  1   9/24/2012   I~T00631__010311stack
       +[X27(T)[(-  .42528E+06)]                                            :SEASONAL PULSE  733                 105/  5   1/ 4/2013   I~S00733__010311stack
       +[X28(T)[(-  .33108E+06)]                                            :SEASONAL PULSE  370                  53/  6   1/ 7/2012   I~S00370__010311stack
       +[X29(T)[(-  .82083E+06)]                                            :PULSE           326                  47/  4  11/24/2011   I~P00326__010311stack
       +[X30(T)[(+  .17502E+06)]                                            :PULSE           394                  57/  2   1/31/2012   I~P00394__010311stack
      +                    +   [A(T)]

Solved – Daily forecasting

In addition to what has been said, you might want to consider structural time series models. They account explicitly for one or more seasonalities and trend, and are very tolerant of missing data. A good starting point might be the R function StructTS(). More complex models can be fit with packages such as dlm, KFAS, and several others.
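A minimal sketch of the StructTS() starting point, assuming a simulated daily series with a weekly pattern (frequency = 7) and a few missing values. The data and variable names are illustrative only; StructTS() requires the first observation to be non-missing.

```r
# Simulated 20 weeks of daily data with trend and weekly seasonality.
set.seed(7)
n <- 20 * 7
weekly <- rep(c(0, 5, 3, 8, 12, 30, 25), length.out = n)
y <- 200 + 0.3 * seq_len(n) + weekly + rnorm(n, sd = 3)

# Knock out a few observations to mimic gaps in the data.
y[c(40, 41, 90)] <- NA

# Weekly seasonality is declared through the frequency of the ts object.
y_ts <- ts(y, frequency = 7)

# Basic structural model: level + slope + seasonal + noise.
fit <- StructTS(y_ts, type = "BSM")

# Forecast two weeks ahead; returns predictions and standard errors.
fc <- predict(fit, n.ahead = 14)
```

The fitted variances in fit$coef indicate how much the level, slope, and seasonal components are allowed to move over time.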
