I am trying to forecast sales for a retail store. The data contain daily sales and a dummy flag variable indicating whether the store was open on that day. Sales are 0 when the store is closed. The dates range from 2013-01-01 to 2015-07-31 (2 years and 7 months). I have been asked to forecast sales for the next 48 days.
store_530_ts <- ts(store_530[,-1], frequency = 7)
summary(store_530$Sales)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      0    3024    4338    4457    5681   12476
autoplot(store_530_ts[,1])
The resulting plot is:
The ACF plot shows that there is both weekly and daily seasonality (am I right about this one?).
acf2(store_530_ts[,1])
My question is: what is the best way to capture both the multiple seasonality and the holiday information in a single model?
I tried fitting a regression model with ARIMA errors, with "Open" as an explanatory variable. I used lambda=0 to log-transform the sales.
But I am getting the error below:
fit <- auto.arima(store_530_ts[, "Sales"], xreg = store_530_ts[, "Open"],
                  lambda = 0)
Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
  NA/NaN/Inf in 'y'
I understand that the error is caused by the records with 0 sales, since the log of 0 is not finite. But it does not feel right to simply drop the zero-sales records from the analysis.
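To make the cause concrete, here is a minimal sketch with made-up numbers (not the store's data). With lambda = 0 the Box-Cox transform is a plain log, so a zero-sales day turns into -Inf, which is presumably what lm.fit is rejecting. Replacing the zeros with NA before transforming keeps the days in the series as missing values instead.

```r
# Toy sales vector; the second day stands for a closed day (values are made up)
sales <- c(3024, 0, 4338, 5681)

# lambda = 0 corresponds to a log transform, and log(0) is -Inf
log_sales <- log(sales)
is.finite(log_sales)   # FALSE for the closed day

# Marking closed days as missing keeps the series length intact
sales_na <- replace(sales, sales == 0, NA)
log_sales_na <- log(sales_na)
is.na(log_sales_na)    # TRUE only for the closed day
```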
Can someone tell me the best way to analyze such a time series, and what I am doing wrong? I am completely new to time-series analysis, and any help would be much appreciated!
Best Answer
You have weekly and annual seasonality, but not daily (as you only see one observation per day).
I would set the 0s to NAs, and then use a dynamic harmonic regression with Fourier terms for the weekly and annual seasonalities. (See https://otexts.org/fpp2/complexseasonality.html#dynamic-harmonic-regression-with-multiple-seasonal-periods for the details.) You need to set the seasonal periods to c(7,365). The forecasts can be adjusted to 0 on future days where the store is closed.
It looks like you have some high sales days, possibly associated with promotions. Perhaps use a dummy variable to model them.
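A minimal sketch of the dynamic harmonic regression suggested above, using the forecast package. It runs on simulated data rather than the real store: the sales values, the Sunday closures, and the number of Fourier pairs K are all assumptions made for illustration, and the promotion dummy is left out.

```r
library(forecast)

# Simulated stand-in for the real data: the dates match the question,
# but the sales values and the Sunday closures are made up.
set.seed(1)
dates <- seq(as.Date("2013-01-01"), as.Date("2015-07-31"), by = "day")
n <- length(dates)
t <- seq_len(n)
open <- as.integer(format(dates, "%u") != "7")  # assumption: closed on Sundays
sales <- round(4500 + 800 * sin(2 * pi * t / 7) +
                 600 * sin(2 * pi * t / 365) + rnorm(n, sd = 300))
sales[open == 0] <- NA  # closed days become missing values instead of zeros

# Multi-seasonal series with weekly and annual periods
y <- msts(sales, seasonal.periods = c(7, 365))

# Fourier pairs per period; c(2, 3) is an arbitrary choice for illustration
# (in practice, compare a few values by AICc). The weekly K is kept at 2
# because only six weekdays are observed in this toy data.
K <- c(2, 3)
fit <- auto.arima(y, xreg = fourier(y, K = K), seasonal = FALSE, lambda = 0)

# Forecast 48 days ahead, then force days when the store is closed to zero
h <- 48
future_dates <- seq(max(dates) + 1, by = "day", length.out = h)
future_open <- as.integer(format(future_dates, "%u") != "7")
fc <- forecast(fit, xreg = fourier(y, K = K, h = h))
point_forecast <- as.numeric(fc$mean) * future_open
```

With the real data, future_open would come from the store's known opening schedule, and a promotion dummy could be added as an extra column alongside the Fourier terms in both the fitting and the forecasting xreg.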