Solved – Forecasting daily sales by handling multiple seasonality and zero sales in R

arima, forecasting, r, time series

I am trying to forecast sales for a retail store. The data contains daily sales and a dummy flag variable indicating whether the store was open on that day. Sales are 0 when the store is closed. The dates range from 2013-01-01 to 2015-07-31 (2 years and 7 months). I have been asked to forecast sales for the next 48 days.

store_530_ts <- ts(store_530[,-1], frequency = 7)
summary(store_530$Sales)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      0    3024    4338    4457    5681   12476

autoplot(store_530_ts[,1])

The resulting plot is:

[Image not included: plot of the daily sales series]

The ACF plot suggests that there is both weekly and daily seasonality. (Am I right about this?)

acf2(store_530_ts[,1])

[Image not included: ACF and PACF plots of the sales series]

My question is: what is the best way to capture both the multiple seasonality and the holiday information in a single model?
I tried fitting a regression model with ARIMA errors, using "Open" as an explanatory variable. I used lambda=0 to log-transform the sales.
But I get the error below:

fit<-auto.arima(store_530_ts[, "Sales"], xreg = store_530_ts[, "Open"], 
lambda=0)


Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : 
  NA/NaN/Inf in 'y'

I understand that the error is caused by the records with 0 sales, since log(0) is -Inf. But it does not feel right to simply drop the 0-sales records from the analysis.

Can someone tell me the best way to analyze such a time series, and what I am doing wrong? I am completely new to time-series analysis, and any help would be much appreciated!

Best Answer

You have weekly and annual seasonality, but not daily (as you only see one observation per day).

I would set the 0s to NAs, and then use a dynamic harmonic regression with Fourier terms for the weekly and annual seasonalities. (See https://otexts.org/fpp2/complexseasonality.html#dynamic-harmonic-regression-with-multiple-seasonal-periods for the details.) You need to set the seasonal periods to c(7,365). The forecasts can then be set to 0 on future days when the store is closed.
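A minimal sketch of that approach, using the forecast package on synthetic data since store_530 is not available here. The simulated series, the Sunday-closure pattern, the choice K = c(2, 2) and the use of lambda = 0 are all assumptions for illustration; in practice K would usually be chosen by comparing AICc across fits.

```r
library(forecast)

# Synthetic stand-in for the store data (assumption: closed every Sunday).
set.seed(1)
dates <- seq(as.Date("2013-01-01"), as.Date("2015-07-31"), by = "day")
n     <- length(dates)
t     <- seq_len(n)
open  <- as.integer(format(dates, "%u") != "7")
sales <- open * round(exp(8.3 + 0.2 * sin(2 * pi * t / 7) +
                          0.1 * cos(2 * pi * t / 365) + rnorm(n, sd = 0.1)))

# Closed days become NA rather than 0, so the log transform is defined.
y <- msts(ifelse(open == 1, sales, NA), seasonal.periods = c(7, 365))

# Number of Fourier pairs per seasonal period (illustrative values).
# Weekly K is kept at 2: with one weekday always missing, K = 3 would make
# the weekly terms collinear with the intercept on the observed days.
K <- c(2, 2)

# Regression on Fourier terms with non-seasonal ARIMA errors.
fit <- auto.arima(y, xreg = fourier(y, K = K), seasonal = FALSE, lambda = 0)

# Forecast 48 days ahead using the Fourier terms for the future period.
h  <- 48
fc <- forecast(fit, xreg = fourier(y, K = K, h = h))

# Set forecasts to 0 on future days when the store is known to be closed.
future_dates <- seq(max(dates) + 1, by = "day", length.out = h)
future_open  <- as.integer(format(future_dates, "%u") != "7")
point_fc     <- as.numeric(fc$mean) * future_open
```

With the real data, future_open would come from the known opening schedule rather than from a weekday rule.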

It looks like you have some high sales days, possibly associated with promotions. Perhaps use a dummy variable to model them.
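As a rough illustration of the dummy-variable idea, the sketch below adds a 0/1 column to the Fourier regressors. The data are simulated, and the column name Promo, the promotion days and K = c(2, 2) are hypothetical; forecasting also assumes the future promotion days are known in advance.

```r
library(forecast)

# Simulated daily sales with weekly and annual patterns plus promotion spikes.
set.seed(2)
n     <- 730
t     <- seq_len(n)
promo <- rbinom(n, 1, 0.1)  # hypothetical promotion indicator
sales <- 4000 + 500 * sin(2 * pi * t / 7) + 300 * cos(2 * pi * t / 365) +
         2500 * promo + rnorm(n, sd = 200)
y <- msts(sales, seasonal.periods = c(7, 365))

# Fourier terms for both seasonal periods plus the promotion dummy.
K    <- c(2, 2)
xreg <- cbind(fourier(y, K = K), Promo = promo)
fit  <- auto.arima(y, xreg = xreg, seasonal = FALSE)

# Future regressors must have the same columns, in the same order.
h            <- 48
future_promo <- rep(0, h)
future_promo[c(10, 11)] <- 1  # assumed promotion days in the forecast period
new_xreg <- cbind(fourier(y, K = K, h = h), Promo = future_promo)
fc       <- forecast(fit, xreg = new_xreg)

# Estimated uplift in sales on a promotion day.
promo_effect <- coef(fit)["Promo"]
```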
