Solved – What frequency should be used for daily observations in a time series, and how to deal with 0 values

forecasting, multiple-seasonalities, r

I have to forecast sales for stores, and I am using an ARIMA model for that. First we need to create a time series object using the ts function, which takes a frequency parameter. As far as I know, we use 1 = annual, 4 = quarterly and 12 = monthly, but I am not sure what the frequency should be for daily observations. I tried 1, 7, 365 and the number of observations as values for the frequency parameter, but with none of these do I get proper plots and forecasts.

My second question is how to deal with 0 values for specific observations, as they produce the following errors:

    Error in na.fail.default(as.ts(x)) : missing values in object

for acf() and pacf(), and

    Error in OCSBtest(x, m) : The OCSB regression model cannot be estimated

for the auto.arima() function.

Here is the data:
https://drive.google.com/file/d/0B-KJYBgmb044QlNUS3FhVFhUbE0/view?usp=sharing

Below is my code:

    data <- read.csv("Book5.csv")
    View(data)

    mydata <- ts(data[,2], start=1, end=181, frequency = 7)
    View(mydata)
    plot(mydata, xlab="Day", ylab = "Sales")

    plot(diff(mydata),xlab="Day",ylab="Differenced Sales")
    plot(log10(mydata),ylab="Log(Sales)")
    plot(diff(log10(mydata)),ylab="Differenced Log (Sales)")

    par(mfrow = c(1,2))
    acf(ts(diff(log10(mydata))),main="ACF Sales")
    pacf(ts(diff(log10(mydata))),main="PACF Sales")

    require(forecast)
    ARIMAfit <- auto.arima(log10(mydata), approximation=FALSE,trace=FALSE)
    summary(ARIMAfit)

    pred <- predict(ARIMAfit, n.ahead= 31)
    pred
    class(pred$pred)
    10^(pred$pred)

    # Write CSV in R
    write.csv(10^(pred$pred), file = "MyData.csv")

    plot(mydata,type="l",xlim=c(1,52),ylim=c(1,6000),xlab = "Day",ylab = "Sales")
    lines(10^(pred$pred),col="blue")
    lines(10^(pred$pred+2*pred$se),col="orange")
    lines(10^(pred$pred-2*pred$se),col="orange")

Best Answer

ARIMA models are not very well suited for forecasting daily store sales.

  1. You have multiple seasonalities. The intra-weekly seasonality is usually the strongest, so you could in principle work with frequency=7 and hope for the best. However, there is often also yearly seasonality (frequency=365), or biweekly/monthly seasonality driven by paychecks (frequency=14 or frequency=365/12; I am not sure whether the latter even works). Standard ARIMA implementations can't deal with more than one seasonality, and long seasonal cycles are problematic for ARIMA.

  2. You have store closures with zero sales on Sundays, which you very sensibly set to NA. However, the functions you are calling can't deal with NAs: acf() and pacf() fail on missing values by default, and so does the OCSB seasonality test that auto.arima runs. You could in principle pretend to have a six-day week, remove all NAs and use frequency=6, unless you have additional store closure days apart from Sundays (e.g., Christmas). With such irregular closures, ARIMA will run into problems.

  3. You will often have additional effects, like Black Friday, Christmas, Chinese New Year and so forth. You could in principle model these using a regression with ARIMA errors, using Boolean or ramp-up dummies.
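The six-day-week idea from point 2 can be sketched as follows. This is only an illustration under stated assumptions: the sales figures are simulated stand-ins (the linked CSV is not used), Sundays are the only closure days, and the variable names are made up for the example.

```r
# Simulated stand-in for the sales column: 26 weeks of daily data,
# with the store closed on Sundays (assumption for this sketch).
set.seed(1)
n_days <- 182
dates  <- seq(as.Date("2016-01-04"), by = "day", length.out = n_days)
dow    <- as.POSIXlt(dates)$wday            # 0 = Sunday, ..., 6 = Saturday
sales  <- round(1000 + 150 * dow + rnorm(n_days, sd = 50))
sales[dow == 0] <- NA                       # closed days recorded as NA

# Drop the closed days and treat what is left as a six-day week.
# start/end are left out on purpose: with frequency = 7, start = 1 and
# end = 181 describe roughly 180 weeks, not 181 days, and ts() would
# recycle the data to fill that span.
open_sales <- sales[!is.na(sales)]
six_day    <- ts(open_sales, frequency = 6)

length(six_day)       # number of open days
frequency(six_day)    # 6

# acf() runs now because no missing values are left.
acf(diff(log10(six_day)), main = "ACF, six-day week")
```

If the real data also has closures on other days, dropping them shifts the weekday pattern out of phase, which is the problem point 2 warns about.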

Your best bet might be to run a regression with day-of-week dummies, harmonics or hump functions to model yearly seasonality, and ramp-ups or similar to model any special holiday effects. You could then try ARIMA errors using auto.arima, but fit the seasonality using the dummies, not the ARIMA part.
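A minimal sketch of such a regression with ARIMA errors, using only base R, might look like this. Several things here are assumptions made for illustration: the data is simulated, `make_xreg` is a hypothetical helper, Sundays are dropped (so Saturday and Monday are treated as adjacent), and the AR(1) error order is fixed by hand. With the forecast package, `auto.arima(sales, xreg = X, seasonal = FALSE)` could choose the error order instead.

```r
# Hypothetical helper: builds day-of-week dummies and yearly harmonics
# from calendar dates. K is the number of sine/cosine pairs.
make_xreg <- function(dates, K = 2) {
  wday <- as.POSIXlt(dates)$wday            # 0 = Sunday, ..., 6 = Saturday
  # Dummies for Tuesday..Saturday; Monday is the baseline.
  dummies <- sapply(2:6, function(d) as.numeric(wday == d))
  colnames(dummies) <- paste0("dow", 2:6)
  # Harmonics are computed from the calendar date, so dropped days
  # do not distort the position within the year.
  t <- as.numeric(dates)
  harmonics <- do.call(cbind, lapply(seq_len(K), function(k)
    cbind(sin(2 * pi * k * t / 365.25), cos(2 * pi * k * t / 365.25))))
  colnames(harmonics) <- paste0(rep(c("sin", "cos"), K),
                                rep(seq_len(K), each = 2))
  cbind(dummies, harmonics)
}

# Simulated two years of daily sales, Sundays removed (assumption).
set.seed(42)
all_dates <- seq(as.Date("2015-01-05"), by = "day", length.out = 728)
open      <- all_dates[as.POSIXlt(all_dates)$wday != 0]
X         <- make_xreg(open)
noise     <- as.numeric(arima.sim(list(ar = 0.5), n = length(open), sd = 40))
sales     <- 1000 + 120 * as.POSIXlt(open)$wday + 200 * X[, "sin1"] + noise

# Regression on the dummies and harmonics with AR(1) errors.
# The seasonality lives in xreg, not in the ARIMA part.
fit <- arima(sales, order = c(1, 0, 0), xreg = X, method = "ML")
print(fit)

# Forecast the next two weeks of open days; the regressors for the
# future dates are known in advance because they depend only on the calendar.
future <- seq(max(open) + 1, by = "day", length.out = 14)
future <- future[as.POSIXlt(future)$wday != 0]
pred   <- predict(fit, n.ahead = length(future), newxreg = make_xreg(future))
data.frame(date = future, forecast = as.numeric(pred$pred),
           se = as.numeric(pred$se))
```

Holiday effects would enter the same way, as extra Boolean or ramp-up columns in the regressor matrix for both the fitting and the forecast period.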

You may want to look through some of our earlier questions on daily time series.