Solved – Forecasting daily time series with many zeros

crostons-method, forecasting, seasonality, time series, zero inflation

I need to forecast a univariate time series of sales data with the following characteristics:

  • It is a daily time-series
  • On around 70-80 % of the days nothing is sold ($x_t = 0$)
  • On the remaining 20-30 % of the days there is a positive integer number of sales
  • The days on which nothing is sold do not always fall on the same day of the week

So far I have tried Croston's method (croston() from the forecast package in R).

Is Croston's method appropriate?
Are there any suitable alternatives?

I would also be grateful for code in R.

Edit:

My data looks similar to the data below:

0,0,1,0,0,0,0,2,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,2,0,0,0

Best Answer

(This answer is based on experience with the business side of sales forecasting, more so than on rigorous statistical/mathematical knowledge)

Looking at your data, it makes more sense to forecast it at a weekly level than at a daily level. At a daily level it is too sparse, but at a weekly level you would have a more meaningful time series.

week 1: 0,0,1,0,0,0,0

week 2: 2,0,0,0,0,0,0

week 3: 0,0,0,1,0,0,0

week 4: 1,0,1,0,0,0,0

week 5: 0,0,0,0,0,0,0

week 6: 1,0,0,2,0,0,0

Any forecasting method you use at a daily level would give a fractional value per day. This doesn't really help: these are sales units, so a forecast value of ~0.14 doesn't mean much unless you interpret it as a probability (I don't know enough math to help in that case, but others might know better how to treat that).
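As a rough illustration of that reading (a sketch only, not a rigorous treatment), the daily average can be split into "how often is there a sale day" and "how many units on a sale day". The variable names below are my own, and the data is the sample series from the question:

```r
# Sample daily series from the question (42 days = 6 weeks)
sales <- c(0,0,1,0,0,0,0, 2,0,0,0,0,0,0, 0,0,0,1,0,0,0,
           1,0,1,0,0,0,0, 0,0,0,0,0,0,0, 1,0,0,2,0,0,0)

daily_mean <- mean(sales)             # average units per day (9 / 42)
p_sale_day <- mean(sales > 0)         # share of days with at least one sale (7 / 42)
mean_size  <- mean(sales[sales > 0])  # average units on a day with sales (9 / 7)

# The fractional daily value is just the product of the two parts
print(c(daily_mean = daily_mean, p_sale_day = p_sale_day, mean_size = mean_size))
```

So a daily figure of about 0.21 here reads as "a sale on roughly one day in six, averaging a bit more than one unit when it happens", which is not something you can order inventory against directly.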

If you aggregate the data by week, you get:

week 1: 1

week 2: 2

week 3: 1

week 4: 2

week 5: 0

week 6: 3
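In R, the weekly totals above can be reproduced with base functions. This is a minimal sketch that assumes the series starts on the first day of a week and that its length is a multiple of 7; with real dated data you would group by calendar week instead.

```r
# Sample daily series from the question (42 days = 6 weeks)
sales <- c(0,0,1,0,0,0,0, 2,0,0,0,0,0,0, 0,0,0,1,0,0,0,
           1,0,1,0,0,0,0, 0,0,0,0,0,0,0, 1,0,0,2,0,0,0)

# matrix() fills column by column, so with 7 rows each column is one week
weekly <- colSums(matrix(sales, nrow = 7))

print(weekly)  # 1 2 1 2 0 3
```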

You can then simply average that value over all the weeks you have, or maybe use a moving average. Here that is 9 units over 6 weeks, i.e. 1.5 units per week, or an average of 3 units sold per two weeks.
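Both options are one-liners in base R. In this sketch the 4-week window for the moving average is an arbitrary choice of mine, not something the data dictates:

```r
# Weekly totals from the aggregation above
weekly <- c(1, 2, 1, 2, 0, 3)

# Plain average over all weeks
avg_per_week <- mean(weekly)           # 1.5
units_per_two_weeks <- 2 * avg_per_week  # 3

# Trailing 4-week moving average (window length is an assumption);
# sides = 1 uses only the current and past weeks, so the first 3 values are NA
ma4 <- as.numeric(stats::filter(weekly, rep(1 / 4, 4), sides = 1))

print(avg_per_week)
print(ma4)  # NA NA NA 1.50 1.25 1.50
```

The last value of the moving average (1.5 units per week) would be the forecast for the coming week; with this sample it happens to match the overall mean.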

Keep in mind that this is a sales forecast: What is the purpose of a sales forecast? To make sure that you have enough inventory to satisfy customers' demand. Based on the method I described above, you would know that you need to ship/order 3 units of inventory every 2 weeks to satisfy the demand for that product - without going into ARIMA or Exponential smoothing or some other more involved time series analysis.