Solved – Forecasting daily data with zeros in Python

forecasting, prophet, python, time series

I'm currently testing some forecasts on daily sales quantities. However, out of ~2000 observations I have 16 zeros.

How should I approach this? It's mainly Sundays and holidays that have a value of zero. I want to apply some transformations to the time series that don't allow zeros, which is why I'm looking for solutions.

Example of data:

             Sales_interior
CalendarDate                
2014-01-02       1066.000000
2014-01-03       1735.000000
2014-01-04       2538.000000
2014-01-05        952.000000
2014-01-06       1417.000000
2014-01-07       2205.000000
2014-01-08       1567.000000
2014-01-09       1464.000000
2014-01-10       1636.000000
2014-01-11       1979.000000
2014-01-12          0.000000
2014-01-13       1085.000000

EDIT: I'm currently planning on using a seasonal ARIMA.

Best Answer

My suggestion is simply to exclude holidays, or to use a dummy variable as suggested in the comments. In finance, for example, weekends are in most cases excluded from the time series. Why should we model sales on days when we are sure there can be no sales because of holidays and store closures? The coefficients are estimated as constants across all observations t, so they will be influenced to some extent by the calendar effect. If you do not adjust for it, that effect spills over indirectly into the coefficient estimates (for example, one could imagine the true autocorrelation being underestimated if the zeros are numerous relative to the total number of observations). So why take into account dates on which there cannot be sales for external, calendar reasons that have nothing to do with the true autocorrelation in the time series? Model this as a calendar effect, because that is what it is! That way you isolate the calendar effect from the conditional-mean effect due to time-series autocorrelation, and the estimate of the latter becomes cleaner.

If instead the zero sales are not due to holidays, then my best advice is to leave those zeros in, because they are “true information”, or at most to treat them as outliers.
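To decide which case applies, it may help to check where the zeros fall on the calendar. This is a minimal sketch using the twelve observations from the question; the `holidays` list is a hypothetical placeholder and should be replaced with the real holiday calendar.

```python
import pandas as pd

# The twelve observations shown in the question.
sales = pd.Series(
    [1066, 1735, 2538, 952, 1417, 2205, 1567, 1464, 1636, 1979, 0, 1085],
    index=pd.date_range("2014-01-02", periods=12, freq="D", name="CalendarDate"),
    name="Sales_interior",
    dtype=float,
)

# Hypothetical list of public holidays; replace with the real calendar.
holidays = pd.to_datetime(["2014-01-01"])

# List every zero with its weekday and whether it is a known holiday.
zeros = sales[sales == 0]
report = pd.DataFrame(
    {
        "weekday": zeros.index.day_name(),
        "is_holiday": zeros.index.isin(holidays),
    },
    index=zeros.index,
)
# A zero counts as calendar-driven if it falls on a Sunday or a holiday.
report["calendar_zero"] = (report["weekday"] == "Sunday") | report["is_holiday"]
print(report)
```

Zeros with `calendar_zero` equal to `False` are the ones to keep as genuine observations or to examine as possible outliers.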
