Solved – Time Series Forecasting – Daily data

arima, forecasting, python, time series

I'm relatively new to time series forecasting. I've been assigned the task of forecasting the operation time of a piece of industrial equipment from daily data (3 years of daily data).
The prediction is needed for at least 6 months into the future. I've spent the past few weeks investigating the time series forecasting domain to come up with candidate models for my problem. After reading several related questions in this helpful community, I have tried my hand at the auto-arima package in Python.
What I have tried so far:

  1. Aggregated the daily data into weekly sums
  2. Examined the seasonal decomposition of the data using the statsmodels library; there is clear seasonality in the data
  3. Split the data into a train set and a test set
  4. Fitted a seasonal auto-arima model on the train set and generated an out-of-sample forecast over the length of the test period. Predictors such as holiday week and week of the year were given as inputs.
  5. Compared the actual test data with the ARIMA forecast using the MAE metric.
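Steps 1, 3 and 5 above can be sketched with the standard library alone. This is a minimal sketch, assuming the daily data arrive as (date, hours) pairs; the helper names, the holiday date and the mean-of-train placeholder forecast are all hypothetical. The placeholder only stands in for the auto-arima fit of step 4, which is not reproduced here.

```python
from datetime import date, timedelta


def weekly_sums(daily):
    """Sum (date, value) pairs into ISO-week buckets, returned in time order."""
    totals = {}
    for day, value in daily:
        iso_year, iso_week, _ = day.isocalendar()
        totals[(iso_year, iso_week)] = totals.get((iso_year, iso_week), 0.0) + value
    return sorted(totals.items())


def holiday_week_flags(weekly, holidays):
    """1 for each week that contains at least one holiday, else 0."""
    holiday_weeks = {tuple(h.isocalendar()[:2]) for h in holidays}
    return [1 if key in holiday_weeks else 0 for key, _ in weekly]


def train_test_split(series, n_test):
    """Chronological split: the last n_test observations form the test set."""
    if not 0 < n_test < len(series):
        raise ValueError("n_test must be between 1 and len(series) - 1")
    return series[:-n_test], series[-n_test:]


def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


if __name__ == "__main__":
    # Hypothetical input: 8 weeks of 2-hour days starting on a Monday.
    start = date(2021, 1, 4)
    daily = [(start + timedelta(days=i), 2.0) for i in range(56)]
    weekly = weekly_sums(daily)
    values = [v for _, v in weekly]
    week_of_year = [key[1] for key, _ in weekly]  # exogenous predictor
    is_holiday_week = holiday_week_flags(weekly, [date(2021, 1, 6)])  # hypothetical holiday
    train, test = train_test_split(values, n_test=2)
    # Placeholder forecast (NOT auto-arima): repeat the training mean.
    forecast = [sum(train) / len(train)] * len(test)
    print(mae(test, forecast))
```

The split is chronological on purpose: shuffling weeks before splitting would leak future information into the training set.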

The MAE on the raw weekly summed data is higher than when the training input is the weekly summed data smoothed with a rolling-window average (window = 8). Here is the result of my model forecast on the rolling-averaged data:

Fit ARIMA: order=(2, 0, 2) seasonal_order=(1, 1, 0, 52); AIC=558.923, BIC=585.271, Fit time=44.283 seconds

[Figure: model forecast on the rolling-averaged weekly data]
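For readers new to the notation: in the usual (P, D, Q, m) convention, seasonal_order=(1, 1, 0, 52) means one seasonal AR term and one seasonal difference at lag m = 52 weeks. The differencing step alone can be sketched as below (a minimal illustration with a hypothetical function name, not the library's implementation). With roughly 3 × 52 = 156 weekly sums, a lag-52 difference consumes a full year of observations before any model is fitted.

```python
def seasonal_difference(values, period=52):
    """Apply (1 - B**period): returns y[t] - y[t - period] for t >= period."""
    if len(values) <= period:
        raise ValueError("need more observations than one seasonal period")
    return [values[t] - values[t - period] for t in range(period, len(values))]


# Hypothetical example: a pattern that repeats every 4 steps, shifted up by 10.0
# in the second cycle; differencing at the seasonal lag removes the repeating part.
pattern = [1.0, 5.0, 3.0, 2.0]
series = pattern + [p + 10.0 for p in pattern]
print(seasonal_difference(series, period=4))  # every entry equals the 10.0 shift
```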

I have a question regarding model development and testing for time series forecasting:

  1. Here is how my raw data look:
    [Figure: day vs. duration of equipment operation in hours]
    Is it common practice to apply a rolling mean to the raw data before fitting a seasonal ARIMA model? (I understand that averaging loses some valuable information, but what if I can trade some of it for a reasonable model?) Fitting on the averaged data gave a better out-of-sample forecast than fitting on the raw data. My limited internet search has not turned up any information on this practice.
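For concreteness, a trailing rolling mean with window = 8 over the weekly sums can be sketched as below (a minimal standard-library sketch; the function name is hypothetical). It makes two side effects of the smoothing visible: the series loses window - 1 = 7 observations at the start, and each smoothed value is centred (window - 1)/2 = 3.5 weeks behind the newest week it covers, so turning points show up late.

```python
def rolling_mean(values, window=8):
    """Trailing rolling mean; positions without a full window are dropped."""
    if window < 1:
        raise ValueError("window must be >= 1")
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the value that left the window
        if i >= window - 1:
            out.append(running / window)
    return out


# Hypothetical weekly sums (hours per week).
weekly = [40, 42, 38, 45, 50, 47, 44, 41, 39, 43, 46, 48]
print(rolling_mean(weekly))  # 12 - 7 = 5 smoothed values
```

One thing worth checking when comparing the two MAE figures: if the smoothed model is scored against smoothed test values, the two MAEs measure errors on different targets.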

Any reference to documentation on fitting models to noisy data is appreciated. I am ready to invest more time to understand time series modelling thoroughly. I know I have barely scratched the surface, but what puzzles me most is how good the out-of-sample forecast is for the weekly summed, rolling-window-averaged (window = 8) data.

I shall email the data if necessary.

Best Answer

I took your 981 daily values [figure] and used AUTOBOX (a piece of forecasting software that I helped develop). The original data visually suggest level shifts (up at period 560, down at period 801), which were confirmed in a useful model that also contains German holiday effects [figure] AND monthly effects [figure] [figure] [figure].
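The kinds of deterministic regressors mentioned here can be sketched as plain indicator columns. This is an illustration under stated assumptions, not AUTOBOX's internal model: periods are numbered from 1, the start date of the series is hypothetical, and the months and holiday dates passed in are placeholders, since the answer does not list which months or holidays were selected. Holiday effects get one column per offset so that the day before, the holiday itself and the day after can each carry their own coefficient.

```python
from datetime import date, timedelta


def level_shift(n, start_period):
    """Step regressor: 0 before start_period, 1 from start_period onward (1-based)."""
    return [1 if t >= start_period else 0 for t in range(1, n + 1)]


def month_dummies(dates, months):
    """One 0/1 column per selected calendar month."""
    return {m: [1 if d.month == m else 0 for d in dates] for m in months}


def holiday_offsets(dates, holidays, offsets=(-1, 0, 1)):
    """One 0/1 column per offset k: 1 where the date is k days after a holiday
    (k = -1 marks the day before, k = 0 the holiday itself)."""
    holiday_set = set(holidays)
    return {
        k: [1 if (d - timedelta(days=k)) in holiday_set else 0 for d in dates]
        for k in offsets
    }


if __name__ == "__main__":
    n = 981
    start = date(2016, 1, 1)  # hypothetical start date
    dates = [start + timedelta(days=i) for i in range(n)]
    shift_up = level_shift(n, 560)    # level shift up at period 560
    shift_down = level_shift(n, 801)  # shift at period 801; a downward move gets a negative coefficient
    months = month_dummies(dates, [1, 7, 8, 12])  # placeholder months
    christmas = holiday_offsets(dates, [date(2016, 12, 25), date(2017, 12, 25)])
    print(sum(shift_up), sum(shift_down), sum(christmas[0]))
```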

The Actual/Fit and Forecast graph is here: [figure]

The model residual plot [figure] and the ACF plot [figure] suggest that the model is sufficient.
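Reading an ACF plot for "model sufficiency" usually means checking whether the residual autocorrelations stay inside the approximate 95% band of ±1.96/sqrt(n) expected for white noise. A minimal standard-library sketch of that check follows; the function names are hypothetical, and a few spikes outside the band are expected by chance when many lags are inspected.

```python
import math


def sample_acf(x, max_lag):
    """Sample autocorrelations r_1..r_max_lag (lag 0 is omitted)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        for k in range(1, max_lag + 1)
    ]


def lags_outside_band(residuals, max_lag):
    """Lags whose |r_k| exceeds the approximate white-noise band 1.96/sqrt(n)."""
    band = 1.96 / math.sqrt(len(residuals))
    return [
        k for k, r in enumerate(sample_acf(residuals, max_lag), start=1)
        if abs(r) > band
    ]


# Hypothetical residuals that alternate in sign: clearly NOT white noise.
resid = [1.0, -1.0] * 50
print(lags_outside_band(resid, max_lag=2))
```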

The forecast plot is here, emphasizing the monthly effects and the holiday effects along with the level-shift effects:

[Figure: forecast plot]

Hope this helps you and others who need to develop daily forecasts.

There is no need for any ARIMA structure. Your ARIMA model has self-cancelling structure in its (2,0,2) part, and there is no need for seasonal differencing once you incorporate the 4 fixed monthly effects.
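One way to see "self-cancelling structure" in a fitted ARMA(2,2) part is to compare the roots of the AR polynomial 1 - phi1*B - phi2*B^2 with those of the MA polynomial 1 + theta1*B + theta2*B^2: a nearly shared root means a nearly common factor that cancels. The sketch below assumes that sign convention (check your software's convention before plugging in its estimates), the function names are hypothetical, and the coefficients in the example are made up, since the fitted values of the (2,0,2) model are not shown.

```python
import cmath
import math


def quadratic_lag_roots(c1, c2):
    """Roots of 1 + c1*z + c2*z**2 (a single root if c2 == 0)."""
    if c2 == 0:
        return [] if c1 == 0 else [-1 / c1]
    disc = cmath.sqrt(c1 * c1 - 4 * c2)
    return [(-c1 + disc) / (2 * c2), (-c1 - disc) / (2 * c2)]


def closest_ar_ma_roots(phi, theta):
    """Smallest distance between an AR root and an MA root.

    phi = (phi1, phi2) for 1 - phi1*B - phi2*B**2 (assumed convention),
    theta = (theta1, theta2) for 1 + theta1*B + theta2*B**2.
    """
    ar_roots = quadratic_lag_roots(-phi[0], -phi[1])
    ma_roots = quadratic_lag_roots(theta[0], theta[1])
    if not ar_roots or not ma_roots:
        return math.inf
    return min(abs(r - s) for r in ar_roots for s in ma_roots)


# Made-up example: AR = (1 - 0.5B)(1 - 0.3B), MA = (1 - 0.5B)(1 + 0.4B).
# Both share the factor (1 - 0.5B), i.e. the root B = 2, so the distance is ~0.
print(closest_ar_ma_roots((0.8, -0.15), (-0.1, -0.2)))
```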

EDITED AFTER RECEIPT OF OP'S QUESTIONS/COMMENTS

For example, a model of the form (1 - .5B)z(t) = (1 - .5B)a(t), where B is the backshift operator, has self-cancelling structure: the common factor (1 - .5B) cancels, leaving z(t) = a(t), i.e. white noise. auto.arima is simple trial and error, where unwarranted AR structure or unwarranted differencing often generates unwarranted MA structure and unwarranted complexity, with consequently wider prediction limits due to over-parameterization.
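The cancellation is easy to check numerically. The sketch below (hypothetical function name) simulates (1 - phi*B)z(t) = (1 + theta*B)a(t) with Gaussian white noise a(t), starting from the assumed initial condition z(0) = a(0). With phi = 0.5 and theta = -0.5 it is exactly the example above, and the simulated z(t) reproduces a(t) up to rounding, so the two coefficients add nothing to the model.

```python
import random


def simulate_arma11(phi, theta, n, seed=0):
    """Simulate z[t] = phi*z[t-1] + a[t] + theta*a[t-1] with z[0] = a[0].

    This is (1 - phi*B) z = (1 + theta*B) a with standard normal noise a.
    """
    rng = random.Random(seed)
    a = [rng.gauss(0.0, 1.0) for _ in range(n)]
    z = [a[0]]
    for t in range(1, n):
        z.append(phi * z[t - 1] + a[t] + theta * a[t - 1])
    return a, z


a, z = simulate_arma11(0.5, -0.5, 200)   # common factor: cancels
a2, z2 = simulate_arma11(0.5, 0.5, 200)  # no common factor: genuine ARMA(1,1)
print(max(abs(x - y) for x, y in zip(a, z)))    # essentially zero
print(max(abs(x - y) for x, y in zip(a2, z2)))  # clearly non-zero
```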

GIVEN that you KNOW how many level shifts occurred and when; … GIVEN that you KNOW that 4 and only 4 specified months of the year are important; GIVEN that you KNOW which holidays, and which days around them, are important; and GIVEN that you KNOW which time periods/points are outliers (one-time anomalies), you can certainly use auto.arima without penalty on the residuals from all of these effects, OR, even better, examine the ACF/PACF of the residuals and determine the form of the ARIMA structure yourself.
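To determine the ARIMA form from the residuals yourself, the usual tools are the sample ACF and PACF. The PACF can be obtained from the autocorrelations with the Durbin-Levinson recursion; a minimal standard-library sketch (function names hypothetical) is below. As a check, for an AR(1) with coefficient 0.6 the theoretical autocorrelations are 0.6^k, and the recursion returns a PACF that is 0.6 at lag 1 and zero afterwards, the classic AR(1) cut-off.

```python
def sample_autocorrelations(x, max_lag):
    """r_0..r_max_lag of a series (r_0 == 1)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]


def pacf_from_acf(r):
    """Partial autocorrelations phi_11..phi_KK via the Durbin-Levinson recursion.

    r is [r_0, r_1, ..., r_K] with r_0 == 1.
    """
    pacf = []
    prev = []  # phi_{k-1,1..k-1} from the previous step
    for k in range(1, len(r)):
        num = r[k] - sum(prev[j - 1] * r[k - j] for j in range(1, k))
        den = 1.0 - sum(prev[j - 1] * r[j] for j in range(1, k))
        phi_kk = num / den
        # Update phi_{k,j} = phi_{k-1,j} - phi_kk * phi_{k-1,k-j}, then append phi_kk.
        prev = [prev[j - 1] - phi_kk * prev[k - j - 1] for j in range(1, k)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf


# Theoretical AR(1) autocorrelations with coefficient 0.6: r_k = 0.6**k.
print(pacf_from_acf([0.6 ** k for k in range(5)]))
```

On real residuals, pass `sample_autocorrelations(resid, max_lag)` to `pacf_from_acf` and compare each value with the same ±1.96/sqrt(n) band used for the ACF.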

In summary: adjust for all the KNOWN effects and examine the residuals (i.e. the adjusted Y's) to investigate what ARIMA structure is necessary, THEN re-estimate with all of the structure and test the significance of each and every coefficient, stepping down appropriately.
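The first step of that summary (regress out the known effects and keep the residuals as the "adjusted Y's") can be sketched with ordinary least squares. This is a deliberately simplified standard-library sketch with hypothetical helper names: it fits only the deterministic regressors by OLS via the normal equations, whereas the final step described above re-estimates the regressors and the ARIMA terms jointly and then drops non-significant coefficients, which this code does not do.

```python
def ols(columns, y):
    """OLS coefficients for y on the given regressor columns (include a column of
    ones for the intercept). Solves X'X b = X'y by Gaussian elimination with
    partial pivoting."""
    p, n = len(columns), len(y)
    aug = [
        [sum(columns[i][t] * columns[j][t] for t in range(n)) for j in range(p)]
        + [sum(columns[i][t] * y[t] for t in range(n))]
        for i in range(p)
    ]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("regressors are (nearly) collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, p):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, p + 1):
                aug[r][c] -= factor * aug[col][c]
    beta = [0.0] * p
    for r in range(p - 1, -1, -1):  # back substitution
        tail = sum(aug[r][c] * beta[c] for c in range(r + 1, p))
        beta[r] = (aug[r][p] - tail) / aug[r][r]
    return beta


def adjusted_y(columns, y, beta):
    """Residuals y - X b: the series left after removing the known effects."""
    return [y[t] - sum(b * col[t] for b, col in zip(beta, columns)) for t in range(len(y))]


# Hypothetical noise-free series: level 10, up 5 at period 11, down 3 at period 21.
n = 30
ones = [1] * n
shift_up = [1 if t >= 11 else 0 for t in range(1, n + 1)]
shift_down = [1 if t >= 21 else 0 for t in range(1, n + 1)]
y = [10.0 + 5.0 * u - 3.0 * d for u, d in zip(shift_up, shift_down)]
cols = [ones, shift_up, shift_down]
beta = ols(cols, y)
resid = adjusted_y(cols, y, beta)  # these would go on to ACF/PACF inspection
print(beta)
```

The normal equations are fine for a handful of regressors like this; for many collinear-ish dummies a QR-based solver is numerically safer.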