Solved – ARIMA forecast has all values as zero

arima, sas

I'm trying to forecast 15 data points based on a time series of 61 data points. Each point is the daily total for a measure, and values of zero are possible. I do have the actual values for the 15 points I'm trying to forecast, so the model can be validated with this info. The data and my code are at the end.

There seems to be a weekly seasonality to the data (which makes sense in real life; unfortunately I cannot disclose what the measure is about). I tried to fit an ARIMA(0,1,1)×(0,1,1)$_{7}$ model to the log of the data. However, the exponentiated forecast returns values of zero (or very close to zero) for all 15 days; see the plot at the end for a comparison between actual and forecast values.

What am I missing or doing wrong? I'm fairly new to ARIMA and time series forecasting, but I did try to read as much as possible, and in theory this model should be a good starting point.

Here is my data and code:

data train;
    infile cards;                /* read the in-stream data below */
    input date :mmddyy10. x;     /* colon modifier: read the date up to the first blank */
    format date date9.;
    datalines;
9/1/2016    241
9/2/2016    233
9/3/2016    197
9/4/2016    214
9/5/2016    0
9/6/2016    88
9/7/2016    446
9/8/2016    719
9/9/2016    118
9/10/2016   55
9/11/2016   198
9/12/2016   114
9/13/2016   300
9/14/2016   129
9/15/2016   58
9/16/2016   95
9/17/2016   159
9/18/2016   222
9/19/2016   141
9/20/2016   213
9/21/2016   109
9/22/2016   136
9/23/2016   41
9/24/2016   104
9/25/2016   276
9/26/2016   76
9/27/2016   0
9/28/2016   34
9/29/2016   0
9/30/2016   110
10/1/2016   136
10/2/2016   0
10/3/2016   45
10/4/2016   33
10/5/2016   712
10/6/2016   130
10/7/2016   139
10/8/2016   88
10/9/2016   39
10/10/2016  66
10/11/2016  32
10/12/2016  0
10/13/2016  240
10/14/2016  105
10/15/2016  174
10/16/2016  91
10/17/2016  10
10/18/2016  158
10/19/2016  55
10/20/2016  0
10/21/2016  133
10/22/2016  534
10/23/2016  274
10/24/2016  129
10/25/2016  49
10/26/2016  0
10/27/2016  18
10/28/2016  316
10/29/2016  0
10/30/2016  193
10/31/2016  0
;

data test;
    infile cards;                /* read the in-stream data below */
    input date :mmddyy10. x;     /* colon modifier: read the date up to the first blank */
    format date date9.;
    datalines;
11/1/2016   36
11/2/2016   161
11/3/2016   211
11/4/2016   128
11/5/2016   232
11/6/2016   244
11/7/2016   65
11/8/2016   110
11/9/2016   35
11/10/2016  315
11/11/2016  193
11/12/2016  31
11/13/2016  83
11/14/2016  114
11/15/2016  103
;

proc timeseries data=train plot=(series periodogram);
   var x;
   id date interval=day;
   spectra freq period p / adjmean bart c=1.5 expon=0.2 ;
run;


data train;
    set train;
    /* tiny offset so that log(0) is defined on the zero-valued days */
    xlog = log(x+0.0000001);
run;

proc arima data=train;
   identify var=xlog(1,7);        /* regular and lag-7 seasonal differencing */
   estimate q=(1)(7) method=ml;   /* MA(1) x seasonal MA(1) at lag 7 */
   forecast id=date interval=day lead=15 printall out=fcast;
run;

data fcast_exp;
    set fcast;
    where date >= '01nov2016'd;
    ForecastValue = exp(FORECAST);
run;

proc sql noprint;
    create table TestResults as 
    Select t1.*, t2.Actual from 
    (Select Date, ForecastValue from fcast_exp) t1
        INNER join
    (Select Date, x as Actual from test) t2
        ON t1.Date = t2.Date;
quit;

proc sgplot data=TestResults nocycleattrs;
  series x=Date y=Actual / lineattrs=(color=blue);
  series x=Date y=ForecastValue / lineattrs=(color=red);
  run;

Best Answer

You might want to look at Transforming data, as it is very relevant to your problem, and also at https://stats.stackexchange.com/questions/249005/what-are-the-assumptions-for-the-residuals-of-arima-model/249106#249106. The fundamental problem is that your identified model is flawed: it converts statistical symptoms into statistical cures that are, in this case, incorrect. The data is nonstationary (a symptom), and the cause is a shift in the mean at period 21, which is visually obvious from a plot of the series [Figure: plot of the original series]. The correct cure is to model that shift in the mean, whereas your model unfortunately applied regular differencing (the wrong cure).

There is no need for logs or any other power transform once you adjust for the clear anomalies. Box-Cox transform determination easily misreads untreated positive outliers (high values) as high variance (a symptom), whereas once they are adjusted (the correct cure) no evidence is found suggesting the need for a power transform.
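As a rough illustration of "model the level shift instead of differencing", the sketch below regresses the untransformed, undifferenced series on a step dummy in PROC ARIMA. It is not the AUTOBOX model described in this answer. The names `future`, `train_ls`, `fcast_ls` and `shift` are hypothetical, "period 21" is assumed to mean the 21st observation, and the outlier adjustments and any day-of-week effect are left out.

```sas
/* Sketch only: step dummy for the mean shift, no logs, no differencing. */

/* Future dates with x missing, so the dummy has values to forecast with. */
data future;
    do date = '01nov2016'd to '15nov2016'd;
        x = .;
        output;
    end;
    format date date9.;
run;

/* Assumption: the shift starts at the 21st observation. */
data train_ls;
    set train future;
    shift = (_n_ >= 21);      /* 0 before the shift, 1 from observation 21 on */
run;

proc arima data=train_ls;
    identify var=x crosscorr=(shift) noprint;
    estimate input=(shift) method=ml;    /* mean plus step; no AR or MA terms */
    forecast id=date interval=day lead=15 out=fcast_ls;
run;
quit;
```

Because the dummy is an input series, its values for the 15 forecast days have to be present in the data set, which is why the future dates are appended with `x` missing.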

The ACF of the original series shows no significant seasonal structure, whereas your model included a seasonal difference. Intervention detection procedures suggested a day 5 effect, which might have been the reason for the unwarranted seasonal differencing in your model.
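One way to check this yourself before choosing any differencing is to look at the ACF of the raw series. A minimal sketch, assuming the `train` data set from the question and an arbitrary choice of 21 lags:

```sas
/* Sketch: ACF of the untransformed, undifferenced series. */
proc arima data=train;
    identify var=x nlag=21;   /* inspect lags 7, 14 and 21 for weekly structure */
run;
quit;
```

If the autocorrelations at lags 7, 14 and 21 are not clearly outside the confidence band, a seasonal difference is hard to justify.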

The plot of the Actual/Fit and Forecast from the model suggested by AUTOBOX (which I have helped to develop) visually tells an interesting story [Figure: actual, fit, and forecast]. Since values can never go below zero for your data, simply truncate the lower confidence limit to 0.0.
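The truncation can be done in a short data step. This sketch assumes a hypothetical data set `fcast_raw`, written by the OUT= option of PROC ARIMA for a model fitted to the untransformed `x`, in which `L95` holds the lower 95% limit:

```sas
/* Sketch: clip the lower confidence limit at zero for a non-negative series. */
data fcast_trunc;
    set fcast_raw;                                /* hypothetical OUT= data set */
    if not missing(L95) then L95 = max(L95, 0);   /* leave missing limits missing */
run;
```

The missing-value guard matters because MAX ignores missing arguments and would otherwise turn a missing limit into 0.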

The ACF of the original series [Figure: ACF of the original series], the ACF of the model's residuals [Figure: ACF of the residuals], and the residual plot [Figure: residual plot] are shown here.

The equation is here [Figure: model equation]. Notice that the level shift variable was not the dominant player, as it was obfuscated by the anomalies (just as my eyes were). The details of the model are given in three parts [Figures: model details, parts 1 to 3]. Finally, we show the plot of the forecasts [Figure: forecasts].

It is interesting to see the actual and the cleansed data together, as it shows what might have been [Figure: actual and cleansed series].

Hope this helps you and others to better understand time series methodology and practice.