Forecasting a time series without significant autocorrelation

forecasting, regression, time series

I have a time series without significant autocorrelation.

http://pastebin.com/xZJnZs7k

[Figure: autocorrelation plot]

[Figure: partial autocorrelation plot]
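As a rough numerical check of the "no significant autocorrelation" reading of these plots, here is a minimal Python sketch (standard library only) that computes the sample ACF and flags lags outside the usual approximate 95% white-noise band of ±1.96/sqrt(n). The function names and the simulated series are hypothetical stand-ins; the real data are at the pastebin link above. With 28 lags, one or two crossings are expected by chance even for pure noise.

```python
import math
import random


def sample_acf(x, max_lag):
    """Sample autocorrelations r_1..r_max_lag (normalised by the lag-0 sum of squares)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
        for k in range(1, max_lag + 1)
    ]


def significant_lags(x, max_lag):
    """Lags whose |r_k| exceeds the approximate 95% white-noise bound 1.96 / sqrt(n)."""
    bound = 1.96 / math.sqrt(len(x))
    return [k for k, r in enumerate(sample_acf(x, max_lag), start=1) if abs(r) > bound]


if __name__ == "__main__":
    # Hypothetical stand-in: 365 independent daily counts (no autocorrelation by construction).
    rng = random.Random(0)
    series = [sum(rng.random() < 0.2 for _ in range(10)) for _ in range(365)]
    print("lags outside the band:", significant_lags(series, 28))
```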

Here is a year's worth of data and a forecast produced by auto.arima from the forecast package:

[Figure: observed and predicted values]

I have tried multiple models, for example ANNs and SVR, but they always seem to predict the most frequent values of the series. In other words, the forecasts are nearly flat lines close to the series mean.
I also tried ets in R, but the forecasts were not much different from …
All models were validated using rolling out-of-sample forecasts.

No model achieved an R² above 0.1. My goal is to minimize the mean absolute error (MAE); the best I achieved was 1.
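The rolling out-of-sample validation described above can be sketched in a few lines of Python. This is a minimal sketch, assuming one-step-ahead forecasts and an expanding training window; the helper names and the simulated data are hypothetical. It also illustrates a general fact relevant to the MAE goal: among constant forecasts, the median minimizes absolute error while the mean minimizes squared error, so on a series with no exploitable structure a median forecast is a natural MAE baseline. The R² used here is 1 - SSE/SST with SST taken around the mean of the evaluated actuals, which is only one of several out-of-sample definitions.

```python
import random
import statistics


def rolling_origin(series, initial, forecaster):
    """Expanding-window, one-step-ahead evaluation.

    At each origin t >= initial the forecaster sees series[:t] and predicts series[t].
    """
    actuals = series[initial:]
    forecasts = [forecaster(series[:t]) for t in range(initial, len(series))]
    return actuals, forecasts


def mae(actuals, forecasts):
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(actuals, forecasts)) / len(actuals)


def r_squared(actuals, forecasts):
    """1 - SSE/SST, with SST computed around the mean of the evaluated actuals."""
    m = statistics.fmean(actuals)
    sse = sum((a - f) ** 2 for a, f in zip(actuals, forecasts))
    sst = sum((a - m) ** 2 for a in actuals)
    return 1 - sse / sst


if __name__ == "__main__":
    # Hypothetical right-skewed daily counts standing in for the real series.
    rng = random.Random(1)
    series = [int(rng.expovariate(0.6)) for _ in range(365)]
    for name, f in [("mean", statistics.fmean), ("median", statistics.median)]:
        a, p = rolling_origin(series, 180, f)
        print(f"{name:>6}: MAE={mae(a, p):.3f}  R2={r_squared(a, p):.3f}")
```

Any candidate model (ARIMA, ETS, ANN, SVR) can be plugged in as the forecaster and compared against these two constant baselines on the same origins.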
My question is: What methods are there for forecasting such time series?

Best Answer

Your data are discrete, so ARIMA modelling can be useful but is tentative at best. See the post "What are the consequences of not meeting the assumptions for the residuals of ARIMA model?", which discusses discrete data analysis with a slightly different twist.

I took your daily data (365 values) and used AUTOBOX to detect structure. There is no identifiable autoregressive memory structure (ARIMA). There is no significant day-of-the-week effect in the observations, but, curiously, the outliers are not equally distributed across days, suggesting a possible special cause. There are, however, two months of the year that appear to be statistically significant, although with only one year of data they could be proxying for something else. No particular day of the month appeared to be significant. A number of days appear to be "unusual", i.e. outliers. You might examine those dates to find a possible assignable cause as part of your discovery process.
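The sketch below is not AUTOBOX's procedure. As a much simpler, swapped-in illustration of checking whether outliers cluster on particular weekdays, it flags outliers with the median/MAD modified z-score (threshold 3.5, a common rule of thumb) and compares the outlier count per weekday with the count expected if outliers fell on days at random. The start date, helper names and demo data are hypothetical. With few outliers the chi-square approximation is rough, and for heavily tied count data the MAD can be zero, in which case a different outlier rule is needed.

```python
import datetime
import statistics

CHI2_CRIT_DF6_5PCT = 12.592  # 95th percentile of chi-square with 6 degrees of freedom


def robust_outliers(values, threshold=3.5):
    """Indices whose modified z-score 0.6745 * (x - median) / MAD exceeds threshold."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero (too many ties); use a different outlier rule")
    return [i for i, v in enumerate(values) if abs(0.6745 * (v - med) / mad) > threshold]


def outliers_by_weekday(n_obs, start, outlier_idx):
    """Observed vs expected outlier counts per weekday (0 = Monday) and a chi-square statistic.

    Expected counts are proportional to how often each weekday occurs in the sample.
    """
    days = [(start + datetime.timedelta(days=i)).weekday() for i in range(n_obs)]
    observed = [0] * 7
    for i in outlier_idx:
        observed[days[i]] += 1
    total = len(outlier_idx)
    expected = [total * days.count(d) / n_obs for d in range(7)]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return observed, expected, chi2


if __name__ == "__main__":
    # Hypothetical year of values with a spike on the same weekday every week.
    values = [3, 4, 2, 5, 3, 4, 30] * 52 + [3]
    start = datetime.date(2015, 1, 1)  # hypothetical start date
    idx = robust_outliers(values)
    obs, exp, chi2 = outliers_by_weekday(len(values), start, idx)
    print(obs, round(chi2, 2), "clustered" if chi2 > CHI2_CRIT_DF6_5PCT else "no evidence")
```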

Here are the actual data:

[Figure: actual data]

The estimated equation:

[Figure: model equation]

And the Actual, Fit and Forecast graph:

[Figure: actual, fit and forecast]

Notice that while the expected value is flat for the next 30 days, the upper confidence limits (obtained through Monte Carlo methods by resampling the decidedly non-normal residuals) are not. Standard Box-Jenkins confidence limits for forecasts assume perfect knowledge of the estimated parameters and normally distributed errors. This yields possibly naive symmetric limits, with the implicit proviso that unusual values may have occurred in the past but will not recur. The lower limits are 0.0 by specification and are shown only for generality.
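To make the contrast between resampled and normal-theory limits concrete, here is a minimal Python sketch under a strong simplifying assumption: a constant-level model (the series mean) with no ARIMA terms, month effects or outlier dummies, unlike the AUTOBOX equation. Future values are simulated by adding residuals drawn with replacement to the level, and percentiles of the simulations give the limits; the normal-theory limits are mean ± 1.96 times the sample standard deviation. The function names and demo data are hypothetical.

```python
import random
import statistics


def bootstrap_limits(series, horizon, n_sims=2000, level=0.95, seed=0):
    """(point, lower, upper) per horizon from resampled residuals of a constant-level model."""
    rng = random.Random(seed)
    point = statistics.fmean(series)
    residuals = [y - point for y in series]
    lo_q, hi_q = (1 - level) / 2, 1 - (1 - level) / 2
    out = []
    for _ in range(horizon):
        # Clip at 0.0 because the series is non-negative by specification.
        sims = sorted(max(0.0, point + rng.choice(residuals)) for _ in range(n_sims))
        out.append((point, sims[int(lo_q * (n_sims - 1))], sims[int(hi_q * (n_sims - 1))]))
    return out


def normal_limits(series, z=1.96):
    """Symmetric normal-theory limits: mean +/- z * sample standard deviation."""
    m, s = statistics.fmean(series), statistics.stdev(series)
    return m, m - z * s, m + z * s


if __name__ == "__main__":
    # Hypothetical skewed counts: mostly small values with occasional spikes.
    rng = random.Random(2)
    series = [rng.choice([0, 1, 1, 2, 2, 3]) + (15 if rng.random() < 0.05 else 0)
              for _ in range(365)]
    print("normal:   ", normal_limits(series))
    print("bootstrap:", bootstrap_limits(series, 3))
```

Under this constant-level model the simulated values are just resampled observations, so the bootstrap limits stay non-negative and asymmetric for skewed counts, while the symmetric normal limits can fall below zero. Because nothing in this toy model varies by day, its limits are the same at every horizon apart from Monte Carlo noise.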

It turns out that while you have a significant number of outliers, they are not evenly distributed across days. Because the confidence limits incorporate the possibility of future shocks (anomalies), they are "jagged". Here is the forecast plot by itself:

[Figure: forecast plot]
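One hypothetical way to see why weekday-clustered outliers produce jagged limits: resample each forecast day's residual only from past days that fall on the same weekday. The Python sketch below does this for a constant-level model. It illustrates the mechanism under that assumption and is not the procedure AUTOBOX uses; the start date, names and demo data are made up. Weekdays whose history contains spikes get higher upper limits, so the upper band rises and falls with the weekly cycle while the point forecast stays flat.

```python
import datetime
import random
import statistics


def weekday_upper_limits(series, start, horizon, q=0.975, n_sims=2000, seed=0):
    """Upper limits per forecast day, resampling residuals from the same weekday only.

    Assumes a constant-level model and at least one observation per weekday.
    """
    rng = random.Random(seed)
    point = statistics.fmean(series)
    by_day = {d: [] for d in range(7)}
    for i, y in enumerate(series):
        by_day[(start + datetime.timedelta(days=i)).weekday()].append(y - point)
    uppers = []
    for h in range(horizon):
        day = (start + datetime.timedelta(days=len(series) + h)).weekday()
        sims = sorted(point + rng.choice(by_day[day]) for _ in range(n_sims))
        uppers.append(sims[int(q * (n_sims - 1))])
    return uppers


if __name__ == "__main__":
    # Hypothetical history in which spikes occur only on Mondays.
    start = datetime.date(2024, 1, 1)  # a Monday
    rng = random.Random(3)
    series = [rng.choice([1, 2, 3]) + (12 if i % 7 == 0 and rng.random() < 0.3 else 0)
              for i in range(364)]
    print([round(u, 1) for u in weekday_upper_limits(series, start, 14)])
```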