Solved – ARIMA continuous forecasting

arima · forecasting

I am learning about the ARIMA model and trying to implement it, and I had some questions. From my understanding, ARIMA forecasts better for short-term projections than for long-term projections (is this true or false)? I am implementing ARIMA to make a 6 month projection at a daily frequency. Would I get better results if I predict one day, retrain the model including that day, predict the next day, retrain again, and so on?

EDIT: Thanks for the responses. Sorry if I wasn't clear. I am using an ARIMAX model, and I have about 30 exogenous variables. I know the actual values of the endogenous variable, even over the 6 month projection period, and I want to test how well my model performs on that period. So what I meant by retraining is: after I make a prediction for the next day using n samples, I retrain on n+1 samples, where the new sample uses the actual value of the endogenous variable rather than the prediction.

Best Answer

ARIMA models are univariate, so the only information they use is the history of the series. Essentially, they tell you how the variable reacts to previous stochastic variation. Once you start forecasting with them there is no 'noise' left, because the model does not add in future error terms (and nor should it). So what you'll see is that the forecast quickly decays to the constant term/trend in your ARIMA model. Re-estimating the model won't help, because you're not adding any new information about the underlying process.

Over 180+ forecast periods you'll need to add in more explanatory variables to get anything more than a trend line out of your model. For instance, you might include macroeconomic projections as an input, which would give you some dynamics over a 6 month period.

Update: If you are going for in-sample predictive power, then using all the data will give you a better fit, so yes, it will improve. The underlying question is what you want to achieve. If you're trying to understand the causal factors explaining the dependent variable, a few extra observations aren't going to overturn your theoretical priors. If you want forecasting power, you want to use as much data as possible to estimate your model. Of course you need to be wary of overfitting, which I guess is why you want to check the estimation against a sub-sampled model. If your coefficients change a lot as you increase the sample size and don't stabilise, that indicates your model is mis-specified.