ARIMA and linear regression

arima, forecasting, r, regression

I have a linear regression model that is used to forecast the 'affluent natural energy' (ANE) of a region.

The predictors for this model are:

  • the previous month's ANE (ANE0)
  • the previous month's rain volume (PREC0)
  • the current month's forecast rain volume (PREC1)

We have 7 years of monthly historical data for all of these variables. The current model just runs an OLS linear regression. I feel there is a lot of room for improvement, but I'm not a time series specialist.

The first thing I notice is that the predictors are highly correlated (multicollinearity).
I'm not certain of the impacts of multicollinearity on prediction confidence.

I decided to try a time series approach, so I ran an ACF and a PACF on the historical data:
The ACF shows a sine-wave pattern, and the PACF has spikes at lags 1 and 2. So I tried both ARIMA(2,0,0) and ARIMA(2,0,1) to predict 20 periods ahead.

The ARIMA(2,0,1) shows good results, but I'm not certain as to how to compare it to the linear regression model.

What's the best way to test the performance of these models? I'm using R as my analysis tool (together with the forecast package).

Best Answer

The best way to test out-of-sample prediction is to run a pseudo-out-of-sample forecasting experiment. Use about 75 percent of the data to train the models, make the prediction, record the forecast error, update the information set (add the newly observed value to the training data), and repeat until you run out of data. At the end, average the squared forecast errors to get the mean-squared forecast error (MSFE) for each model. The model with the lowest MSFE is the best predictor. In general, the out-of-sample (OOS) r-squared can help you compare a model to some benchmark.
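To make the loop concrete, here is a minimal sketch in plain Python (standard library only) of the experiment described above. It assumes one-step-ahead forecasts and an expanding training window. The two forecasters (`historical_mean` and `last_value`) are stand-ins I made up for illustration, and the series is invented data, not real ANE values. In practice you would plug in functions that refit your regression and your ARIMA model on `history` and return the next-period forecast.

```python
from statistics import mean


def rolling_origin_errors(series, forecaster, train_frac=0.75):
    """Collect one-step-ahead forecast errors from an expanding window."""
    first_origin = int(len(series) * train_frac)
    errors = []
    for t in range(first_origin, len(series)):
        history = series[:t]              # information set available before time t
        prediction = forecaster(history)  # refit on history, forecast period t
        errors.append(series[t] - prediction)
    return errors


def msfe(errors):
    """Mean-squared forecast error."""
    return mean([e * e for e in errors])


# Two stand-in forecasters (assumptions for illustration only).
def historical_mean(history):
    return mean(history)


def last_value(history):
    return history[-1]


# Made-up monthly values, not real ANE data.
series = [10.0, 12.0, 15.0, 13.0, 11.0, 9.0, 10.0, 13.0,
          16.0, 14.0, 11.0, 9.0, 10.0, 12.0, 15.0, 14.0]

for name, forecaster in [("historical mean", historical_mean),
                         ("last value", last_value)]:
    errors = rolling_origin_errors(series, forecaster)
    print(f"{name}: MSFE = {msfe(errors):.3f} over {len(errors)} forecasts")
```

The key point is that every forecast uses only data that would have been available at the time, and that all competing models are scored on exactly the same forecast dates.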

$$\text{OOS } R^2 = 1 - \frac{MSFE_{model}}{MSFE_{benchmark}}$$

If the OOS r-squared is greater than zero, the model beats the benchmark. A typical choice for the benchmark is the historical mean. The higher the OOS r-squared, the better.
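As a rough illustration of the formula, the sketch below computes the OOS r-squared in plain Python from three aligned lists: the realized values, the model's forecasts, and the benchmark's forecasts. All numbers are hypothetical. With a historical-mean benchmark, each benchmark forecast should be the mean of the data available at that forecast date, not the mean of the full sample.

```python
def msfe(actuals, forecasts):
    """Mean-squared forecast error over aligned actual/forecast pairs."""
    errors = [a - f for a, f in zip(actuals, forecasts)]
    return sum(e * e for e in errors) / len(errors)


def oos_r_squared(actuals, model_forecasts, benchmark_forecasts):
    """1 - MSFE_model / MSFE_benchmark; positive means the model wins."""
    return 1.0 - msfe(actuals, model_forecasts) / msfe(actuals, benchmark_forecasts)


# Hypothetical out-of-sample values, for illustration only.
actuals = [10.0, 12.0, 15.0, 14.0]
model_forecasts = [9.0, 12.0, 14.0, 15.0]
benchmark_forecasts = [12.0, 12.0, 12.0, 12.0]

r2 = oos_r_squared(actuals, model_forecasts, benchmark_forecasts)
print(f"OOS R^2 = {r2:.3f}")
```

Note that, unlike the in-sample r-squared, this quantity can be negative: that happens whenever the model's MSFE is larger than the benchmark's.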

A couple of other thoughts: If you're not sure which model to choose, just average the forecasts from both models. Averaged forecasts often perform better than the individual forecasts. Also, since the ACF of your series shows a sine-wave pattern and the data are monthly, you may want to check for seasonality and use seasonal time series models.
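Here is a small sketch of the averaging idea in plain Python, assuming equal weights and two sets of out-of-sample forecasts that cover the same dates. The forecast values are hypothetical and were chosen so that the two models' errors partly offset each other. Whether averaging helps on your data is something the out-of-sample comparison has to show.

```python
def msfe(actuals, forecasts):
    """Mean-squared forecast error over aligned actual/forecast pairs."""
    errors = [a - f for a, f in zip(actuals, forecasts)]
    return sum(e * e for e in errors) / len(errors)


def average_forecasts(first, second):
    """Equal-weight combination of two aligned forecast lists."""
    return [(a + b) / 2.0 for a, b in zip(first, second)]


# Hypothetical out-of-sample values, for illustration only.
actuals = [10.0, 12.0, 15.0, 14.0]
regression_forecasts = [11.0, 13.0, 14.0, 15.0]
arima_forecasts = [9.0, 12.0, 17.0, 13.0]

combined = average_forecasts(regression_forecasts, arima_forecasts)

for name, forecasts in [("regression", regression_forecasts),
                        ("ARIMA", arima_forecasts),
                        ("average", combined)]:
    print(f"{name}: MSFE = {msfe(actuals, forecasts):.3f}")
```

Treat the averaged forecast as a third candidate and score it with the same MSFE comparison as the two individual models.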