Solved – Endogenous extra regressors in ARIMA, transfer function and/or ARIMAX

Tags: arima, endogeneity

Following the post here, it has been an intense couple of weeks trying to understand ARIMA and how to add exogenous regressors to the model. To summarize, I have attempted to forecast monthly unemployment (in percent) over several years using ARIMA, with viewership data for some Wikipedia articles as my exogenous regressors. The unemployment series and the regressors have the same length. On many occasions, adding the exogenous regressors improves the forecast of unemployment (in this case, 5 months ahead) compared with ARIMA without regressors. We have tried to test the robustness of this model by shifting the forecast origin back in time one month at a time, taking care to keep at least 3 years of training data and always forecasting 5 months. We noticed that the accuracy changes considerably across these shifts.
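
As a rough illustration of the rolling evaluation described above, the sketch below only generates the train/forecast index windows; it does not fit any model. The function name `rolling_origin_splits` and the series length of 48 months are hypothetical, chosen for the example, with the 36-month minimum training window and 5-month horizon taken from the description.

```python
def rolling_origin_splits(n_obs, min_train=36, horizon=5):
    """Yield (train_end, test_end) index pairs, latest origin first.

    Training uses observations [0, train_end); the forecast window is
    [train_end, test_end). The origin moves back one month at a time
    until fewer than `min_train` training observations would remain.
    """
    train_end = n_obs - horizon
    while train_end >= min_train:
        yield train_end, train_end + horizon
        train_end -= 1


# Hypothetical example: 48 monthly observations.
splits = list(rolling_origin_splits(48))
for train_end, test_end in splits:
    print(f"train on months 0..{train_end - 1}, "
          f"forecast months {train_end}..{test_end - 1}")
```

Each window would then be used to refit the model and score the 5-month forecast, so that accuracy can be compared across origins rather than judged from a single split.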

We now suspect that using these regressors may not be appropriate, because the viewership time series may not be completely independent of unemployment. So we have considered using ARIMAX and transfer functions. The idea is to use both unemployment and viewership data to forecast unemployment. This is the part where I am confused …
Do you know of any example of how to implement transfer functions using ARIMAX in R?
Do you think this is the right approach, or should I stick with ARIMA and exogenous regressors?

Best Answer

Neither (univariate) ARIMAX nor (univariate) regression with ARMA errors will remedy the problem of endogeneity. These models assume the exogenous variable is, hmm, exogenous.

A simple extension of ARIMAX to systems of more than one endogenous variable is vector autoregression (VAR). A more complicated one is vector ARMA (VARMA). Both can also include exogenous regressors, turning them into VARX and VARMAX, respectively. VAR and VARX will likely suffice for starters (personally, I find VARMA and VARMAX quite tedious and computationally tricky).
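
To make the VAR idea concrete, here is a minimal sketch in Python with NumPy (not R) that fits a VAR(p) by equation-wise least squares and iterates it forward to a 5-step forecast. Everything in it is an assumption made for illustration: the data are simulated, the two columns merely stand in for unemployment and page views, and `fit_var` and `forecast_var` are hypothetical helper names. In R, the `vars` package offers `VAR()` with a `predict` method for the same purpose, including an `exogen` argument for the VARX case.

```python
import numpy as np


def fit_var(y, p):
    """Fit a VAR(p) with intercept by equation-wise OLS.

    y: array of shape (T, k), one column per endogenous variable.
    Returns B of shape (1 + k*p, k): intercept row, then the rows for
    lag 1, ..., lag p.
    """
    T, k = y.shape
    # The design row for time t holds [1, y[t-1], ..., y[t-p]].
    X = np.hstack([np.ones((T - p, 1))]
                  + [y[p - i:T - i] for i in range(1, p + 1)])
    B, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return B


def forecast_var(y, B, p, steps):
    """Iterate the fitted VAR forward `steps` periods from the end of y."""
    history = [row for row in y[-p:]]
    out = []
    for _ in range(steps):
        # Most recent lag first, matching the column order in fit_var.
        x = np.concatenate([[1.0]] + history[::-1][:p])
        nxt = x @ B
        out.append(nxt)
        history.append(nxt)
    return np.array(out)


# Simulated bivariate VAR(1): y_t = c + A @ y_{t-1} + e_t.
# Column 0 stands in for unemployment, column 1 for page views.
rng = np.random.default_rng(0)
A = np.array([[0.5, 0.2],
              [0.3, 0.4]])
c = np.array([1.0, 0.5])
T = 500
y = np.zeros((T, 2))
for t in range(1, T):
    y[t] = c + A @ y[t - 1] + rng.normal(scale=0.1, size=2)

B = fit_var(y, p=1)
A_hat = B[1:].T                          # estimated lag-1 coefficient matrix
fc = forecast_var(y, B, p=1, steps=5)    # 5-step forecast for both series
print(A_hat)
print(fc)
```

The point of the design is that both series appear on the left-hand side, so each is forecast jointly from lags of itself and of the other, and no future values of the page views are needed to forecast unemployment.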