Solved – Time series regression with lagged dependent and independent variables

arima, autoregressive, forecasting, time series

I have monthly data for air passengers, oil price and unemployment. I'm trying to create a model to forecast air travel demand using oil price and unemployment as explanatory variables, but I am facing some problems:
– The passenger data is measured as the percentage change between two months in year t, minus the percentage change between the same months in the previous year. I have modeled it as a seasonal ARIMA (0,1,1)×(0,1,1), which is known as the airline model. How do I build a model that includes lags of the oil price and the unemployment rate to predict the percentage change in passengers? I want a model whose forecasts use both the time series dynamics and whatever explanatory power the lagged oil price might have. If you have any ideas on models, similar studies, methods, etc., I'd be forever thankful.

Best Answer

First, you should decide whether to use a univariate or a multivariate model. It seems reasonable to think that oil price and unemployment are causal for air travel demand and not the other way around. Thus, in line with one of the answers to this post, you may carry out your study in a univariate setting, treating oil price and unemployment as exogenous regressors. If that assumption is not appropriate, then you may take a multivariate approach, for example a VAR model, as mentioned by @Miha Trošt.

In the univariate setting, you can consider the following models:

  • ARIMAX models: ARIMA models, like the one you selected, extended with exogenous regressors.
  • Distributed lag models: these are based on a regression equation that includes lagged versions of the explanatory variables.
  • Autoregressive distributed lag models: like the previous model, but with lags of the dependent variable also included as regressors.
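
To make the third option a bit more concrete, here is a minimal sketch of an autoregressive distributed lag regression fitted by ordinary least squares with NumPy. Everything in it is an assumption made for illustration: the data are simulated, the names `adl_design`, `fit_adl`, `forecast_next`, `oil` and `unemp` are invented for this example, and the lag orders p = q = 1 are arbitrary rather than a recommendation for your data.

```python
import numpy as np


def adl_design(y, exog, p, q):
    """Build the regression matrix for an ADL(p, q) model.

    Columns: intercept, y[t-1..t-p], then x[t-1..t-q] for each exogenous
    series. Only lagged regressors are used, so every column is known at
    the time the forecast is made.
    """
    y = np.asarray(y, dtype=float)
    exog = np.asarray(exog, dtype=float)
    if exog.ndim == 1:
        exog = exog[:, None]
    m = max(p, q)          # observations lost to lagging
    n = len(y)
    cols = [np.ones(n - m)]
    cols += [y[m - i : n - i] for i in range(1, p + 1)]
    for j in range(exog.shape[1]):
        cols += [exog[m - i : n - i, j] for i in range(1, q + 1)]
    return np.column_stack(cols), y[m:]


def fit_adl(y, exog, p, q):
    """Estimate the ADL coefficients by ordinary least squares."""
    X, target = adl_design(y, exog, p, q)
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    return beta


def forecast_next(y, exog, beta, p, q):
    """One-step-ahead forecast from the most recent observations."""
    y = np.asarray(y, dtype=float)
    exog = np.asarray(exog, dtype=float)
    if exog.ndim == 1:
        exog = exog[:, None]
    row = [1.0]
    row += [y[-i] for i in range(1, p + 1)]
    for j in range(exog.shape[1]):
        row += [exog[-i, j] for i in range(1, q + 1)]
    return float(np.dot(row, beta))


# Simulated monthly data (purely synthetic, not real passenger figures).
rng = np.random.default_rng(0)
n = 240
oil = rng.normal(size=n)
unemp = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = (0.5 + 0.4 * y[t - 1] - 0.3 * oil[t - 1]
            - 0.2 * unemp[t - 1] + rng.normal(scale=0.1))

exog = np.column_stack([oil, unemp])
beta = fit_adl(y, exog, p=1, q=1)   # order: const, y lag, oil lag, unemp lag
next_value = forecast_next(y, exog, beta, p=1, q=1)
print("coefficients:", np.round(beta, 3))
print("one-step forecast:", round(next_value, 3))
```

Plain least squares cannot estimate the moving-average terms of the airline model, so for the ARIMAX route you would need an estimator built for that. As far as I know, the `SARIMAX` class in statsmodels accepts exogenous regressors through its `exog` argument; you would have to lag the regressors yourself before passing them in.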

Did you check whether the regular and seasonal differencing filters applied by the airline model are necessary? You mention that the series is measured in rates of change; this may already render the series stationary. That need not be the case; I haven't seen the data, so this is just a guess.
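
One rough way to check this is a unit-root test on the series before any differencing. The sketch below computes a plain (non-augmented) Dickey-Fuller t-statistic by hand with NumPy, on simulated data. The function name `dickey_fuller_t` is made up for this example, and the value -2.86 is the commonly tabulated large-sample 5% critical value for the version with a constant and no trend. Treat it as an illustration of the idea only; an augmented test from a statistics library (for instance `adfuller` in statsmodels) would be the more usual choice in practice.

```python
import numpy as np


def dickey_fuller_t(y):
    """t-statistic on rho in the regression  dy[t] = a + rho * y[t-1] + e[t].

    A strongly negative value is evidence against a unit root. No lagged
    differences are included, so this is the basic test, not the augmented one.
    """
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    X = np.column_stack([np.ones(len(dy)), y[:-1]])
    beta, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ beta
    sigma2 = resid @ resid / (len(dy) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return float(beta[1] / np.sqrt(cov[1, 1]))


# Approximate large-sample 5% critical value (constant, no trend).
DF_CRIT_5PCT = -2.86

# Synthetic series: white noise (stationary) and its cumulative sum
# (a random walk, which has a unit root).
rng = np.random.default_rng(1)
noise = rng.normal(size=300)
random_walk = np.cumsum(noise)

t_stationary = dickey_fuller_t(noise)
t_walk = dickey_fuller_t(random_walk)

for name, t in [("white noise", t_stationary), ("random walk", t_walk)]:
    verdict = "reject unit root" if t < DF_CRIT_5PCT else "cannot reject unit root"
    print(f"{name}: t = {t:.2f} -> {verdict}")
```

If the test rejects a unit root for the passenger series as it stands, the extra regular differencing in the airline model may be unnecessary. Seasonal unit roots are a separate question that this simple test does not address.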

You should also be concerned with the correlation among the regressors. Oil price and unemployment may be correlated. If the correlation is high, the parameter estimates may be imprecise, and you may prefer to include only one of the regressors. There are techniques to deal with multicollinearity, but with only two regressors it is probably not worth complicating the analysis too much, and it will probably be safe to keep both variables unless they are highly correlated.
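
A quick check along these lines is to compute the sample correlation between the two regressors and the implied variance inflation factor (VIF). With exactly two regressors the VIF of each is 1 / (1 - r^2), where r is their correlation. The sketch below uses simulated series; the names `oil`, `unemp` and `pairwise_vif` are hypothetical, and what counts as "high" is a judgment call (rules of thumb such as a VIF above 5 or 10 are often quoted, but they are only heuristics).

```python
import numpy as np


def pairwise_vif(x1, x2):
    """Return (r, vif) for two regressors.

    r is the Pearson correlation; with two regressors the variance
    inflation factor of each equals 1 / (1 - r**2).
    """
    r = float(np.corrcoef(x1, x2)[0, 1])
    return r, 1.0 / (1.0 - r**2)


# Synthetic regressors built to be correlated (population correlation 0.8).
rng = np.random.default_rng(2)
oil = rng.normal(size=200)
unemp = 0.8 * oil + 0.6 * rng.normal(size=200)

r, vif = pairwise_vif(oil, unemp)
print(f"correlation = {r:.2f}, VIF = {vif:.2f}")
```

If the regressors enter the model in lagged or differenced form, it makes sense to run the check on the transformed series actually used in the regression.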
