I'm trying to forecast using ARIMAX with two exogenous (input) variables. I'm using PROC ARIMA, but I can't figure out from the SAS documentation whether my code is producing the parameterization I want.
I want to extend an ARI(12,1) model so that it also includes the last 12 terms of each of the two exogenous variables in my forecast. So, for modeling VariableX with the two exogenous variables VariableY and VariableZ, my best attempt at the code is:
proc arima;
identify var=VariableY(1) nlag=24;
estimate p=12;
identify var=VariableZ(1) nlag=24;
estimate p=12;
identify var=VariableX(1) nlag=24 crosscorr=( VariableY(1) VariableZ(1) );
estimate p=12 input=( VariableY VariableZ );
forecast id=MonthNumber interval=month alpha=.05 lead=24;
run;
quit;
The documentation leads me to believe the first four lines of the procedure are required for setting up the forecast at the end. But when I run the procedure, the output appears to show a forecast using only the last term of each of the two exogenous variables.
In summary, I'd like to be sure where each of the following is controlled:
- The $p$ of $AR(p)$, and similarly for each of the exogenous variables
- The $d$ of $I(d)$, and similarly for each of the exogenous variables
- The $q$ of $MA(q)$, and similarly for each of the exogenous variables
Best Answer
Specifying the Input Variables' ARIMA Models
The ARIMA procedure uses the results of the first pair(s) of identify and estimate statements (i.e., the identify and estimate statements for the input variables) to build models that forecast the input variable(s) (also called exogenous variable(s)) beyond the last time point at which each input variable is observed. In other words, those statements specify the models that are used whenever values of the input variables are needed for periods not yet observed.

Concretely, the model for VariableY is specified as:

identify var=VariableY(PeriodsOfDifferencing);
estimate p=OrderOfAutoregression q=OrderOfMovingAverage;

where VariableY is modeled as $ARIMA(p,d,q)$ with $p$ = OrderOfAutoregression, $d$ = the order of differencing (determined from PeriodsOfDifferencing), and $q$ = OrderOfMovingAverage. For example, VariableY(1) differences once ($d = 1$) and VariableY(1,1) differences twice ($d = 2$), whereas VariableY(12) takes a single lag-12 (seasonal) difference.
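As a sketch in the backshift notation used by the SAS documentation, the input model requested by those two statements can be written out. In this notation $B\,Y_t = Y_{t-1}$, $\mu$ is the mean term, $a_t$ is white noise, and $Y_t$ stands for VariableY. The sketch assumes only lag-1 differencing applied $d$ times:

$$(1-B)^d\,Y_t = \mu + \frac{\theta(B)}{\phi(B)}\,a_t, \qquad \phi(B) = 1 - \phi_1 B - \cdots - \phi_p B^p, \qquad \theta(B) = 1 - \theta_1 B - \cdots - \theta_q B^q$$

Here $p$ = OrderOfAutoregression sets the degree of $\phi(B)$, and $q$ = OrderOfMovingAverage sets the degree of $\theta(B)$.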
Specifying Differencing for the Main and Input Series in the ARIMAX Model
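The order(s) of differencing to apply to the input variables are specified in the crosscorr option of the identify statement. The differencing for the main series is specified in its var= option. For modeling VariableX with inputs VariableY and VariableZ, the SAS code is:

identify var=VariableX(DifferencingX) crosscorr=( VariableY(DifferencingY) VariableZ(DifferencingZ) );

where DifferencingX, DifferencingY, and DifferencingZ are the period(s) of differencing for VariableX, VariableY, and VariableZ, respectively. The input series named later in the estimate statement's input option enter the model in this differenced form.

Specifying the Order of Autoregression and the Order of Moving Average for the Main and Input Series in the ARIMAX Model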
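The number of input variable lags to include in the model is specified in the transfer function (in the input option). The beginning of the estimate statement sets the orders of autoregression and moving average for the main series (i.e., the series for which a model or forecasts are ultimately being sought):

estimate p=AutoregressionX q=MovingAverageX input=( ... );

where VariableX is modeled as $ARIMAX(p,d,q,b)$ with $p$ = AutoregressionX and $q$ = MovingAverageX. Note that p=12 requests all autoregressive lags 1 through 12, whereas p=(12) requests only lag 12. The same list syntax applies to q.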
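Combining these pieces, the model fitted by the final identify and estimate pair can be sketched in the transfer-function notation of the SAS documentation. This is a sketch under stated assumptions, not SAS's exact output:

- only lag-1 differencing is used, applied $d_X$, $d_Y$, and $d_Z$ times via var= and crosscorr=;
- $\phi(B)$ and $\theta(B)$ have orders AutoregressionX and MovingAverageX;
- each input's numerator $\omega(B)$, denominator $\delta(B)$, and shift $k$ come from that input's entry in the input option.

$$(1-B)^{d_X}\,\text{VariableX}_t = \mu + \frac{\omega_Y(B)}{\delta_Y(B)}\,B^{k_Y}(1-B)^{d_Y}\,\text{VariableY}_t + \frac{\omega_Z(B)}{\delta_Z(B)}\,B^{k_Z}(1-B)^{d_Z}\,\text{VariableZ}_t + \frac{\theta(B)}{\phi(B)}\,a_t$$

Read this way, $p$ and $q$ of the main series live in $\phi(B)$ and $\theta(B)$, and $d$ lives in the differencing operators. The number of input lags lives in $\omega(B)$, $\delta(B)$, and $k$.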
The input option in the same estimate statement sets the structure of the transfer function for each input series. The numerator factors of a transfer function are like the MA part of the ARMA model for the noise series, and the denominator factors are like the AR part. (All examples below are simplified to a single input series, VariableY, instead of showing both VariableY and VariableZ.)

When specified without any numerator or denominator terms, the input variable is treated as a pure regression term. That is, the value of the input variable in the current period is used without any lags, whether it is forecast by the input variable's ARIMA model or already present as an observed value in the input series:
estimate ... input=( VariableY );
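Written out in backshift notation ($B\,\text{VariableY}_t = \text{VariableY}_{t-1}$), VariableY contributes a single coefficient $\omega_0$ on the current-period value. This assumes lag-1 differencing of the input, as in the question's crosscorr=( VariableY(1) VariableZ(1) ):

$$\omega_0\,(1-B)\,\text{VariableY}_t = \omega_0\,(\text{VariableY}_t - \text{VariableY}_{t-1})$$

This is why input=( VariableY VariableZ ) produces just one estimated coefficient per input series rather than twelve lags of each.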
Numerator terms are represented in parentheses before the input variable.
estimate ... input=( (1 2 3) VariableY );
produces a regression on VariableY, LAG(VariableY), LAG2(VariableY), and LAG3(VariableY).
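Expanding the numerator polynomial shows where those four regressors come from. This sketch ignores any crosscorr differencing. It writes $\omega(B)$ with the minus-sign convention used for MA polynomials, which affects only the sign of the reported coefficients, not which lags enter:

$$\omega(B)\,\text{VariableY}_t = (\omega_0 - \omega_1 B - \omega_2 B^2 - \omega_3 B^3)\,\text{VariableY}_t = \omega_0\,\text{VariableY}_t - \omega_1\,\text{VariableY}_{t-1} - \omega_2\,\text{VariableY}_{t-2} - \omega_3\,\text{VariableY}_{t-3}$$

By the same algebra, listing lags 1 through 12 in the numerator would give the current value plus twelve lags of VariableY, i.e., thirteen coefficients.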
Denominator terms are represented in parentheses after a slash and before the input variable.
estimate ... input=( / (1) VariableY );
estimates the effect of VariableY as an infinite distributed lag model with exponentially declining weights.

An initial shift is represented by a number placed before a dollar sign:
estimate ... input=( k $ ( ω-lags ) / ( δ-lags ) VariableY );
represents the form $B^k \cdot \left(\frac{\omega (B)}{\delta (B)}\right) \cdot \text{VariableY}_t$. Because $B^k$ multiplies the whole ratio, the entire response to VariableY is delayed by $k$ periods; in particular, $k$ is added to the exponent of $B$ in every numerator term. To include lagged values of the input variable without also including the un-shifted (i.e., un-lagged or pure regression) term, combine this shift with the numerator terms instead of relying on numerator terms alone. For example, to include 6-, 12-, and 18-month lags of the input series VariableY without the un-shifted term, the statement would be
estimate ... input=( 6 $ (6 12) VariableY );
(this results in shifts of 6, 6 + 6 (i.e., 12), and 6 + 12 (i.e., 18)).

Summary
The first pair(s) of identify and estimate statements are used to prepare any necessary forecasted values for the input variable(s).

The last pair of identify and estimate statements runs the actual ARIMAX model. When necessary, it uses forecasted values for the input variable(s), generated from the first pair(s) of identify and estimate statements.

The relationship between the main variable and the input variable(s) is specified in the crosscorr option of the identify statement and the input option of the estimate statement. That relationship can be defined as a run-of-the-mill regression relationship. Alternatively, it can be defined with differencing, AR-like (denominator) term(s), MA-like (numerator) term(s), and/or an initial shift.

Attribution
Although this answer is my own, I was able to come up with the answer based on substantial help (and some quotations) from the official SAS documentation ("The ARIMA Procedure: Rational Transfer Functions and Distributed Lag Models", "The ARIMA Procedure: Specifying Inputs and Transfer Functions", "The ARIMA Procedure: Input Variables and Regression with ARMA Errors", and "The ARIMA Procedure: Differencing"), and from direction found in this answer and comments by IrishStat.