Solved – SAS ARIMA: estimate without identify? or auto.arima in SAS

arima, sas, time series

I have several groups of data for time series analysis and need an automatic way to analyze them. I have looked for a way to do this in SAS, but I can't find anything corresponding to auto.arima() in R.

I have two questions regarding this:

1. The closest option is to use identify with scan and esacf to find tentative orders for the ARIMA model. However, that doesn't test for a unit root or check whether the data is just white noise. Also, I still need to work out the orders manually. Does anyone know of a procedure in SAS that can do the analysis automatically?
2. I tried to write a macro to do the job automatically. I need to first run identify to check for a unit root, white noise, orders, etc., and then run estimate with the corresponding orders based on those results. However, when I write two separate proc arima procedures, SAS requires me to run identify again in the second procedure; otherwise the estimate can't be executed. For example:
```
PROC ARIMA data = xxx;
     IDENTIFY VAR = yy scan esacf;
RUN;
/* other code */

PROC ARIMA data = xxx;
     ESTIMATE p=1 q=1; /* ERROR: must run IDENTIFY first */
RUN;
```

Is there a way to use the results of the identify in the first ARIMA for the estimate in the second ARIMA procedure? Otherwise, it's just a waste of time to run the identify twice.

Best Answer

You are fortunate to ask this question on this site, because IrishStat has been automating ARIMA models for over 30 years (sorry to give away your age, Dave). Also, Rob Hyndman wrote the auto.arima function in R. I have a connection too: I took my first time series course as a short course by Box and Tiao at Carnegie Mellon University in 1974 (giving away my age now). And when I was the Chief of Statistical Research at Risk Data Corporation in the early 1990s, I hired Terry Woodfield, who had authored the ETS software at the SAS Institute just before we were able to draw him away. I am sure PROC ARIMA has gone through many changes, but I am also sure that if you contact Terry he could probably help you.

Personally, the way I learned it from Box, Tiao and Pack, ARIMA modeling is an iterative process that should be worked through manually in stages, with the user making decisions at each stage. That is not to say that good results cannot be obtained by automated procedures. In fact, I think that Dave Reilly (IrishStat) and his son Tom have so much experience doing this that they would contend their algorithm can produce a better model than I can manually, and they may be right. But my point is that when a time series specialist takes that approach, it removes some of the steps that help him really understand the characteristics of the series.

One thing that always troubled me in the early years was that the Box-Jenkins methodology was revered a little too much. Estimation is by conditional least squares, so the normality of the residuals is important and often overlooked (a buried secret). In the late 1970s I worked on the problem of outliers in time series, and Darryl Downing and I published a paper on the topic in JASA in 1982.

Since then, others such as Doug Martin, George Tiao and Ruey Tsay have made much bigger contributions. IrishStat is aware of that literature and has incorporated their ideas in his software. That is why he emphasizes checking for level shifts and outliers before fixating on an ARIMA model. That aspect of his software sets it apart from auto.arima and SAS/ETS. So keep that in mind in your search for other automated procedures using SAS.

I hope you appreciate this as an answer even though it does not directly answer questions 1 or 2. I am sure you can find Terry Woodfield on the internet or go directly to the SAS Institute with your questions which are very specific to SAS and really require someone with intimate knowledge of the SAS algorithms. I don't think you will find anyone on this site who could give you better help.
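On question 2 specifically, here is a minimal sketch, assuming the data set `xxx` and variable `yy` from the question. PROC ARIMA is interactive, so IDENTIFY and ESTIMATE can sit in separate RUN groups of one procedure call, and the ESTIMATE then works from the IDENTIFY that precedes it. If the two stages really must live in separate procedure calls (for example, with macro logic in between), the second call still needs its own IDENTIFY, but `noprint` keeps it from repeating the output. The macro variables `p` and `q` are hypothetical placeholders for whatever orders the macro logic selects.

```sas
/* One procedure call, two RUN groups: ESTIMATE uses the IDENTIFY above it. */
proc arima data=xxx;
   /* Tentative orders (SCAN, ESACF) plus augmented Dickey-Fuller unit-root tests. */
   /* IDENTIFY also prints an autocorrelation check for white noise by default.    */
   identify var=yy scan esacf stationarity=(adf=(0,1,2));
run;
   estimate p=1 q=1;
run;
quit;

/* Separate procedure call: IDENTIFY must be repeated, but it can be silent. */
%let p = 1;   /* hypothetical: AR order chosen by earlier macro logic */
%let q = 1;   /* hypothetical: MA order chosen by earlier macro logic */
proc arima data=xxx;
   identify var=yy noprint;
   estimate p=&p q=&q;
run;
quit;
```

The first form avoids the duplicate IDENTIFY entirely; the second only pays for a silent one.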

Specifying the Input Variables' ARIMA Models

The ARIMA procedure uses the results of the first pair(s) of identify and estimate statements (i.e., the identify and estimate statements for the input variables) to create models that forecast the values of the input variable(s) (also called exogenous variable(s)) after the last point in time at which each input variable is observed. In other words, those statements specify the models that are used whenever values for the input variables are needed for periods not yet observed.

Thus, the model for VariableY is specified as

```
identify var=VariableY(PeriodsOfDifferencing);
estimate p=OrderOfAutoregression q=OrderOfMovingAverage;
```

where VariableY is modeled as $ARIMA(p,d,q)$ with $p$ = OrderOfAutoregression, $d$ = the order of differencing (determined from PeriodsOfDifferencing), and $q$ = OrderOfMovingAverage.
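As a concrete illustration, here is a sketch assuming a hypothetical data set named `mydata`. A single first difference with one AR term and one MA term gives an ARIMA(1,1,1) model for the input series. Listing two periods, as in `VariableY(1,12)`, would instead difference at lag 1 and then at lag 12.

```sas
proc arima data=mydata;              /* mydata is a hypothetical data set name */
   /* (1) requests a single first difference, so d = 1 */
   identify var=VariableY(1);
   /* p=1 and q=1 complete an ARIMA(1,1,1) model for the input series */
   estimate p=1 q=1;
run;
quit;
```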

Specifying Differencing for the Main and Input Series in the ARIMAX Model

The order(s) of differencing to apply to the input variables are specified in the crosscorr option; for modeling VariableX with inputs VariableY and VariableZ, the SAS code is:

```
identify var=VariableX(DifferencingX) crosscorr=( VariableY(DifferencingY) VariableZ(DifferencingZ) );
```

where DifferencingX, DifferencingY, and DifferencingZ are the period(s) of differencing for VariableX, VariableY, and VariableZ, respectively.

Specifying the Order of Autoregression and the Order of Moving Average for the Main and Input Series in the ARIMAX Model

The number of input variable lags to include in the model is specified in the transfer function (in the input option). The beginning of the estimate line sets the orders of autoregression and moving average for the main series (i.e., the series for which a model or forecasts are ultimately being sought):

```
estimate p=AutoregressionX q=MovingAverageX
```

where VariableX is modeled as $ARIMAX(p,d,q,b)$ with $p$ = AutoregressionX and $q$ = MovingAverageX.

The input option in the same estimate statement specifies the transfer function for each input series, that is, which lags of the input enter the ARIMAX model. The numerator factors for a transfer function for an input series are like the MA part of the ARMA model for the noise series. The denominator factors for a transfer function for an input series are like the AR part of the ARMA model for the noise series. (All examples below use a single input series, VariableY, instead of showing both VariableY and VariableZ.)

When specified without any numerator or denominator terms, the input variable is treated as a pure regression term (i.e., the value of the input variable in the current period is used without any lags, whether it is forecast by the input variable's ARIMA model or already present as an observed value in the input series): estimate...input=( VariableY );.

Numerator terms are represented in parentheses before the input variable. estimate...input=( (1 2 3) VariableY ); produces a regression on VariableY, LAG(VariableY), LAG2(VariableY), and LAG3(VariableY).
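Written out, and assuming the sign convention of the SAS documentation (numerator parameters after the first enter with a minus sign), the numerator-only example corresponds to:

$$\left(\omega_0 - \omega_1 B - \omega_2 B^2 - \omega_3 B^3\right)\text{VariableY}_t = \omega_0\,\text{VariableY}_t - \omega_1\,\text{VariableY}_{t-1} - \omega_2\,\text{VariableY}_{t-2} - \omega_3\,\text{VariableY}_{t-3}$$

so four coefficients are estimated: one for the current value and one for each listed lag.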

Denominator terms are represented in parentheses after a slash and before the input variable. estimate...input=( / (1) VariableY ); estimates the effect of VariableY as an infinite distributed lag model with exponentially declining weights.
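To see where the exponentially declining weights come from, the single denominator term can be expanded as a geometric series (a sketch, assuming $|\delta_1| < 1$ so that the series converges):

$$\frac{\omega_0}{1 - \delta_1 B}\,\text{VariableY}_t = \omega_0 \sum_{j=0}^{\infty} \delta_1^{\,j}\,\text{VariableY}_{t-j} = \omega_0\left(\text{VariableY}_t + \delta_1\,\text{VariableY}_{t-1} + \delta_1^2\,\text{VariableY}_{t-2} + \cdots\right)$$

For example, with $\delta_1 = 0.5$ the weights on successive lags are $\omega_0,\ 0.5\,\omega_0,\ 0.25\,\omega_0,\ \dots$, yet only two parameters ($\omega_0$ and $\delta_1$) are estimated.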

Initial shift is represented before a dollar sign; estimate...input=( k $ ( $\omega$-lags ) / ( $\delta$-lags ) VariableY ); represents the form $B^k \cdot \left(\frac{\omega (B)}{\delta (B)}\right) \cdot \text{VariableY}_t$. The value of k will be added to the exponent of $B$ for all numerator and denominator terms. To use an AR-like shift in the input variable without including the un-shifted (i.e., un-lagged or pure regression) term, use this operator instead of numerator terms in parentheses. For example, to set a 6, 12, and 18 month shift in the input series VariableY without the un-shifted term, the statement would be estimate...input=( 6 $ (6 12) VariableY ); (this results in shifts of 6, 6 + 6 (i.e., 12), and 6 + 12 (i.e., 18)).
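Expanding the 6, 12, and 18 month example term by term (again assuming the SAS sign convention for numerator parameters) shows why the un-shifted term drops out:

$$B^{6}\left(\omega_0 - \omega_1 B^{6} - \omega_2 B^{12}\right)\text{VariableY}_t = \omega_0\,\text{VariableY}_{t-6} - \omega_1\,\text{VariableY}_{t-12} - \omega_2\,\text{VariableY}_{t-18}$$

Every term carries at least the factor $B^{6}$, so $\text{VariableY}_t$ itself never appears.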

Summary

The first pair(s) of identify and estimate statements are used to prepare any necessary forecasted values for the input variable(s).

The last pair of identify and estimate statements run the actual ARIMAX model, and use forecasted values for the input variable(s) (generated from the first pair(s) of identify and estimate statements) when necessary.

The relationship between the main variable and the input variable(s) is specified in the crosscorr option of the identify statement and the input option of the estimate statement. The relationship between the main variable and the input variable(s) can be defined as a run-of-the-mill regression relationship; or it can be defined with differencing, AR term(s), and/or MA term(s).
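Putting the pieces together, a complete run might look like the sketch below. The data set name `mydata`, the output data set name `forecasts`, the differencing periods, the ARMA orders, and the 12-period horizon are all assumptions chosen for illustration, not values taken from the question.

```sas
proc arima data=mydata;                       /* hypothetical data set name */
   /* Pair 1: model for the input series, used to forecast VariableY */
   identify var=VariableY(1);
   estimate p=1 q=1;

   /* Pair 2: main series, with VariableY declared as an input in CROSSCORR= */
   identify var=VariableX(1) crosscorr=( VariableY(1) );
   /* ARMA(1,1) noise model plus the current value and lags 1-3 of VariableY */
   estimate p=1 q=1 input=( (1 2 3) VariableY );

   /* Forecast 12 periods ahead; future VariableY values come from pair 1 */
   forecast lead=12 out=forecasts;
run;
quit;
```

The order of the statements matters: the pair for the input series has to come before the pair for the main series so that its model is available when forecasts are produced.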

Attribution

Although this answer is my own, I was able to come up with the answer based on substantial help (and some quotations) from the official SAS documentation ("The ARIMA Procedure: Rational Transfer Functions and Distributed Lag Models", "The ARIMA Procedure: Specifying Inputs and Transfer Functions", "The ARIMA Procedure: Input Variables and Regression with ARMA Errors", and "The ARIMA Procedure: Differencing"), and from direction found in this answer and comments by IrishStat.

Time-Series – Estimating the Same Model Over Multiple Time Series: A Comprehensive Guide

You could do a grid search: start with ARIMA(1,0,0) and try every combination up to, say, ARIMA(5,2,5). Fit each candidate model to each series and compute a scale-independent error measure such as MAPE or MASE (MASE would probably be better). Choose the ARIMA model with the lowest average MASE across all your series.
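For reference, the usual non-seasonal definition of MASE scales the forecast errors by the in-sample mean absolute error of the naive (previous value) forecast. The notation here is an assumption for illustration: $y_1,\dots,y_n$ is the training part of a series, $h$ is the number of held-out periods, and $\hat{y}_t$ is the model's forecast.

$$\text{MASE} = \frac{\dfrac{1}{h}\sum_{t=n+1}^{n+h}\left|y_t - \hat{y}_t\right|}{\dfrac{1}{n-1}\sum_{t=2}^{n}\left|y_t - y_{t-1}\right|}$$

Because the denominator is computed separately for each series, values are comparable across series with different scales, and a value below 1 means the model beats the in-sample naive forecast on average.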

You could improve this procedure by cross-validating your error measurement for each series, and also by comparing your results to a naive forecast.

It might be a good idea to ask why you're looking for a single model to describe all of the series. Unless they're generated by the same process, this doesn't seem like a good idea.

Related Solutions

Solved – How to ensure PROC ARIMA is performing the correct parameterization of input variables
