Solved – Difference between different autoregressive models

autocorrelation, autoregressive, forecasting, stata, time series

I am trying to understand the difference between these three different specifications of an autoregressive model for variable var in Stata:

reg L(0/3).var
arima var L(1/3).var
arima var, arima(3,0,0)

All three provide similar answers (to an extent). The first two have equal coefficients but different standard errors. The third has similar, though not identical, coefficients.

However, when using them for forecasting, the last model provides wildly different results.
For example, given this data:

v  
31.75  
32.31  
38.25  
42  
45.7  
45.3  
45  
45.24  
46  
45  
44.38  
44.21  
44  
43  
43  

and time t

gen t = _n
tsset t

I would like to forecast the value of var in future periods, obtaining a predicted value as well as lower and upper bounds; therefore I also solve for standard errors. In this example, I am looking 12 periods ahead:

tsappend, add(12)
reg L(0/3).var
forecast create OLS, replace
estimates store OLS
forecast estimates OLS
forecast solve, simulate(errors, statistic(stddev,prefix(olssd_)) reps(1000))

I have done this for all three models. The last model provides almost no variation in standard errors as the period increases.
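
For reference, the analogous commands for the third model are (AR3 and the prefix ar3sd_ are just labels I chose):

arima var, arima(3,0,0)
forecast create AR3, replace
estimates store AR3
forecast estimates AR3
forecast solve, simulate(errors, statistic(stddev,prefix(ar3sd_)) reps(1000))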

So, what are the differences between these models? And which one should be used when it is believed that a series is autocorrelated?

Best Answer

The specification of the three models is essentially the same. What changes is the interface and how the arguments passed to it are defined. Differences will be due to the different default options used by each interface (e.g., optimization algorithm, covariance matrix of the parameter estimates, treatment of the initial observations, ...).

The model is the same in the three cases and can be written as follows ($y_t$ denotes the time series var):

$$ y_t = \beta_0 + \beta_1 y_{t-1} + \beta_2 y_{t-2} + \beta_3 y_{t-3} + \epsilon_t \,,\quad t=1,\dots,n \,, \quad \epsilon_t \sim NID(0, \sigma^2) \,. $$

The third case is the most natural for your example; however, the details explained below may suggest other alternatives (e.g., an ARIMAX model).


In the first case, you are using the interface for a linear regression model, where the explanatory variables are lags of the dependent variable. Stata's regress fits this model by OLS, dropping the first three observations, for which the lags are missing.
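
For clarity, the lag-operator varlist L(0/3).var expands with lag 0 first, so lag 0 becomes the dependent variable; the first command is therefore equivalent to the explicit regression:

* L(0/3).var expands to: var L1.var L2.var L3.var
regress var L1.var L2.var L3.var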


In the second case, you are fitting an ARIMA model with external regressors (sometimes called an ARIMAX model). However, your specification is somewhat odd, because you are defining the lags of the dependent variable as external regressors (so they cannot be considered external or exogenous, since they are generated by the same model as the dependent variable). In addition, no ARMA structure is defined for the error term, which is precisely the point of ARIMAX models. A meaningful ARIMAX model would be, for example:

arima var x, ar(1) ma(1)

which fits the following model:

$$ y_t = \beta_0 + \beta_1 x_t + \mu_t \,, \quad \mu_t = \phi \mu_{t-1} + \theta \epsilon_{t-1} + \epsilon_t \,, \quad \epsilon_t \sim NID(0, \sigma^2) \,. $$
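
If the aim is simply to capture the autocorrelation of var itself, the lag structure belongs in the ar() option rather than in the regressor list. A sketch of the second command rewritten in this way, which coincides with the third command:

* AR(3) structure on the disturbance of a constant-only model;
* this is the same model as arima var, arima(3,0,0)
arima var, ar(1/3)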


The third case is the most natural in terms of the usage of the interface: here you are defining an ARIMA(3,0,0) model for the time series var, and the command arima is suited precisely to fit this model (you should still check that it is a suitable model for the data, by looking at the residuals, etc.). One detail to be aware of: arima parameterizes the constant as the mean of the process, $\mu$, whereas the intercept in the equation above is $\beta_0 = \mu\,(1 - \beta_1 - \beta_2 - \beta_3)$; this is why the constant reported by the third command differs from the intercept reported by the first two, even though the models are equivalent.
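
As a sketch of how forecasts and interval bounds can also be obtained directly from this model (assuming, as in the question, that tsappend has extended the sample and the observed data end at t = 15; the variable names are illustrative), predict after arima supports dynamic multi-step predictions and their mean squared errors:

arima var, arima(3,0,0)
* dynamic (multi-step) forecasts from period 16 onwards
predict double yhat, y dynamic(16)
* MSE of the predictions, for approximate 95% bounds
predict double fmse, mse dynamic(16)
gen double lb = yhat - 1.96*sqrt(fmse)
gen double ub = yhat + 1.96*sqrt(fmse)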