Solved – Difference between different autoregressive models

autocorrelation, autoregressive, forecasting, stata, time series

I am trying to understand the difference between these three different specifications of an autoregressive model for variable var in Stata:

reg L(0/3).var
arima var L(1/3).var
arima var, arima(3,0,0)

All three provide similar answers (to an extent). The first two have equal coefficients but different standard errors. The third has similar, though not identical, coefficients.

However, when using them for forecasting, the last model provides wildly different results.
For example, given this data:

v  
31.75  
32.31  
38.25  
42  
45.7  
45.3  
45  
45.24  
46  
45  
44.38  
44.21  
44  
43  
43  

and time t

gen t = _n
tsset t

I would like to forecast the value of var in future periods, obtaining a predicted value as well as lower and upper bounds; therefore I also solve for standard errors. In this example, I am looking 12 periods ahead:

tsappend, add(12)
reg L(0/3).var
forecast create OLS, replace
estimates store OLS
forecast estimates OLS
forecast solve, simulate(errors, statistic(stddev,prefix(olssd_)) reps(1000))

I have done this for all three models. The last model provides almost no variation in standard errors as the period increases.
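
For reference, the analogous commands for the third model are (AR3 and the prefix ar3sd_ are just labels I chose):

arima var, arima(3,0,0)
forecast create AR3, replace
estimates store AR3
forecast estimates AR3
forecast solve, simulate(errors, statistic(stddev,prefix(ar3sd_)) reps(1000))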

So, what are the differences between these models? And which one should be used when it is believed that a series is autocorrelated?

Best Answer

The specification of the three models is essentially the same. What changes is the interface and how the arguments passed to it are defined. Differences will be due to the different default options used by each interface (e.g., optimization algorithm, covariance matrix of the parameter estimates, treatment of the initial observations, ...).

The model is the same in the three cases and can be written as follows ($y_t$ denotes the time series var):

$$ y_t = \beta_0 + \beta_1 y_{t-1} + \beta_2 y_{t-2} + \beta_3 y_{t-3} + \epsilon_t \,,\quad t=1,\dots,n \,, \quad \epsilon_t \sim NID(0, \sigma^2) \,. $$

The third case is the most natural for your example; however, the details explained below may suggest other alternatives (e.g., an ARIMAX model).


In the first case, you are using the interface for a linear regression model, where the explanatory variables are lags of the dependent variable. Stata's regress fits this model by OLS, dropping the first three observations, for which the lags are missing.
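
For clarity, the lag-operator varlist L(0/3).var expands with lag 0 first, so lag 0 becomes the dependent variable; the first command is therefore equivalent to the explicit regression:

* L(0/3).var expands to: var L1.var L2.var L3.var
regress var L1.var L2.var L3.var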


In the second case, you are fitting an ARIMA model with external regressors (sometimes called an ARIMAX model). However, your specification is somewhat odd, because you are defining the lags of the dependent variable as external regressors (so they cannot be considered external or exogenous, since they are generated by the same model as the dependent variable). In addition, no ARMA structure is defined for the error term, which is precisely the point of ARIMAX models. A meaningful ARIMAX model would be, for example:

arima var x, ar(1) ma(1)

which fits the following model:

$$ y_t = \beta_0 + \beta_1 x_t + \mu_t \,, \quad \mu_t = \phi \mu_{t-1} + \theta \epsilon_{t-1} + \epsilon_t \,, \quad \epsilon_t \sim NID(0, \sigma^2) \,. $$
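
If the aim is simply to capture the autocorrelation of var itself, the lag structure belongs in the ar() option rather than in the regressor list. A sketch of the second command rewritten in this way, which coincides with the third command:

* AR(3) structure on the disturbance of a constant-only model;
* this is the same model as arima var, arima(3,0,0)
arima var, ar(1/3)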


The third case is the most natural in terms of the usage of the interface: here you are defining an ARIMA(3,0,0) model for the time series var, and the command arima is suited precisely to fit this model (you should still check that it is a suitable model for the data, by looking at the residuals, etc.). One detail to be aware of: arima parameterizes the constant as the mean of the process, $\mu$, whereas the intercept in the equation above is $\beta_0 = \mu\,(1 - \beta_1 - \beta_2 - \beta_3)$; this is why the constant reported by the third command differs from the intercept reported by the first two, even though the models are equivalent.
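
As a sketch of how forecasts and interval bounds can also be obtained directly from this model (assuming, as in the question, that tsappend has extended the sample and the observed data end at t = 15; the variable names are illustrative), predict after arima supports dynamic multi-step predictions and their mean squared errors:

arima var, arima(3,0,0)
* dynamic (multi-step) forecasts from period 16 onwards
predict double yhat, y dynamic(16)
* MSE of the predictions, for approximate 95% bounds
predict double fmse, mse dynamic(16)
gen double lb = yhat - 1.96*sqrt(fmse)
gen double ub = yhat + 1.96*sqrt(fmse)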