I'm in the middle of reading the book Forecasting: Principles and Practice and, to understand it better, I'm simultaneously trying to code everything "by hand".
I've found something that I cannot explain:
- Given a time series generated by an AR(2) process,
- I try to model it as a linear regression problem.
- From what I've read so far, it should be perfectly possible to find matching coefficients using both methods.
Unfortunately, I get different results. The question is: why?
An AR(p) model is described by the equation:
$$
y_t = \epsilon_t + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \cdots + \phi_p y_{t-p}
$$
I lagged the time series by 1 and by 2 and created a classical data frame with three columns:
- current x
- x at t-1
- x at t-2
and performed a regression.
R code example below:
library(xts)

x <- arima.sim(model = list(order = c(2, 0, 0), ar = c(0.6, 0.3)), n = 100)
x.ts <- as.xts(x)

# Build the lagged data frame (drop the first two rows, which hold NAs from lagging)
x.lag1 <- lag(x.ts, 1)
x.lag2 <- lag(x.ts, 2)
x.df <- data.frame(
  x.curr = x.ts[-c(1:2)],
  x.lag1 = x.lag1[-c(1:2)],
  x.lag2 = x.lag2[-c(1:2)]
)

# Fit regression
fit.x <- lm(x.curr ~ ., data = x.df)
summary(fit.x)

# Fit ARIMA
arima(x = x.ts, order = c(2, 0, 0), include.mean = FALSE)
The results are noticeably different:
# Regression results
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.08173 0.11102 0.736 0.463
x.lag1 0.54860 0.10082 5.441 4.11e-07 ***
x.lag2 0.15724 0.09883 1.591 0.115
# Arima results
ar1 ar2 intercept
0.5919 0.1645 0.5572
s.e. 0.0983 0.1009 0.4268
The coefficients are similar, though not identical. My questions:
- Why is that? Is it because of the random error component included in every observation of an AR(p) process?
- How could I model an ARMA process that requires differencing (non-stationary) using regression? Should I simply fit a regression model to the differenced-and-lagged time series?
Best Answer
There are a few reasons. For one, your ARMA model doesn't include a mean/intercept. For another, `arima` by default uses the sum of squares only to find starting points for an iterative maximum-likelihood scheme. Least-squares regression on the lagged data (which throws away the early data points) is what is usually called conditional sum of squares (CSS) in time series.
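You can see the estimation-method difference directly by restricting `arima` to the conditional-sum-of-squares criterion, which is essentially what the lagged-data `lm` fit minimizes. A minimal sketch, using a fresh simulated series (so the numbers won't match the output above; the seed is arbitrary):

```r
set.seed(1)  # arbitrary seed, just for reproducibility
x <- arima.sim(model = list(order = c(2, 0, 0), ar = c(0.6, 0.3)), n = 100)
n <- length(x)

# Lagged-data regression: conditions on the first two observations
df <- data.frame(x.curr = x[3:n], x.lag1 = x[2:(n - 1)], x.lag2 = x[1:(n - 2)])
fit.lm <- lm(x.curr ~ ., data = df)

# arima restricted to the conditional-sum-of-squares criterion
fit.css <- arima(x, order = c(2, 0, 0), method = "CSS")

# Default-style arima: full maximum likelihood
fit.ml <- arima(x, order = c(2, 0, 0), method = "ML")

coef(fit.lm)[c("x.lag1", "x.lag2")]
coef(fit.css)[c("ar1", "ar2")]  # should be very close to the lm estimates
coef(fit.ml)[c("ar1", "ar2")]   # typically drifts a bit further away
```

The CSS fit and the regression minimize the same objective (up to the intercept/mean reparametrization discussed below the fold), so their AR coefficients should agree to optimizer tolerance, while the ML fit generally differs a little more.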
These should match up

You'll notice that there's a difference between `lm`'s intercept and `arima`'s mean. The relationship is that the intercept equals the mean times $(1 - \phi_1 - \phi_2)$. You can verify that this works.

Also, and this makes everything much more confusing, the `arima` function will call its mean the intercept. This is a well-known issue covered in other questions, and is also explained elsewhere.

One more thing: your description of an AR(p) model is only true if you're looking at mean-zero AR models. In general you can write it as
$$
(1 - \phi_1 B - \cdots - \phi_p B^p)(X_t - \mu) = \epsilon_t
$$
where $\mu$ is the mean, or
$$
(1 - \phi_1 B - \cdots - \phi_p B^p) X_t = c + \epsilon_t
$$
where $c$ is the intercept. This will help you with the intercept/mean dilemma above.
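You can check the intercept/mean relationship numerically. A sketch with a fresh simulated series (the two fits use different estimation methods, so the reconstructed intercept will be close to, not exactly equal to, `lm`'s):

```r
set.seed(7)  # arbitrary seed for reproducibility
x <- arima.sim(model = list(order = c(2, 0, 0), ar = c(0.6, 0.3)), n = 500)
n <- length(x)

# lm reports the intercept c directly
df <- data.frame(x.curr = x[3:n], x.lag1 = x[2:(n - 1)], x.lag2 = x[1:(n - 2)])
c.hat <- unname(coef(lm(x.curr ~ ., data = df))["(Intercept)"])

# arima estimates the mean mu, but labels it "intercept" in its output
fit.ar <- arima(x, order = c(2, 0, 0))
mu.hat  <- unname(coef(fit.ar)["intercept"])
phi.hat <- coef(fit.ar)[c("ar1", "ar2")]

# The relationship: c = mu * (1 - phi1 - phi2)
c.hat
mu.hat * (1 - sum(phi.hat))
```

With a reasonably long series the two numbers should land very close together, which confirms that `lm` and `arima` are reporting different parametrizations of the same quantity.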
Finally, regarding your last question: you can either difference your nonstationary series yourself, for instance with `diff` in R, or let `arima` do it by changing the `order` argument in your call. For example, fitting an AR(3) to differenced data is the same as fitting an ARIMA(3,1,0), and so would require the parameter `c(3,1,0)`.
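As a quick check that the two routes agree, here's a sketch: simulate an integrated series, then fit an AR(3) to the hand-differenced data and an ARIMA(3,1,0) to the original series. (Note that `arima` drops the mean automatically when d > 0, so `include.mean = FALSE` on the differenced fit makes the two models identical; the seed and coefficients are arbitrary choices.)

```r
set.seed(11)  # arbitrary seed for reproducibility
# An ARIMA(3,1,0) series: an AR(3) in the first differences
x <- arima.sim(model = list(order = c(3, 1, 0), ar = c(0.4, 0.2, 0.1)), n = 200)

# Route 1: difference by hand, then fit an AR(3) to the differences
fit.by.hand <- arima(diff(x), order = c(3, 0, 0), include.mean = FALSE)

# Route 2: let arima difference internally via the d slot of order
fit.builtin <- arima(x, order = c(3, 1, 0))

coef(fit.by.hand)
coef(fit.builtin)  # same model either way, so the estimates coincide
```

Internally, `arima` differences the series d times and then fits the ARMA part, which is why the two routes produce the same coefficients.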