Comparison of AR(p) Model Estimation: Differences Between the lm and arima Functions in R

Tags: arima, linear-model, r, regression, time-series

I'm in the middle of reading the book Forecasting: Principles and Practice and simultaneously trying to code everything "by hand" for better understanding.

I've found something that I cannot explain:

  1. Given a time series generated by an AR(2) process,
  2. I try to model it as a linear regression problem.
  3. Judging from what I've read so far, it should be perfectly possible to find matching coefficients using both methods.

Unfortunately, I get different results. The question is: why?

An AR(p) model is described by the equation:

$$
y_t = \epsilon_t + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \cdots + \phi_p y_{t-p}
$$

I've lagged the time series by 1 and 2 and created a classical data frame with three columns:

  • current x
  • x at t-1
  • x at t-2

and performed a regression.

R code example below:

library(xts)
x <- arima.sim(model = list(order=c(2,0,0), ar=c(0.6,0.3)), n=100)
x.ts <- as.xts(x)

# Build a data frame of the current value and its first two lags
x.lag1 <- lag(x.ts, 1)
x.lag2 <- lag(x.ts, 2)
x.df <- data.frame(
  x.curr = x.ts[-c(1:2)],
  x.lag1 = x.lag1[-c(1:2)],
  x.lag2 = x.lag2[-c(1:2)]
)

# Fit regression
fit.x <- lm(x.curr ~ ., data=x.df)
summary(fit.x)

# Fit ARIMA
arima(x = x.ts, order = c(2,0,0), include.mean = FALSE)

The results are quite different:

# Regression results
Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.08173    0.11102   0.736    0.463    
x.lag1       0.54860    0.10082   5.441 4.11e-07 ***
x.lag2       0.15724    0.09883   1.591    0.115    

# ARIMA results
         ar1     ar2  intercept
      0.5919  0.1645     0.5572
s.e.  0.0983  0.1009     0.4268

The coefficients are similar, although slightly different. My questions:

  • Why is it so? Is it the random error component included in every
    observation of the AR(p) process?
  • How could I model an ARMA process with differencing
    (non-stationary) using regression? Should I fit the regression
    model to the differenced-and-lagged time series just like that?

Best Answer

There are a few reasons. For one, your arima fit doesn't include a mean/intercept (you set include.mean = FALSE). For another, arima by default uses the sum of squares only to find starting values for an iterative maximum likelihood scheme. Least squares regression (which throws away the early data points) is usually called conditional sum of squares (CSS) in time series: it conditions on the first p observations, just as your regression drops the first two rows.

These should match up:

summary(lm(x.curr ~ ., data = x.df))
arima(x = x.ts, order = c(2,0,0), include.mean = TRUE, method = "CSS")  # note the mean and method arguments

You'll still notice that there's a difference between lm's intercept and arima's mean. The relationship is that the intercept equals the mean times $(1 - \phi_1 - \phi_2)$. You can verify that this works.
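For instance, a minimal check along these lines (fit.css is a name introduced here for the CSS fit; fit.x is the lm fit from the question):

# Store the CSS fit so we can pull out its coefficients
fit.css <- arima(x = x.ts, order = c(2,0,0), include.mean = TRUE, method = "CSS")

# arima's "intercept" is really the mean; convert it to a regression intercept
mu  <- coef(fit.css)["intercept"]
phi <- coef(fit.css)[c("ar1", "ar2")]
unname(mu * (1 - sum(phi)))    # should be close to lm's intercept below
coef(fit.x)["(Intercept)"]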

Also, and this makes everything much more confusing, the arima function labels its mean as "intercept" in its output. This is a well-known issue that has been covered in other questions and write-ups.

One more thing: your equation for an AR(p) model is only true if you're looking at mean-zero AR models. In general you can write it as $$ (1 - \phi_1B - \cdots - \phi_p B^p)(X_t - \mu) = \epsilon_t $$ where $B$ is the backshift operator ($BX_t = X_{t-1}$) and $\mu$ is the mean, or $$ (1 - \phi_1B - \cdots - \phi_p B^p)X_t = c + \epsilon_t $$ where $c$ is the intercept. This will help you with the intercept/mean dilemma above.
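Expanding the first form shows where that conversion comes from: $$ X_t = \mu(1 - \phi_1 - \cdots - \phi_p) + \phi_1 X_{t-1} + \cdots + \phi_p X_{t-p} + \epsilon_t, $$ so the intercept is $c = \mu(1 - \phi_1 - \cdots - \phi_p)$, which is exactly the mean-to-intercept relationship used above.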

Finally, regarding your last question:

How could I model an ARMA process with differencing (non-stationary) using regression? Should I fit the regression model to the differenced-and-lagged time series just like that?

You can either difference your nonstationary series yourself, perhaps with diff in R, or change the order argument in your call to arima. For example, fitting an AR(3) to once-differenced data is the same as fitting an ARIMA(3,1,0), and so would require order = c(3,1,0).
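A quick sketch of that equivalence (the simulated series and object names below are made up for illustration):

# Simulate a nonstationary ARIMA(3,1,0) series (coefficients chosen arbitrarily)
y <- arima.sim(model = list(order = c(3,1,0), ar = c(0.4, 0.2, 0.1)), n = 200)

# Option 1: difference by hand, then fit an AR(3) to the differenced series
fit.diff <- arima(diff(y), order = c(3,0,0), include.mean = FALSE)

# Option 2: let arima handle the differencing via the middle entry of order
fit.full <- arima(y, order = c(3,1,0))

# The AR coefficient estimates should agree
coef(fit.diff)
coef(fit.full)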