R – arima and lm Not Giving the Same Coefficients: Understanding the Differences

I'm fitting an ARIMA(1,0,0) model with an external regressor on the usconsumption dataset, using arima() in R with the forecast package loaded. However, when I mimic the same fit using lm, I get different coefficients. My understanding is that they should be the same. In fact, they do give the same coefficients if I fit an ARIMA(0,0,0) model and an lm with only the external regressor, which is related to this post: Regression with ARIMA(0,0,0) errors different from linear regression.
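The ARIMA(0,0,0) case mentioned above can be checked with a small simulation. This is a sketch on simulated data, not on usconsumption; the seed, sample size and true coefficients are arbitrary choices. With no AR or MA terms, arima() maximises a Gaussian likelihood with i.i.d. errors, and the coefficient estimates that maximise it coincide with OLS, so the two fits should agree up to the optimiser's tolerance.

```r
# Sketch on simulated data: regression with ARIMA(0,0,0) errors vs. lm().
# Seed, sample size and true coefficients are arbitrary illustrative choices.
set.seed(1)
n <- 200
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)  # i.i.d. Gaussian errors, i.e. ARIMA(0,0,0)

# No AR/MA terms: arima() only estimates the intercept and the xreg coefficient
fit_arima <- arima(y, order = c(0, 0, 0), xreg = x)
fit_lm <- lm(y ~ x)

# Side-by-side comparison; the two columns should agree to several decimals
cbind(arima = coef(fit_arima), lm = coef(fit_lm))
```

Adding an AR term changes this picture, because the error model then implies extra lagged terms in the equivalent regression.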

Is this because arima and lm use different techniques to calculate coefficients? If so, can someone explain the difference?

Below is my code.

> library(forecast)
> library(fpp)
> 
> #load data
> data("usconsumption")
> 
> #create equivalent data frame from time-series
> lagpad <- function(x, k=1) {
+   c(rep(NA, k), x)[1 : length(x)] 
+ }
> usconsumpdf <- as.data.frame(usconsumption)
> usconsumpdf$consumptionLag1 <- lagpad(usconsumpdf$consumption)
> 
> #create arima model
> arima(usconsumption[,1], xreg=usconsumption[,2], order=c(1,0,0))

Call:
arima(x = usconsumption[, 1], order = c(1, 0, 0), xreg = usconsumption[, 2])

Coefficients:
         ar1  intercept  usconsumption[, 2]
      0.2139     0.5867              0.2292
s.e.  0.0928     0.0755              0.0605

sigma^2 estimated as 0.3776:  log likelihood = -152.87,  aic = 313.74
> 
> #create lm model
> lm(consumption~consumptionLag1+income, data=usconsumpdf)

Call:
lm(formula = consumption ~ consumptionLag1 + income, data = usconsumpdf)

Coefficients:
    (Intercept)  consumptionLag1           income  
         0.3779           0.2456           0.2614
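As a rough check on these numbers, the rearrangement derived in the answer below can be applied to the arima() estimates above to see which regression an lm fit would have to approximate. This is a sketch: implied_lm_coefs() is a hypothetical helper written only for this illustration, and it uses the printed (rounded) point estimates.

```r
# Hypothetical helper (for illustration only): map the estimates of
#   y_t = gamma0 + gamma2 * x_t + u_t,  u_t = phi1 * u_{t-1} + v_t
# to the coefficients of the equivalent regression
#   y_t = gamma0*(1 - phi1) + gamma2*x_t + phi1*y_{t-1} - phi1*gamma2*x_{t-1} + v_t
implied_lm_coefs <- function(ar1, intercept, beta) {
  c(intercept = intercept * (1 - ar1),  # constant: gamma0 * (1 - phi1)
    x_t       = beta,                   # current regressor: gamma2
    y_lag1    = ar1,                    # lagged response: phi1
    x_lag1    = -ar1 * beta)            # lagged regressor: -phi1 * gamma2
}

# Rounded point estimates copied from the arima() output above
# (ar1, intercept, and the usconsumption[, 2] coefficient)
round(implied_lm_coefs(ar1 = 0.2139, intercept = 0.5867, beta = 0.2292), 4)
# intercept 0.4612, x_t 0.2292, y_lag1 0.2139, x_lag1 -0.049
```

The implied regression contains lagged income with a coefficient of about -0.049, a term the lm call above leaves out, so the two sets of coefficients are not estimating the same quantities. Even with an income lag added (for example via lagpad()), lm would not impose the nonlinear restrictions, so its estimates would be expected to match these values only approximately.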

Best Answer

Elaborating a little on @Richard's answer:

The model
$$ \begin{aligned} y_t &= \gamma_0 + \gamma_2 x_t + u_t, \\ u_t &= \varphi_1 u_{t-1} + v_t, \end{aligned} $$
can be rearranged by noting that $u_{t} = y_t-\gamma_0 - \gamma_2 x_{t}$, so that $u_{t-1} = y_{t-1}-\gamma_0 - \gamma_2 x_{t-1}$. Plugging this into the equation for $u_t$ gives
$$ y_t-\gamma_0 - \gamma_2 x_{t}=\varphi_1(y_{t-1}-\gamma_0 - \gamma_2 x_{t-1})+v_t $$
or
$$ y_t =\gamma_0(1-\varphi_1)+\gamma_2 x_{t}+\varphi_1y_{t-1} - \varphi_1\gamma_2 x_{t-1}+v_t. $$
Hence, the second model imposes a nonlinear restriction on the constant, $\gamma_0(1-\varphi_1)$, and, more importantly, also implicitly includes $x_{t-1}$, whose coefficient $-\varphi_1\gamma_2$ is likewise restricted.

Let us generate some data from the second model and estimate both the restricted arima model and the unrestricted OLS model that includes $x_{t-1}$:

n <- 5000
x <- rnorm(n)
u <- arima.sim(list(ar=0.5),n=n)
gamma_0 <- 1
gamma_2 <- 2
y <- gamma_0 + gamma_2*x + u

The results are

> arima(y, xreg=x, order=c(1,0,0))

Call:
arima(x = y, order = c(1, 0, 0), xreg = x)

Coefficients:
         ar1  intercept       x
      0.4880     1.0196  2.0135
s.e.  0.0123     0.0279  0.0128

and

> lm(y[2:n]~x[2:n]+y[1:(n-1)]+x[1:(n-1)])

Call:
lm(formula = y[2:n] ~ x[2:n] + y[1:(n - 1)] + x[1:(n - 1)])

Coefficients:
 (Intercept)        x[2:n]  y[1:(n - 1)]  x[1:(n - 1)]  
      0.5222        2.0164        0.4880       -0.9771  

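We see that, for example, the coefficient on $x_{t-1}$, $-0.9771$, is close to the one implied by the restricted arima model, $-0.4880\cdot2.0135\approx-0.983$. Likewise, the OLS constant $0.5222$ is close to the implied $\gamma_0(1-\varphi_1)=1.0196\cdot(1-0.4880)\approx0.522$. Also, as predicted, the coefficients on $x_t$ are close in both models, and so are those on $u_{t-1}$ and $y_{t-1}$. (I suspect the remaining differences are due to the different estimation methods: arima, which by default uses maximum likelihood, and lm, which uses OLS without the restrictions.)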
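One way to separate the effect of the restrictions from that of the estimation method is sketched below. Nonlinear least squares on the restricted equation, conditioning on the first observation, should minimise the same conditional sum of squares as arima(..., method = "CSS"), so those two fits are expected to agree closely. The default method, "CSS-ML", additionally maximises the exact likelihood, which also uses the first observation, so it may differ slightly again. The seed, starting values and object names are illustrative assumptions, and the numbers will not reproduce the output above.

```r
# Sketch: separate the effect of the restrictions from the estimation method.
# Seed and starting values are arbitrary; results differ from the output above.
set.seed(123)
n <- 5000
x <- rnorm(n)
u <- arima.sim(list(ar = 0.5), n = n)
y <- as.numeric(1 + 2 * x + u)

# Lagged design; the first observation is lost to the lag
d <- data.frame(y = y[2:n], x = x[2:n],
                ylag = y[1:(n - 1)], xlag = x[1:(n - 1)])

# 1) Unrestricted OLS including x_{t-1}
fit_ols <- lm(y ~ x + ylag + xlag, data = d)
b <- coef(fit_ols)

# 2) Restricted model by nonlinear least squares, started from the OLS values:
#    y_t = g0 + g2 * x_t + phi * (y_{t-1} - g0 - g2 * x_{t-1}) + v_t
fit_nls <- nls(y ~ g0 + g2 * x + phi * (ylag - g0 - g2 * xlag), data = d,
               start = list(g0 = b[["(Intercept)"]] / (1 - b[["ylag"]]),
                            g2 = b[["x"]], phi = b[["ylag"]]))

# 3) arima() minimising the conditional sum of squares
fit_css <- arima(y, order = c(1, 0, 0), xreg = x, method = "CSS")

# 4) arima() with its default method ("CSS-ML"), for comparison
fit_ml <- arima(y, order = c(1, 0, 0), xreg = x)

# Restricted fits side by side
est <- rbind(nls    = unname(coef(fit_nls)[c("phi", "g0", "g2")]),
             css    = unname(coef(fit_css)[c("ar1", "intercept", "x")]),
             css_ml = unname(coef(fit_ml)[c("ar1", "intercept", "x")]))
colnames(est) <- c("phi (ar1)", "gamma_0 (intercept)", "gamma_2 (x)")
round(est, 4)
coef(fit_ols)  # unrestricted: compare xlag with -phi * gamma_2
```

If the nls and css rows agree closely, as they should, the gap between arima and an unrestricted lm fit reflects mainly the restrictions (and, for the default method, the use of the first observation) rather than a numerical quirk of either function.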
