I'm fitting an ARIMA(1,0,0) model using the forecast package in R on the usconsumption dataset. However, when I mimic the same fit using lm, I get different coefficients. My understanding is that they should be the same (in fact, they do give the same coefficients if I fit an ARIMA(0,0,0) with arima and an lm with only the external regressor, which is related to this post: Regression with ARIMA(0,0,0) errors different from linear regression).
Is this because arima and lm use different techniques to calculate coefficients? If so, can someone explain the difference?
Below is my code.
> library(forecast)
> library(fpp)
>
> #load data
> data("usconsumption")
>
> #create equivalent data frame from time-series
> lagpad <- function(x, k=1) {
+ c(rep(NA, k), x)[1 : length(x)]
+ }
> usconsumpdf <- as.data.frame(usconsumption)
> usconsumpdf$consumptionLag1 <- lagpad(usconsumpdf$consumption)
>
> #create arima model
> arima(usconsumption[,1], xreg=usconsumption[,2], order=c(1,0,0))
Call:
arima(x = usconsumption[, 1], order = c(1, 0, 0), xreg = usconsumption[, 2])
Coefficients:
         ar1  intercept  usconsumption[, 2]
      0.2139     0.5867              0.2292
s.e.  0.0928     0.0755              0.0605
sigma^2 estimated as 0.3776: log likelihood = -152.87, aic = 313.74
>
> #create lm model
> lm(consumption~consumptionLag1+income, data=usconsumpdf)
Call:
lm(formula = consumption ~ consumptionLag1 + income, data = usconsumpdf)
Coefficients:
    (Intercept)  consumptionLag1           income
         0.3779           0.2456           0.2614
Best Answer
Elaborating a little on @Richard's answer:
The model
$$ \begin{aligned} y_t &= \gamma_0 + \gamma_2 x_t + u_t, \\ u_t &= \varphi_1 u_{t-1} + v_t \end{aligned} $$
can be rearranged by noting that $u_{t} = y_t-\gamma_0 - \gamma_2 x_{t}$, so that $u_{t-1} = y_{t-1}-\gamma_0 - \gamma_2 x_{t-1}$. Plugging this into the equation for $u_t$ gives
$$ y_t-\gamma_0 - \gamma_2 x_{t}=\varphi_1(y_{t-1}-\gamma_0 - \gamma_2 x_{t-1})+v_t $$
or
$$ y_t =\gamma_0(1-\varphi_1)+\gamma_2 x_{t}+\varphi_1y_{t-1} - \varphi_1\gamma_2 x_{t-1}+v_t. $$
Hence, the second model, i.e. the regression with AR(1) errors that arima fits, has a nonlinear coefficient restriction on the constant and, more importantly, also implicitly includes $x_{t-1}$.
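As a rough check of this rearrangement against the numbers in the question, one can plug the arima estimates into the implied coefficients. This is only a sketch: the variable names are mine, `incomeLag1` is a hypothetical name for the lagged income regressor, and the three input values are simply copied from the arima output in the question.

```r
# arima() estimates copied from the output in the question
phi1   <- 0.2139   # ar1
gamma0 <- 0.5867   # intercept
gamma2 <- 0.2292   # coefficient on income

# Coefficients implied for the rearranged equation
#   y_t = gamma0 * (1 - phi1) + gamma2 * x_t + phi1 * y_{t-1} - phi1 * gamma2 * x_{t-1} + v_t
implied <- c(
  "(Intercept)"   = gamma0 * (1 - phi1),
  consumptionLag1 = phi1,
  income          = gamma2,
  incomeLag1      = -phi1 * gamma2   # hypothetical name; this regressor is absent from the lm() call
)
print(round(implied, 4))
```

The implied constant (about 0.461) differs from the reported arima intercept (0.5867) because arima reports $\gamma_0$ rather than $\gamma_0(1-\varphi_1)$, and the lm fit in the question leaves out the lagged income term altogether.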
Let us generate some data from the second model and estimate the restricted arima as well as unrestricted OLS model including $x_{t-1}$:
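A minimal sketch of such a simulation might look like the following. The sample size, the seed and the true parameter values ($\gamma_0 = 1$, $\gamma_2 = 2$, $\varphi_1 = 0.5$) are my assumptions, so the estimates it prints will not reproduce the figures quoted below exactly.

```r
set.seed(42)    # arbitrary seed (assumption)

n      <- 1000  # assumed sample size
gamma0 <- 1     # assumed true intercept
gamma2 <- 2     # assumed true coefficient on x_t
phi1   <- 0.5   # assumed true AR(1) coefficient of the errors

x <- rnorm(n)
u <- as.numeric(arima.sim(model = list(ar = phi1), n = n))  # AR(1) errors u_t
y <- gamma0 + gamma2 * x + u

# Restricted model: regression on x_t with AR(1) errors
fit_arima <- arima(y, order = c(1, 0, 0), xreg = x)

# Unrestricted OLS that includes both y_{t-1} and x_{t-1}
d <- data.frame(y = y[-1], ylag = y[-n], x = x[-1], xlag = x[-n])
fit_ols <- lm(y ~ ylag + x + xlag, data = d)

print(coef(fit_arima))
print(coef(fit_ols))

# Coefficient on x_{t-1} implied by the restricted fit: -phi1 * gamma2
est <- unname(coef(fit_arima))   # order: ar1, intercept, xreg coefficient
implied_xlag <- -est[1] * est[3]
print(implied_xlag)
```

Comparing `implied_xlag` with the `xlag` coefficient from `fit_ols` is the same comparison made in the text for the coefficient on $x_{t-1}$.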
The results are
and
We see that, for example, the coefficient on $x_{t-1}$, $-0.9771$, is close to the one implied by the restricted arima model, $-0.4880\cdot2.0135=-0.983$. Also, as predicted, the coefficients on $x_t$ are close in both models, and so are those on $u_{t-1}$ and $y_{t-1}$. (I suspect the remaining differences are due to the different underlying algorithms in arima and lm.)
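One way to probe that suspicion is to vary the estimation method. `stats::arima` accepts `method = "CSS-ML"` (the default), `"ML"` or `"CSS"`, while lm always minimises the residual sum of squares. The sketch below uses a plain AR(1) with assumed parameters and an arbitrary seed; it is an illustration under those assumptions, not a statement about the data in the question.

```r
set.seed(1)  # arbitrary seed (assumption)
n <- 500     # assumed sample size

# AR(1) series with assumed mean 1 and assumed AR coefficient 0.5
y <- 1 + as.numeric(arima.sim(model = list(ar = 0.5), n = n))

fit_ml  <- arima(y, order = c(1, 0, 0), method = "ML")   # full Gaussian likelihood
fit_css <- arima(y, order = c(1, 0, 0), method = "CSS")  # conditional sum of squares
fit_ols <- lm(y[-1] ~ y[-n])                             # OLS of y_t on y_{t-1}

ar_estimates <- c(
  ML  = coef(fit_ml)[["ar1"]],
  CSS = coef(fit_css)[["ar1"]],
  OLS = unname(coef(fit_ols)[2])
)
print(ar_estimates)
```

Under these assumptions the CSS and OLS estimates should be very close to each other, since both condition on the first observation, while the ML estimate can differ slightly because it also uses the likelihood contribution of that first observation.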