Solved – Regression with ARIMA(0,0,0) errors different from linear regression

arima, forecasting, r, regression, time series

A regression with ARIMA errors is given by the following formula (from Hyndman et al., 1998):

$Y_t = b_0 + b_1 X_{1,t} + \dots + b_k X_{k,t} + N_t$

where $N_t$ is modeled as an ARIMA process.

If the model for $N_t$ is ARIMA$(0,0,0)$, then $N_t = e_t$ (white noise), and $Y_t$ is modeled by an ordinary regression.

Suppose the following data:

a <- structure(c(29305, 9900, 9802, 17743, 49300, 17700, 24100, 11000, 
10625, 23644, 38011, 16404, 14900, 16300, 18700, 11814, 13934, 
12124, 18097, 30026, 3600, 15700, 12300, 14600), .Tsp = c(2010.25, 
2012.16666666667, 12), class = "ts")
b <- structure(c(1.108528016, 1.136920872, 1.100239002, 1.057191265, 
1.044200511, 1.102063834, 1.083847756, 1.068585841, 1.084879628, 
1.232979511, 1.168894672, 1.257302058, 1.264967051, 1.234793782, 
1.306452369, 1.252644047, 1.178593218, 1.124432965, 1.132878661, 
1.189926986, 1.17249669, 1.176285957, 1.176552, 1.179178082), .Tsp = 
c(2010.25, 2012.16666666667, 12), class = "ts")

If I model it using the auto.arima function, I get:

auto.arima(a, xreg=b)
Series: a 
ARIMA(0,0,0) with zero mean     

Coefficients:
              b
      15639.266
s.e.   1773.186

sigma^2 estimated as 101878176:  log likelihood=-255.33
AIC=514.65   AICc=515.22   BIC=517.01

With lm, however, I get:

lm(a~b)

Call:
lm(formula = a ~ b)

Coefficients:
(Intercept)            b  
      48638       -26143  

Coefficients from the models differ. Shouldn't they be the same? What am I missing?

Best Answer

As pointed out in the comments, the difference between the models is that auto.arima() has not included an intercept. It selects a model, possibly including the constant, using the AICc. With one covariate, the model is $$y_t = \beta_0 x_t + n_t$$ where $n_t$ is an ARIMA process. Note that the intercept is absorbed into the ARIMA process as its mean. In this example, the selected model for $n_t$ has zero mean, so the fit is a regression through the origin.
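The missing intercept can be checked directly. The following is a minimal sketch, assuming base R's stats::arima() with include.mean = FALSE as a stand-in for the zero-mean ARIMA(0,0,0) model that auto.arima() selected; the names fit_lm0 and fit_arima0 are made up for illustration. If the explanation above is right, both slopes should land close to the 15639.266 reported by auto.arima().

```r
# Data copied from the question (monthly series starting April 2010).
a <- ts(c(29305, 9900, 9802, 17743, 49300, 17700, 24100, 11000,
          10625, 23644, 38011, 16404, 14900, 16300, 18700, 11814, 13934,
          12124, 18097, 30026, 3600, 15700, 12300, 14600),
        start = c(2010, 4), frequency = 12)
b <- ts(c(1.108528016, 1.136920872, 1.100239002, 1.057191265,
          1.044200511, 1.102063834, 1.083847756, 1.068585841, 1.084879628,
          1.232979511, 1.168894672, 1.257302058, 1.264967051, 1.234793782,
          1.306452369, 1.252644047, 1.178593218, 1.124432965, 1.132878661,
          1.189926986, 1.17249669, 1.176285957, 1.176552, 1.179178082),
        start = c(2010, 4), frequency = 12)

# Ordinary regression through the origin (no intercept term).
fit_lm0 <- lm(a ~ 0 + b)

# White-noise errors with no mean term, fitted by maximum likelihood.
fit_arima0 <- arima(a, order = c(0, 0, 0), xreg = b, include.mean = FALSE)

# The two slope estimates should agree up to optimizer tolerance.
coef(fit_lm0)
coef(fit_arima0)
```

Dropping `0 +` from the formula and `include.mean = FALSE` from the arima() call puts the intercept back in both fits.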

If you know what model you want, why use auto.arima()? Instead, you could use

Arima(a, xreg=b)

which gives

Series: a 
ARIMA(0,0,0) with non-zero mean 

Coefficients:
      intercept          b
       48638.40  -26143.23
s.e.   32410.27   27893.41

sigma^2 estimated as 93138232:  log likelihood=-254.25
AIC=514.5   AICc=515.7   BIC=518.03

This is the same as the model obtained using lm(a~b). The estimates are identical, but the standard errors differ because they are estimated in a different way (numerically from the Hessian of the likelihood rather than from the inverse of $X'X$).
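The size of the standard-error gap can be sketched on simulated data. This is an illustration under stated assumptions, not the answer's own code: x and y are hypothetical simulated series, and base R's stats::arima() is used for the ARIMA(0,0,0) fit. lm() scales $(X'X)^{-1}$ by the residual sum of squares divided by $n - k$, while the maximum likelihood fit effectively divides by $n$, so rescaling the lm() standard errors by $\sqrt{(n-k)/n}$ should roughly reproduce the ARIMA ones, up to the error of the numerical Hessian.

```r
# Hypothetical simulated data, used only for illustration.
set.seed(1)
n <- 50
x <- rnorm(n)
y <- 2 + 3 * x + rnorm(n)

fit_lm <- lm(y ~ x)
# ARIMA(0,0,0) errors; arima() includes a mean (intercept) by default.
fit_arima <- arima(y, order = c(0, 0, 0), xreg = x)

se_lm <- sqrt(diag(vcov(fit_lm)))        # based on RSS / (n - k)
se_arima <- sqrt(diag(vcov(fit_arima)))  # from the numerical Hessian

# Rescale the lm() standard errors to the ML variance estimate RSS / n.
k <- length(coef(fit_lm))
rbind(lm = se_lm,
      arima = se_arima,
      lm_rescaled = se_lm * sqrt((n - k) / n))
```

With only 24 observations, as in the question, the factor $\sqrt{22/24}$ is about 0.96, so the two sets of standard errors differ by a few percent even though the point estimates match.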
