Solved – How to interpret regression coefficients with autocorrelated residuals

Tags: arima, r, regression, time-series

I am building a regression model of time series data in R, where my primary interest is the coefficients of the independent variables. The data exhibit strong seasonality with a trend.

[Plot: original data]

The model looks good, with four of the six regressors significant:
[Image: OLS model summary]

Here are the OLS residuals:
[Plot: OLS residuals]

I used auto.arima to select the seasonal ARIMA (SARIMA) error structure, and it returned an ARIMA(0,1,1)(1,1,0)[12] model:

library(forecast)  # for auto.arima()

# Regression with ARIMA errors; full search (no stepwise shortcut or approximation)
fit.ar <- auto.arima(at.ts, xreg = xreg1, stepwise = FALSE, approximation = FALSE)
summary(fit.ar)

Series: at.ts 
ARIMA(0,1,1)(1,1,0)[12]                    

Coefficients:
          ma1    sar1      v1       v2      v3       v4         v5
      -0.7058  0.3974  0.0342  -0.0160  0.0349  -0.0042  -113.4196
s.e.   0.1298  0.2043  0.0239   0.0567  0.0555   0.0333   117.1205

sigma^2 estimated as 3.86e+10:  log likelihood=-458.13
AIC=932.26   AICc=936.05   BIC=947.06

Training set error measures:
                   ME     RMSE      MAE       MPE     MAPE      MASE
Training set 7906.896 147920.3 103060.4 0.1590107 3.048322 0.1150526

My question is this: based on the parameter estimates and standard errors of the regressors, I believe that none of them are significant. Is this correct, and if so, what does it imply if my goal is to interpret the relative importance of these predictors rather than to forecast?
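For reference, here is a rough way to check that from the fitted object (a sketch only, using the usual large-sample normal approximation for the coefficient estimates):

# Approximate z-tests for the coefficients in fit.ar
# (large-sample normal approximation; not exact small-sample inference)
est <- coef(fit.ar)
se  <- sqrt(diag(vcov(fit.ar)))
z   <- est / se
round(cbind(estimate = est, se = se, z = z, p.value = 2 * pnorm(-abs(z))), 4)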

Any other advice relative to the process of building this model is welcome and appreciated.

Here are the ACF and PACF for the residuals:

[Plot: ACF and PACF of residuals]

And the Durbin-Watson test on the OLS residuals (durbinWatsonTest from the car package):

> durbinWatsonTest(mod.ols, max.lag=12)
 lag Autocorrelation D-W Statistic p-value
   1     0.120522674     1.6705144   0.106
   2     0.212723044     1.4816530   0.024
   3     0.159828108     1.5814771   0.114
   4     0.031083831     1.8352377   0.744
   5     0.081081308     1.6787808   0.418
   6    -0.024202465     1.8587561   0.954
   7    -0.008399949     1.7720761   0.944
   8     0.040751905     1.6022835   0.512
   9     0.129788310     1.4214391   0.178
  10    -0.015442379     1.6611922   0.822
  11     0.004506292     1.6133994   0.770
  12     0.376037337     0.7191359   0.000
 Alternative hypothesis: rho[lag] != 0

Best Answer

To be clear, the residual plot by itself does not demonstrate autocorrelation; if you want to visually describe the extent of autocorrelation in these data, a variogram of the residuals would be a much better descriptive tool. Nonetheless, autocorrelation is plausibly present here, based on your understanding of the time series and your belief that the predictors in the model do not account for all of the residual dependence that they could. An example of when that would not be the case is a clinical trial with adaptive dosing, where the dose is chosen according to disease severity: adjusting for dose makes the successive outcomes conditionally independent.
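For example, a crude sample semivariogram of the OLS residuals can be computed by hand (a sketch; it assumes evenly spaced observations and that mod.ols is the OLS fit from the question):

# Sample semivariogram: gamma(h) = mean((e[t+h] - e[t])^2) / 2 for each lag h
e <- resid(mod.ols)
n <- length(e)
semivar <- sapply(1:24, function(h) mean((e[(h + 1):n] - e[1:(n - h)])^2) / 2)
plot(1:24, semivar, type = "b", xlab = "lag", ylab = "semivariance")

A flat variogram suggests little serial dependence; one that rises with lag (or dips at the seasonal lag) points to residual autocorrelation.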

Nonetheless, even when there is unmodeled correlation in the data, the interpretation of the coefficients remains exactly the same: the least squares slope is still an estimate of the expected difference in the outcome per unit difference in the regressor. What you lose by ignoring the correlation is efficiency and validity of inference. The standard errors can be inflated or shrunk, so it is the confidence intervals (and p-values) that are unreliable. Of course, if you correctly identify the correlation structure, you can iteratively estimate the correlation and the regression parameters, obtain the BLUE, and get valid inference. This is all detailed in Seber & Lee.
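(In R, one way to do that joint estimation is generalized least squares via gls() in the nlme package. The following is only a sketch: the response, regressors, data frame, and time index are placeholders rather than the asker's actual variables, and an AR(1) error is just one possible structure; corARMA() allows richer ones.)

library(nlme)  # for gls()

# GLS with an AR(1) error structure (placeholder variable names)
fit.gls <- gls(y ~ v1 + v2 + v3 + v4 + v5,
               data = dat,
               correlation = corAR1(form = ~ time),
               method = "REML")
summary(fit.gls)  # coefficients keep the usual per-unit-difference interpretation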

What's not in Seber & Lee is that if you use robust standard errors, these give you correct inference even when the correlation structure is misspecified. The data could be AR(1), you can specify working independence, and you still get valid inference; you just lose a little efficiency relative to using robust standard errors with the correct working correlation structure.
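(In R, Newey-West / HAC standard errors from the sandwich package are one convenient implementation of this for an OLS fit; a sketch, assuming the mod.ols object from the question:)

library(sandwich)  # NeweyWest() HAC covariance estimator
library(lmtest)    # coeftest() for coefficient tests with a supplied vcov

# Same OLS point estimates, but autocorrelation-robust standard errors
coeftest(mod.ols, vcov = NeweyWest(mod.ols))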