Solved – VAR model residuals having significant correlation at lag 12

cross-validation, r, residuals, vector-autoregression

I have tried to fit a VAR model to two stationary time series of monthly river flow data, dlogsl_ts and dlogllc_ts (stationarity was checked with the PP and ADF tests). From:

VARselect(dlogdata, lag.max=10) # SC(3)

the SC criterion suggests a lag order of 3, so I fitted a VAR(3). Overall the model looks good:

dlogllc_ts = dlogsl_ts.l1 + dlogllc_ts.l1 + dlogsl_ts.l2 + dlogllc_ts.l2 + dlogsl_ts.l3 + 
             dlogllc_ts.l3 

               Estimate Std. Error t value Pr(>|t|)    
dlogsl_ts.l1  -0.21404    0.06660  -3.214 0.001387 ** 
dlogllc_ts.l1  1.27827    0.05459  23.417  < 2e-16 ***
dlogsl_ts.l2   0.32462    0.09590   3.385 0.000762 ***
dlogllc_ts.l2 -0.61543    0.07962  -7.729 5.11e-14 ***
dlogsl_ts.l3  -0.19386    0.06617  -2.930 0.003529 ** 
dlogllc_ts.l3  0.20707    0.05516   3.754 0.000193 ***

Residual standard error: 0.3493 on 555 degrees of freedom
Multiple R-Squared: 0.7601, Adjusted R-squared: 0.7575 
F-statistic: 293.1 on 6 and 555 DF,  p-value: < 2.2e-16 

Covariance matrix of residuals:
           dlogsl_ts dlogllc_ts
dlogsl_ts    0.08150    0.06342
dlogllc_ts   0.06342    0.12197

Correlation matrix of residuals:
           dlogsl_ts dlogllc_ts
dlogsl_ts     1.0000     0.6361
dlogllc_ts    0.6361     1.0000

However, the residuals suggest that the model is not adequate:

[Figure: Residuals]

My code:

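library(vars)

fit_var1 <- VAR(dlogdata, type = "none", p = 3)
var1_residuals <- resid(fit_var1)
var1_residuals  # print the residual matrix

# ACFs and cross-correlations of both residual series in one call
acf(var1_residuals)

# Individual ACFs and the cross-correlation on a 2 x 2 page
par(mfrow = c(2, 2))
acf(var1_residuals[, 1])
acf(var1_residuals[, 2])
ccf(var1_residuals[, 1], var1_residuals[, 2])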

Can someone please tell me why I get this significant residual correlation at lag 12? It also happens with other models I have tried fitting. I have already tried to remove the seasonality at the start by seasonal differencing:

dlogsl_ts  <- diff(log(sl_ts),  lag = 12)
dlogllc_ts <- diff(log(llc_ts), lag = 12)

If I simply use a VAR(12), the residual structure does not change much:
[Figure: residual plots for VAR(12)]

With a VAR(24):
[Figure: residual plots for VAR(24)]

And with a VAR(48):
[Figure: residual plots for VAR(48)]

The residual structure hardly changes. Fitting such a high-order VAR is of course useless in practice; I include it only to demonstrate the stubbornness of the residual correlation.

Best Answer

First, you have monthly data, and river flow is a very natural candidate for seasonality. It evolves in 12-month cycles, which creates serial correlation at lag 12. You could either seasonally adjust the data before fitting the VAR model or include monthly dummies in the VAR model. You mention that you have tried the former approach and it did not work, which is surprising; perhaps you could try an alternative seasonal adjustment procedure instead.
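A minimal sketch of the monthly-dummies route, run on simulated data because the real series are not reproduced here. The series names logsl and logllc, the seasonal pattern and all coefficients are made up for illustration. The only feature relied on is the season argument of VAR() in the vars package, which adds centered seasonal dummies to each equation. The sketch also shows one possible (not confirmed) reason why seasonal differencing can leave a lag-12 spike: if the seasonal pattern is stable from year to year, differencing at lag 12 turns each shock e_t into e_t - e_{t-12}, which is itself correlated at lag 12.

```r
library(vars)  # VAR() with the `season` argument

set.seed(42)
n <- 360  # 30 years of simulated monthly data (illustrative only)

# Stable (deterministic) monthly pattern plus a simple autoregressive part
month_effect <- rep(sin(2 * pi * (1:12) / 12), length.out = n)
e <- matrix(rnorm(2 * n, sd = 0.3), n, 2)
y <- matrix(0, n, 2)
for (t in 2:n) y[t, ] <- 0.5 * y[t - 1, ] + e[t, ]
logdata <- ts(y + cbind(month_effect, 0.5 * month_effect), frequency = 12)
colnames(logdata) <- c("logsl", "logllc")  # hypothetical names

# (a) seasonal differencing first, as in the question
fit_diff <- VAR(diff(logdata, lag = 12), p = 3, type = "none")

# (b) levels with a constant and 11 centered monthly dummies
fit_dum <- VAR(logdata, p = 3, type = "const", season = 12)

# Residual autocorrelation at lag 12 for each equation
lag12_acf <- function(fit) {
  apply(resid(fit), 2, function(r) acf(r, lag.max = 12, plot = FALSE)$acf[13])
}
rbind(seasonal_diff = lag12_acf(fit_diff), monthly_dummies = lag12_acf(fit_dum))
```

On data like this, the seasonally differenced fit should show a clearly negative residual autocorrelation at lag 12, while the dummy-based fit should not. Whether the same mechanism is at work in the river flow data is something only the actual residual plots can tell.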

Second, the function VARselect considers only unrestricted models. By that I mean that if lag $k$ is included, all lags $1, \dots, k-1$ are included as well; the function does not consider, for example, a VAR(12) in which all lag 1 through lag 11 coefficients are restricted to zero. Meanwhile, a relevant model for your data could perhaps be a VAR(3) plus the 12th lag. That makes a VAR(12) model with a lot of empty lags (4 through 11). You could build the model manually and see how its AIC or BIC values compare to the ones found to be optimal by VARselect.
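One way to build such a model is sketched below, using restrict() from the vars package with a manual restriction matrix. The data are simulated as a stand-in for dlogdata, with made-up coefficients. The column order assumed for resmat (all variables at lag 1, then lag 2, and so on) follows my reading of the package documentation and should be checked against colnames(fit12$datamat). The helper ic() is hypothetical and computes log-determinant information criteria by hand so that both models are scored the same way on the same sample.

```r
library(vars)  # VAR() and restrict()

set.seed(1)
n <- 600; K <- 2; p <- 12

# Simulated stand-in for dlogdata with dynamics at lags 1 and 12 (illustrative)
y <- matrix(0, n, K, dimnames = list(NULL, c("dlogsl_ts", "dlogllc_ts")))
e <- matrix(rnorm(n * K), n, K)
for (t in 13:n) y[t, ] <- 0.4 * y[t - 1, ] - 0.3 * y[t - 12, ] + e[t, ]

# Unrestricted VAR(12) as the starting point
fit12 <- VAR(y, p = p, type = "none")

# Restriction matrix: one row per equation, one column per regressor.
# 1 = keep the coefficient, 0 = restrict it to zero.
resmat <- matrix(0, K, K * p)
keep_lags <- c(1, 2, 3, 12)
for (l in keep_lags) resmat[, (l - 1) * K + 1:K] <- 1
fit_r <- restrict(fit12, method = "manual", resmat = resmat)

# Plain VAR(3) on the same effective sample (both use obs 13..n as responses)
fit3 <- VAR(y[-(1:9), ], p = 3, type = "none")

# Hand-computed criteria: log det of the residual covariance plus a penalty
ic <- function(fit) {
  res <- resid(fit)
  N <- nrow(res)
  k <- sum(sapply(fit$varresult, function(m) length(coef(m))))
  ld <- log(det(crossprod(res) / N))
  c(AIC = ld + 2 * k / N, BIC = ld + log(N) * k / N)
}
rbind(VAR3 = ic(fit3), VAR3_plus_lag12 = ic(fit_r))
```

Lower values are better. Dropping the first nine rows before fitting the VAR(3) matters: information criteria computed on different samples are not comparable.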

Edit: I checked out your data. It seems to me the problem is in asymmetry. Water levels have some positive spikes but not negative ones; the shocks are asymmetric. Using a simple VAR model does not account for that, which results in asymmetric and autocorrelated residuals (apparently the spikes are seasonal). Theoretically, perhaps a model with asymmetric errors could work; however, I doubt there is any relevant software implementation.
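A quick way to check the asymmetry claim is to compute the sample skewness of the residuals. The sketch below uses base R only and simulated shocks; the skewness() helper is hypothetical, not from any package. For a symmetric distribution the value is near zero, while occasional large positive spikes push it well above zero.

```r
set.seed(7)

# Sample skewness: third central moment scaled by the variance^(3/2)
skewness <- function(x) {
  z <- x - mean(x)
  mean(z^3) / mean(z^2)^1.5
}

sym_resid   <- rnorm(1000)     # symmetric shocks
spiky_resid <- rexp(1000) - 1  # positive spikes only, mean close to zero

c(symmetric = skewness(sym_resid), spiky = skewness(spiky_resid))

# With the fitted model from the question one would run, for example:
# apply(var1_residuals, 2, skewness)
```

Clearly positive skewness in the VAR residuals would be consistent with the asymmetric-shock explanation, although it does not by itself show that the spikes are seasonal.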
