Understanding autocorrelation of the residuals – ARI(1,1) model

Tags: arima, autocorrelation, residuals, time series

I am having a hard time understanding how to do model diagnostics, and in particular how to interpret the residuals of a fitted ARI(1,1) model. I analyzed some data (n=96) and found that it was non-stationary, as the autocorrelations were significant up to lag 22.


I did first differencing to correct for this issue, and obtained the following new autocorrelation chart:

Autocorrelation of 1st Differences

The yellow lines are an estimate of two standard errors, calculated as $\sqrt{\frac{2}{n}}$.
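As a point of comparison, the rule of thumb usually quoted for a white-noise ACF is a band of two standard errors at $\pm 2/\sqrt{n}$, which is not the same quantity as $\sqrt{2/n}$. A small sketch of the arithmetic, using only the sample size stated in the question (the variable names are illustrative):

```python
import math

n = 96  # sample size stated in the question

# Common rule of thumb: two standard errors of a white-noise sample ACF
band_usual = 2 / math.sqrt(n)

# The expression as written in the question
band_in_post = math.sqrt(2 / n)

print(round(band_usual, 3), round(band_in_post, 3))  # 0.204 0.144
```

If the plotted band really is $\sqrt{2/n}$, it is narrower than the usual one, so more lags will appear significant than under the standard rule.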

Looking at this, it appears that this could be an AR(1) or ARMA(1) process, so I first modeled it as an AR(1) process, obtaining an autoregressive parameter of -0.479. I then tried to calculate the standardized residuals to check whether this model fits the data. To do this, I calculated the residual as the actual value minus the predicted value at each time. The predicted value was calculated as $Y_t = \phi_1 y_{t-1}$, where $Y_t$ is the prediction at time t and $y_{t-1}$ is the actual value at time t-1.

I then calculated the standard error using the formula $s = \sqrt{\frac{1}{n-1} \sum_{t=1}^n (Y_t - y_t)^2}$ and obtained the standardized residual by dividing the residual by s.
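The residual calculation described above can be sketched in Python as follows. This assumes NumPy, and it assumes that the AR(1) coefficient is applied to the first differences of the series with no constant term (the question does not say which series was used, and applying it to the undifferenced series would give different residuals). The function name is hypothetical.

```python
import numpy as np

def ar1_standardized_residuals(y, phi):
    """Residuals of a zero-mean AR(1) applied to the first differences of y.

    Assumption: phi is the AR(1) coefficient of the differenced series.
    """
    y = np.asarray(y, dtype=float)
    d = np.diff(y)                # first differences
    pred = phi * d[:-1]           # one-step predictions of d[1:]
    resid = d[1:] - pred          # actual minus predicted
    # Scale estimate following the formula in the question
    s = np.sqrt(np.sum(resid ** 2) / (len(resid) - 1))
    return resid, resid / s

# Toy input chosen only to make the arithmetic easy to follow
resid, std_resid = ar1_standardized_residuals([1, 3, 2, 5, 4], phi=-0.5)
print(resid)  # [0.  2.5 0.5]
```

Note that differencing and the one-step prediction each consume one observation, so a series of length n yields n-2 residuals.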

This created the following chart:

Standardized Residuals

So far, I think this analysis is correct, but if there is anything amiss, any direction would be helpful.

Then, I wanted to determine if these residuals could be from a white noise process, so I calculated the autocorrelation of the residuals.

The autocorrelation of the residuals shows many values outside of the confidence interval, which I again set to a rough estimate of $\sqrt{\frac{2}{n}}$


The autocorrelation is calculated as $r_k = \frac{\sum_{t=k+1}^{n} (Y_t - \bar y)(Y_{t-k} - \bar y)}{\sum_{t=1}^n (Y_t - \bar y)^2}$

Then, I calculated the Box-Pierce Q statistic using $Q = n\sum_1^{50} \hat r_k$ which was equal to 7306.

This value seems excessively large: the chi-square table value for 50 degrees of freedom (I think I should be using 49, but it's close enough) at the 1% level is 29.7. Am I doing something wrong, or is this model just vastly incorrect? Should I compute the final autocorrelation on the standardized residuals or on the non-standardized residuals? Which should I use for the Box-Pierce statistic?
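For reference, the Box-Pierce statistic is usually defined with squared autocorrelations, $Q = n\sum_{k=1}^{K} \hat r_k^2$. The sketch below is a minimal NumPy version, assuming the mean-centred sample ACF written out above; the function names are illustrative and not taken from any particular library.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelations r_1..r_max_lag around the sample mean."""
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()
    denom = np.sum(xc ** 2)
    return np.array([np.sum(xc[k:] * xc[:-k]) / denom
                     for k in range(1, max_lag + 1)])

def box_pierce(x, max_lag):
    """Box-Pierce Q = n * sum of squared sample autocorrelations."""
    r = sample_acf(x, max_lag)
    return len(x) * np.sum(r ** 2)

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # toy data for illustration
print(round(box_pierce(x, 2), 4))         # 0.85
# Rescaling the input leaves the ACF, and therefore Q, unchanged
print(np.isclose(box_pierce(x, 2), box_pierce(3 * x, 2)))  # True
```

Three things follow from this form. Dividing the residuals by a constant s does not change the ACF, so standardized and non-standardized residuals give the same Q. Each $|\hat r_k|$ is at most 1 for this estimator, so Q cannot exceed $nK = 96 \times 50 = 4800$, and a value of 7306 therefore points to a slip somewhere in the calculation. Finally, the test is upper-tailed: the 1% critical value for 50 degrees of freedom is roughly 76.2, whereas 29.7 is the lower-tail 1% point.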

Sorry if this post is too long. I am a regular StackExchange User and not too knowledgeable about Statistics, so am not sure what the most relevant details would be. Thanks for any help!

Best Answer

The sample ACF of your differenced series appears to cut off after lag 1 (it is negligible for lags $k>1$). This suggests that the differenced series is MA(1), so overall you possibly have an IMA(1,1) model, $$ (1-B)Y_t = (1 - \theta_1 B)w_t. $$ When you fit an incorrect ARI(1,1) model to the data, the residuals are given by $$ \hat w_t = (1-B)(1-\hat\phi_1 B)Y_t, $$ where $\hat\phi_1$ is the estimate of the AR parameter in the incorrect ARI(1,1) model. Applying $1-\hat\phi_1 B$ to both sides of the first equation yields $$ \hat w_t = (1-\hat\phi_1 B)(1-\theta_1 B)w_t. $$

Provided that the true model is IMA(1,1), this tells you that the residuals $\hat w_t$ from the fitted incorrect ARI(1,1) model should behave like an MA(2) process, rather than like white noise, as they would if the correct model had been fitted. Judging by your plot of the ACF of the residuals, however, this does not seem to be the case for your data: the ACF should cut off after lag 2 (vanish for $k>2$). This suggests that either something else is going on, or there is an error in how you computed the ACF or the residuals.
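The MA(2) claim can be checked with a small simulation. This is a sketch under stated assumptions: NumPy only, an illustrative $\theta_1 = 0.6$ and a long series of 20000 points (neither comes from the question), and the AR(1) coefficient estimated by least squares without an intercept rather than by whatever method the question used.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelations r_1..r_max_lag around the sample mean."""
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()
    denom = np.sum(xc ** 2)
    return np.array([np.sum(xc[k:] * xc[:-k]) / denom
                     for k in range(1, max_lag + 1)])

rng = np.random.default_rng(0)
n, theta = 20000, 0.6               # illustrative values, not from the question
w = rng.standard_normal(n + 1)      # white noise w_t

# Differenced series of an IMA(1,1) process: (1 - B) Y_t = (1 - theta B) w_t
d = w[1:] - theta * w[:-1]

# Fit the (incorrect) AR(1) to the differences by least squares, no intercept
phi_hat = np.sum(d[1:] * d[:-1]) / np.sum(d[:-1] ** 2)

# Residuals (1 - phi_hat B) d_t, i.e. (1 - phi_hat B)(1 - theta B) w_t
resid = d[1:] - phi_hat * d[:-1]

acf = sample_acf(resid, 6)
print(round(phi_hat, 3))
print(np.round(acf, 3))
```

With these settings the fitted coefficient should land near the theoretical lag-1 autocorrelation of the MA(1) differences, $-\theta_1/(1+\theta_1^2) \approx -0.44$. The residual ACF should be clearly nonzero at lags 1 and 2 and close to zero from lag 3 onward, which is the MA(2) signature described above.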
