For a VAR(1), we write the model as
$$
y_t=\Pi y_{t-1}+\epsilon_t
$$
where $y_t$ and $\epsilon_t$ are $p\times 1$ vectors. If you have more lags, the idea extends in the same way (and it is particularly straightforward using the companion form).
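For example, a VAR(2), $y_t=\Pi_1 y_{t-1}+\Pi_2 y_{t-2}+\epsilon_t$, can be stacked into companion form as
$$
\begin{pmatrix} y_t \\ y_{t-1} \end{pmatrix}
=\begin{pmatrix} \Pi_1 & \Pi_2 \\ I_p & 0 \end{pmatrix}
\begin{pmatrix} y_{t-1} \\ y_{t-2} \end{pmatrix}
+\begin{pmatrix} \epsilon_t \\ 0 \end{pmatrix},
$$
which is again a VAR(1); everything below then applies with $\Pi$ taken to be the companion matrix, reading the responses of the original variables off the upper-left $p\times p$ block of its powers.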
The impulse response is the derivative with respect to the shocks. So the impulse response at horizon $h$ of the variables to an exogenous shock to variable $j$ is
$$
\frac{\partial y_{t+h}}{\partial \epsilon_{j, t}}=\frac{\partial }{\partial \epsilon_{j, t}}\left(\Pi y_{t+h-1}+\epsilon_{t+h}\right)=\cdots=\frac{\partial }{\partial \epsilon_{j, t}}\left(\Pi^{h+1} y_{t-1}+\sum_{i=0}^h\Pi^i\epsilon_{t+h-i}\right).
$$
This derivative eliminates all terms but one, namely the $i=h$ term in the sum, $\Pi^h\epsilon_t$, so we get
$$
\frac{\partial y_{t+h}}{\partial \epsilon_{j, t}}=\frac{\partial }{\partial \epsilon_{j, t}}\left(\Pi^{h+1} y_{t-1}+\sum_{i=0}^h\Pi^i\epsilon_{t+h-i}\right)=\frac{\partial }{\partial \epsilon_{j, t}}\Pi^h\epsilon_{t}=\Pi^he_j
$$
where $e_j$ is the $j$th column of the $p\times p$ identity matrix. That is, the response of all $p$ variables at horizon $h$ to a shock to variable $j$ is the $j$th column of $\Pi^h$. If you instead take the derivative with respect to the whole vector $\epsilon_t$, the result is a matrix which is just $\Pi^h$, since the selection vectors taken together form the identity matrix.
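A minimal sketch of this in R, where the coefficient matrix Pi and the horizon are made up purely for illustration:
Pi <- matrix(c(0.5, 0.1,
               0.2, 0.3), nrow = 2, byrow = TRUE)  # hypothetical VAR(1) coefficients
H <- 10                                            # maximum horizon
irf <- array(NA_real_, dim = c(2, 2, H + 1))       # (response i, shock j, horizon h)
Pih <- diag(2)                                     # running power Pi^h, starting at Pi^0 = I
for (h in 0:H) {
  irf[, , h + 1] <- Pih    # column j holds the responses of all variables to shock j
  Pih <- Pih %*% Pi
}
irf[, 1, 3]                # responses at horizon 2 to a shock to variable 1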
That is the non-orthogonalized case without identification, which I believe is not so common in the literature. What people usually use is either a more sophisticated identification scheme or, more often, a Cholesky decomposition. To study this, it is more convenient to work with the vector moving average form of the model (which exists if the VAR is stationary)
$$
y_t=\sum_{s=0}^\infty\Psi_s\epsilon_{t-s}.
$$
The problem for interpretation arises when the error terms are correlated, because then an exogenous shock to variable $j$ comes bundled with simultaneous shocks to the other variables. To eliminate this, you can use a Cholesky decomposition, which orthogonalizes the innovations. Suppose the covariance matrix of the errors is $\Omega$. We decompose it as $\Omega=PP'$, with $P$ lower triangular, and introduce $v_t=P^{-1}\epsilon_t$, which are error terms with the identity matrix as covariance matrix. Some manipulation gives
$$
y_t=\sum_{s=0}^\infty\Psi_s\epsilon_{t-s}=\sum_{s=0}^\infty\Psi_sPP^{-1}\epsilon_{t-s}=\sum_{s=0}^\infty\Psi_s^*v_{t-s},
$$
where $\Psi_s^*=\Psi_sP$.
Consider now the response to an orthogonalized shock:
$$
\frac{\partial y_{t+h}}{\partial v_{j, t}}=\frac{\partial }{\partial v_{j, t}}\left(\sum_{s=0}^\infty\Psi_s^*v_{t+h-s}\right)=\Psi_h^*e_j.
$$
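In R, the extra step is just the Cholesky factor. A sketch for the VAR(1) case (where $\Psi_h=\Pi^h$, as derived further down), with Pi and Omega again made up for illustration:
Pi    <- matrix(c(0.5, 0.1,
                  0.2, 0.3), nrow = 2, byrow = TRUE)   # hypothetical coefficients
Omega <- matrix(c(1.0, 0.4,
                  0.4, 0.5), nrow = 2)                 # hypothetical error covariance
P <- t(chol(Omega))   # chol() returns upper-triangular U with U'U = Omega,
                      # so t(chol(Omega)) is the lower-triangular P with P P' = Omega
H <- 10
oirf <- array(NA_real_, dim = c(2, 2, H + 1))
Pih <- diag(2)
for (h in 0:H) {
  oirf[, , h + 1] <- Pih %*% P   # Psi_h^*: column j = response to unit orthogonal shock j
  Pih <- Pih %*% Pi
}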
To calculate this in practice, you need to find the moving average matrices $\Psi_s$, which you can do recursively. If you have $K$ lags:
$$
\begin{aligned}
\Psi_s&=0, \quad s=-K+1, -K+2, \dots, -1,\\
\Psi_0&=I,\\
\Psi_s&=\sum_{i=1}^K\Pi_i\Psi_{s-i}, \quad s=1, 2, \dots
\end{aligned}
$$
With estimates, you just put hats on the $\Pi$ matrices and proceed. We find $P$ from a Cholesky decomposition of the estimated error covariance matrix $\hat\Omega$.
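In R, the recursion might look as follows (a sketch; the list Pi_list of estimated lag matrices is an assumed input):
ma_matrices <- function(Pi_list, H) {
  K <- length(Pi_list)                 # number of lags
  p <- nrow(Pi_list[[1]])              # number of variables
  Psi <- vector("list", H + 1)         # Psi[[s + 1]] stores Psi_s
  Psi[[1]] <- diag(p)                  # Psi_0 = I
  for (s in seq_len(H)) {
    acc <- matrix(0, p, p)
    for (i in 1:min(s, K)) {           # terms with s - i < 0 vanish (Psi_s = 0 for s < 0)
      acc <- acc + Pi_list[[i]] %*% Psi[[s - i + 1]]
    }
    Psi[[s + 1]] <- acc
  }
  Psi
}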
Edit: In univariate time series analysis, a standard result is that every stationary AR process can be written as an MA($\infty$) process. The same result holds for multivariate time series, meaning that we can always rewrite a stationary VAR($p$) as a VMA($\infty$). This is central to impulse response analysis. The case with only one lag is the easiest. In this case, we may write
$$
y_t=\Pi y_{t-1}+\epsilon_t=\Pi(\Pi y_{t-2}+\epsilon_{t-1})+\epsilon_t=\cdots=\sum_{s=0}^\infty \Pi^s\epsilon_{t-s}.
$$
The steps hidden in the $\cdots$ might not be obvious, but it is just repeated substitution using the recursive structure of the model. So for the VAR(1), the moving average coefficients are simply $\Psi_s=\Pi^s$. With more lags it gets a little more complicated, but the recursive relations above cover that case.
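A quick numerical check of this claim in R (same hypothetical Pi as above):
Pi  <- matrix(c(0.5, 0.1,
                0.2, 0.3), nrow = 2, byrow = TRUE)  # hypothetical VAR(1) coefficients
Psi <- diag(2)   # Psi_0 = I from the recursion
Pow <- diag(2)   # direct matrix power Pi^s
for (s in 1:5) {
  Psi <- Pi %*% Psi                       # recursion with K = 1: Psi_s = Pi Psi_{s-1}
  Pow <- Pow %*% Pi
  stopifnot(isTRUE(all.equal(Psi, Pow)))  # the two coincide at every horizon
}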
In impulse response analysis, the moving average form of the model is particularly convenient. The reason is that if you want to find the response of $y_{t+h}$ to a shock to $\epsilon_{j, t}$, then if you start with the usual VAR(1) form
$$
y_{t+h}=\Pi y_{t+h-1}+\epsilon_{t+h},
$$
then there is no $\epsilon_t$ in the model as it stands; you have to substitute recursively until it appears (as I did at the beginning). But if you have the moving average form of the model, $\epsilon_t$ is right there on the right-hand side. So for the VAR(1), you find that
$$
\frac{\partial y_{t+h}}{\partial \epsilon_{j, t}}=\frac{\partial}{\partial \epsilon_{j, t}}\left(\sum_{s=0}^\infty\Psi_s\epsilon_{t+h-s}\right)=\Psi_he_j=\Pi^he_j,
$$
where $e_j$ again is the $j$th column of the $p\times p$ identity matrix. As you can see, this is the same result as at the beginning, only derived via the moving average form. The two representations are just two sides of the same coin.
This does not answer the general question "Is there a remedy for removing autocorrelations from residuals of seasonally fitted ARIMA model?", but it addresses the specific data in the question.
I used the auto.arima() function in the "forecast" package in R, and the residuals from ARIMA(0,0,0)(2,1,0)[12] seem to come out clean. Here's the code:
plot(x)                                  # raw data (x is the series from the question)
xts <- ts(x, start = 1, frequency = 12)  # convert to a monthly time series
plot(xts)
library(fpp)                             # loads the forecast package (files related to Hyndman's textbook)
mod1 <- auto.arima(xts)                  # automatic ARIMA order selection
mod1
acf(residuals(mod1))                     # residual autocorrelations
pacf(residuals(mod1))                    # residual partial autocorrelations
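If you want a formal check on top of the ACF/PACF plots, a Ljung-Box test is one option; the lag choice and fitdf below are my own judgment calls, not part of the original answer (fitdf = 2 accounts for the two estimated seasonal AR coefficients):
Box.test(residuals(mod1), lag = 24, fitdf = 2, type = "Ljung-Box")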
Best Answer
As @RichardHardy points out, normality of the errors is not necessary for inference on estimated IRFs. The IRFs are functions of the estimated variance-covariance matrix of the residuals and of the VAR slope coefficients, both of which can be shown to be asymptotically normal under conditions weaker than normality (e.g., finite fourth moments and symmetric error distributions).
Hence, by the delta method, one may establish that the IRF estimates also are asymptotically normal.
That said, if the errors are normal, the asymptotic variance-covariance matrix of the impulse response estimator simplifies. Depending on the software you use, this simplification may be assumed, in which case normality would also be needed for inference (e.g., confidence bands) to be asymptotically valid.
Detailed discussion of these issues is provided, for example, in Lütkepohl, New Introduction to Multiple Time Series Analysis; see in particular Proposition 3.6 and the subsequent Remark 4.