For a VAR(1), we write the model as
$$
y_t=\Pi y_{t-1}+\epsilon_t
$$
where $y_t$ and $\epsilon_t$ are $p\times 1$ vectors. If you have more lags, the idea extends in the same way (and it is particularly straightforward using the companion form).
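For example, a VAR(2), $y_t=\Pi_1 y_{t-1}+\Pi_2 y_{t-2}+\epsilon_t$, can be stacked into companion form as
$$
\begin{pmatrix} y_t \\ y_{t-1} \end{pmatrix}
=\begin{pmatrix} \Pi_1 & \Pi_2 \\ I_p & 0 \end{pmatrix}
\begin{pmatrix} y_{t-1} \\ y_{t-2} \end{pmatrix}
+\begin{pmatrix} \epsilon_t \\ 0 \end{pmatrix},
$$
which is again a VAR(1); everything below then applies with $\Pi$ taken to be the companion matrix, reading the responses of the original variables off the upper-left $p\times p$ block of its powers.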
The impulse response is the derivative with respect to the shocks. So the impulse response at horizon $h$ of the variables to an exogenous shock to variable $j$ is
$$
\frac{\partial y_{t+h}}{\partial \epsilon_{j, t}}=\frac{\partial }{\partial \epsilon_{j, t}}\left(\Pi y_{t+h-1}+\epsilon_{t+h}\right)=\cdots=\frac{\partial }{\partial \epsilon_{j, t}}\left(\Pi^{h+1} y_{t-1}+\sum_{i=0}^h\Pi^i\epsilon_{t+h-i}\right).
$$
This derivative eliminates all terms but one, namely the $i=h$ term in the sum, $\Pi^h\epsilon_t$, so we get
$$
\frac{\partial y_{t+h}}{\partial \epsilon_{j, t}}=\frac{\partial }{\partial \epsilon_{j, t}}\left(\Pi^{h+1} y_{t-1}+\sum_{i=0}^h\Pi^i\epsilon_{t+h-i}\right)=\frac{\partial }{\partial \epsilon_{j, t}}\Pi^h\epsilon_{t}=\Pi^he_j
$$
where $e_j$ is the $j$th column of the $p\times p$ identity matrix. That is, the response of all $p$ variables at horizon $h$ to a shock to variable $j$ is the $j$th column of $\Pi^h$. If you instead take the derivative with respect to the whole vector $\epsilon_t$, the result is a matrix which is just $\Pi^h$, since the selection vectors taken together form the identity matrix.
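A minimal sketch of this in R, where the coefficient matrix Pi and the horizon are made up purely for illustration:
Pi <- matrix(c(0.5, 0.1,
               0.2, 0.3), nrow = 2, byrow = TRUE)  # hypothetical VAR(1) coefficients
H <- 10                                            # maximum horizon
irf <- array(NA_real_, dim = c(2, 2, H + 1))       # (response i, shock j, horizon h)
Pih <- diag(2)                                     # running power Pi^h, starting at Pi^0 = I
for (h in 0:H) {
  irf[, , h + 1] <- Pih    # column j holds the responses of all variables to shock j
  Pih <- Pih %*% Pi
}
irf[, 1, 3]                # responses at horizon 2 to a shock to variable 1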
That is the non-orthogonalized case without identification, which I believe is not so common in the literature. What people usually use is either a more sophisticated identification scheme or, more often, a Cholesky decomposition. To study this, it is more convenient to work with the vector moving average form of the model (which exists if the VAR is stationary)
$$
y_t=\sum_{s=0}^\infty\Psi_s\epsilon_{t-s}.
$$
The problem for interpretation arises when the error terms are correlated, because then an exogenous shock to variable $j$ comes bundled with simultaneous shocks to the other variables. To eliminate this, you can use a Cholesky decomposition, which orthogonalizes the innovations. Suppose the covariance matrix of the errors is $\Omega$. We decompose it as $\Omega=PP'$, with $P$ lower triangular, and introduce $v_t=P^{-1}\epsilon_t$, which are error terms with the identity matrix as covariance matrix. Some manipulation gives
$$
y_t=\sum_{s=0}^\infty\Psi_s\epsilon_{t-s}=\sum_{s=0}^\infty\Psi_sPP^{-1}\epsilon_{t-s}=\sum_{s=0}^\infty\Psi_s^*v_{t-s},
$$
where $\Psi_s^*=\Psi_sP$.
Consider now the response to an orthogonalized shock:
$$
\frac{\partial y_{t+h}}{\partial v_{j, t}}=\frac{\partial }{\partial v_{j, t}}\left(\sum_{s=0}^\infty\Psi_s^*v_{t+h-s}\right)=\Psi_h^*e_j.
$$
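In R, the extra step is just the Cholesky factor. A sketch for the VAR(1) case (where $\Psi_h=\Pi^h$, as derived further down), with Pi and Omega again made up for illustration:
Pi    <- matrix(c(0.5, 0.1,
                  0.2, 0.3), nrow = 2, byrow = TRUE)   # hypothetical coefficients
Omega <- matrix(c(1.0, 0.4,
                  0.4, 0.5), nrow = 2)                 # hypothetical error covariance
P <- t(chol(Omega))   # chol() returns upper-triangular U with U'U = Omega,
                      # so t(chol(Omega)) is the lower-triangular P with P P' = Omega
H <- 10
oirf <- array(NA_real_, dim = c(2, 2, H + 1))
Pih <- diag(2)
for (h in 0:H) {
  oirf[, , h + 1] <- Pih %*% P   # Psi_h^*: column j = response to unit orthogonal shock j
  Pih <- Pih %*% Pi
}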
To calculate this in practice, you need to find the moving average matrices $\Psi_s$, which you can do recursively. If you have $K$ lags:
$$
\begin{aligned}
\Psi_s&=0, \quad s=-K+1, -K+2, \dots, -1,\\
\Psi_0&=I,\\
\Psi_s&=\sum_{i=1}^K\Pi_i\Psi_{s-i}, \quad s=1, 2, \dots
\end{aligned}
$$
With estimates, you just put hats on the $\Pi$ matrices and proceed. We find $P$ from a Cholesky decomposition of the estimated error covariance matrix $\hat\Omega$.
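In R, the recursion might look as follows (a sketch; the list Pi_list of estimated lag matrices is an assumed input):
ma_matrices <- function(Pi_list, H) {
  K <- length(Pi_list)                 # number of lags
  p <- nrow(Pi_list[[1]])              # number of variables
  Psi <- vector("list", H + 1)         # Psi[[s + 1]] stores Psi_s
  Psi[[1]] <- diag(p)                  # Psi_0 = I
  for (s in seq_len(H)) {
    acc <- matrix(0, p, p)
    for (i in 1:min(s, K)) {           # terms with s - i < 0 vanish (Psi_s = 0 for s < 0)
      acc <- acc + Pi_list[[i]] %*% Psi[[s - i + 1]]
    }
    Psi[[s + 1]] <- acc
  }
  Psi
}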
Edit: In univariate time series analysis, a standard result is that every stationary AR process can be written as an MA($\infty$) process. The same result holds for multivariate time series, meaning that we can always rewrite a stationary VAR($p$) as a VMA($\infty$). This is central to impulse response analysis. The case with only one lag is the easiest. In this case, we may write
$$
y_t=\Pi y_{t-1}+\epsilon_t=\Pi(\Pi y_{t-2}+\epsilon_{t-1})+\epsilon_t=\cdots=\sum_{s=0}^\infty \Pi^s\epsilon_{t-s}.
$$
The steps hidden in the $\cdots$ might not be obvious, but it is just repeated substitution using the recursive structure of the model. So for the VAR(1), the moving average coefficients are simply $\Psi_s=\Pi^s$. With more lags it gets a little more complicated, but the recursive relations above cover that case.
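A quick numerical check of this claim in R (same hypothetical Pi as above):
Pi  <- matrix(c(0.5, 0.1,
                0.2, 0.3), nrow = 2, byrow = TRUE)  # hypothetical VAR(1) coefficients
Psi <- diag(2)   # Psi_0 = I from the recursion
Pow <- diag(2)   # direct matrix power Pi^s
for (s in 1:5) {
  Psi <- Pi %*% Psi                       # recursion with K = 1: Psi_s = Pi Psi_{s-1}
  Pow <- Pow %*% Pi
  stopifnot(isTRUE(all.equal(Psi, Pow)))  # the two coincide at every horizon
}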
In impulse response analysis, the moving average form of the model is particularly convenient. The reason is that if you want to find the response of $y_{t+h}$ to a shock to $\epsilon_{j, t}$, then if you start with the usual VAR(1) form
$$
y_{t+h}=\Pi y_{t+h-1}+\epsilon_{t+h},
$$
then there is no $\epsilon_t$ in the model as it stands; you have to substitute recursively until it appears (as I did at the beginning). But if you have the moving average form of the model, $\epsilon_t$ is right there on the right-hand side. So for the VAR(1), you find that
$$
\frac{\partial y_{t+h}}{\partial \epsilon_{j, t}}=\frac{\partial}{\partial \epsilon_{j, t}}\left(\sum_{s=0}^\infty\Psi_s\epsilon_{t+h-s}\right)=\Psi_he_j=\Pi^he_j,
$$
where $e_j$ again is the $j$th column of the $p\times p$ identity matrix. As you can see, this is the same result as at the beginning, only derived via the moving average form. The two representations are just two sides of the same coin.
This does not answer the general question "Is there a remedy for removing autocorrelations from residuals of seasonally fitted ARIMA model?", but it addresses the specific data in the question.
I used the auto.arima() function in the "forecast" package in R, and the residuals from ARIMA(0,0,0)(2,1,0)[12] seem to come out clean. Here's the code:
plot(x)                                  # raw data (x is the series from the question)
xts <- ts(x, start = 1, frequency = 12)  # convert to a monthly time series
plot(xts)
library(fpp)                             # loads the forecast package (files related to Hyndman's textbook)
mod1 <- auto.arima(xts)                  # automatic ARIMA order selection
mod1
acf(residuals(mod1))                     # residual autocorrelations
pacf(residuals(mod1))                    # residual partial autocorrelations
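If you want a formal check on top of the ACF/PACF plots, a Ljung-Box test is one option; the lag choice and fitdf below are my own judgment calls, not part of the original answer (fitdf = 2 accounts for the two estimated seasonal AR coefficients):
Box.test(residuals(mod1), lag = 24, fitdf = 2, type = "Ljung-Box")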
Best Answer
As @RichardHardy points out, normality of the errors is not necessary for inference on estimated IRFs. The IRFs are functions of the estimated variance-covariance matrix of the residuals and of the VAR slope coefficients, both of which can be shown to be asymptotically normal under conditions weaker than normality (e.g., finite fourth moments and symmetric error distributions).
Hence, by the delta method, one may establish that the IRF estimates also are asymptotically normal.
That said, if the errors are normal, the asymptotic variance-covariance matrix of the impulse response estimator simplifies. Depending on the software you use, this simplification may be assumed, in which case normality would also be needed for inference (e.g., confidence bands) to be asymptotically valid.
Detailed discussion of these issues is provided, for example, in Lütkepohl, New Introduction to Multiple Time Series Analysis; see in particular Proposition 3.6 and the subsequent Remark 4.