Maximum Likelihood – MLE on Structural VAR

Tags: errors-in-variables, estimation, maximum-likelihood, structural-equation-modeling, vector-autoregression

I have a simple model that I wish to fit using data. The model is of the form below.

\begin{gather}
y_t = -\lambda r_t + \theta a_t + \varepsilon_1 \\
\pi_t = \pi_{t-1} + w y_t + \varepsilon_2 \\
a_t = -\alpha r_t + \varepsilon_3 \\
r_t = \phi_1 \pi_t + \phi_2 a_t + \varepsilon_4
\end{gather}

where $\varepsilon_i$ are i.i.d. Gaussian shocks. This model has matrix form:

\begin{gather}
\begin{bmatrix} y_t \\ \pi_t \\ a_t \\ r_t \end{bmatrix}
=
\begin{bmatrix}
0 & 0 & \theta & -\lambda \\
w & 0 & 0 & 0 \\
0 & 0 & 0 & -\alpha \\
0 & \phi_1 & \phi_2 & 0
\end{bmatrix}
\begin{bmatrix} y_t \\ \pi_t \\ a_t \\ r_t \end{bmatrix} +
\begin{bmatrix}
0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0
\end{bmatrix}
\begin{bmatrix} y_{t-1} \\ \pi_{t-1} \\ a_{t-1} \\ r_{t-1} \end{bmatrix} +
\begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \\ \varepsilon_4 \end{bmatrix}
\end{gather}

i.e.

$$
Y_t = A Y_t + B Y_{t-1} + E
$$

For estimating this model, assume we have a $4 \times N$ matrix of time series data for $y_t, \pi_t, a_t, r_t$ (call it $D$).
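For concreteness, here is a minimal numpy sketch of how such a dataset $D$ might be simulated from the system above (the parameter values, sample size and shock scale are arbitrary placeholders chosen just for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# arbitrary placeholder parameters (lambda, theta, w, alpha, phi_1, phi_2) and shock scale
lam, theta, w, alpha, phi1, phi2 = 0.5, 0.3, 0.2, 0.4, 1.5, 0.1
sigma = 0.1
N = 500

A = np.array([[0.0, 0.0,  theta, -lam],
              [w,   0.0,  0.0,    0.0],
              [0.0, 0.0,  0.0,   -alpha],
              [0.0, phi1, phi2,   0.0]])
B = np.zeros((4, 4))
B[1, 1] = 1.0                      # pi_t depends on pi_{t-1}

inv_IA = np.linalg.inv(np.eye(4) - A)

D = np.zeros((4, N))               # rows: y_t, pi_t, a_t, r_t; columns: time
for t in range(1, N):
    eps = sigma * rng.standard_normal(4)
    D[:, t] = inv_IA @ (B @ D[:, t - 1] + eps)
```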

How do I fit this model using maximum likelihood estimation? (without using Kalman filters or Bayesian methods)

My understanding is that this is an "Errors-in-variables" problem, which in some cases requires Total Least Squares (TLS). However, the TLS methods I've seen apply to simpler scalar $y = \beta x + \gamma$ problems, rather than VARs (or structural VARs).

Best Answer

I'm not particularly familiar with DSGE models, but a standard maximum likelihood estimation would go like this:

First, we can write your matrix equation as:

$$ Y_t = (I-A)^{-1}BY_{t-1} + (I-A)^{-1}E $$

so that we can express the conditional distribution of $Y_t$ given $Y_{t-1}$:

$$ Y_t | Y_{t-1} \sim \mathcal N ( \hat Y_t , \Sigma )$$

where $\hat Y_t = (I-A)^{-1}BY_{t-1}$ and $\Sigma = \sigma^2 (I - A)^{-1}(I - A^T)^{-1}$ (assuming $E \sim \mathcal N(0, \sigma^2 I)$, i.e. each shock has the same variance $\sigma^2$).

This follows from standard properties of the multivariate normal distribution. The log-likelihood (up to an additive constant) is therefore:

$$ \log\mathcal{L} = -\frac{1}{2}\sum_t (Y_t - \hat Y_t)^T\Sigma^{-1}(Y_t - \hat Y_t) - \frac{1}{2} N \log |\Sigma| $$
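As a sketch of how this can be evaluated in code (numpy, additive constants dropped; the function name and the common shock variance $\sigma^2$ are assumptions on my part, matching the setup above):

```python
import numpy as np

def log_likelihood(A, B, sigma2, D):
    """Conditional Gaussian log-likelihood of Y_t | Y_{t-1}, additive constants dropped."""
    n = A.shape[0]
    I = np.eye(n)
    inv_IA = np.linalg.inv(I - A)
    Sigma = sigma2 * inv_IA @ inv_IA.T          # sigma^2 (I-A)^{-1} (I-A^T)^{-1}
    Sigma_inv = (I - A).T @ (I - A) / sigma2
    _, logdet = np.linalg.slogdet(Sigma)

    Y, Ylag = D[:, 1:], D[:, :-1]               # Y_t and Y_{t-1}, t = 1..N-1
    resid = Y - inv_IA @ B @ Ylag               # Y_t - Yhat_t
    quad = np.einsum('it,ij,jt->', resid, Sigma_inv, resid)
    return -0.5 * quad - 0.5 * Y.shape[1] * logdet
```

Here $N$ in the formula corresponds to the number of $(Y_{t-1}, Y_t)$ pairs entering the sum.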

Now we can use the fact that $\Sigma^{-1} = (I - A^T)(I - A)/\sigma^2$ to simplify some of the terms:

$$ \hat Y_t^T \Sigma^{-1} Y_t = \frac{1}{\sigma^2} Y_{t-1}^T B^T (I-A) Y_t = \frac{1}{\sigma^2}\left(\pi_{t-1}\pi_t - w\, y_t \pi_{t-1}\right) $$

$$ \hat Y_t^T \Sigma^{-1} \hat Y_t = \frac{1}{\sigma^2} Y_{t-1}^T B^T B Y_{t-1} = \frac{\pi_{t-1}^2}{\sigma^2} $$

Notice that the exponent contains only terms up to quadratic in $A$, so differentiating that part with respect to elements of $A$ gives something linear in $A$, e.g.:

$$ \frac{\partial}{\partial \theta} \left(\frac{1}{2} Y_t^T \Sigma^{-1} Y_t\right) = -\frac{1}{2\sigma^2} Y_t^T \left(\frac{dA^T}{d\theta}(I-A) + (I-A^T) \frac{dA}{d\theta}\right) Y_t$$

$$ = -\frac{1}{\sigma^2} \text{tr}\left( \frac{dA^T}{d\theta}(I-A) Y_t Y_t^T\right)$$

Notice that $\frac{dA^T}{d\theta}$ is a matrix with a single non-zero entry (at the transposed position of $\theta$ in $A$), so we can write this more generally in matrix notation:

$$ \frac{\partial}{\partial A} \left(\frac{1}{2}\sum_t Y_t^T \Sigma^{-1} Y_t\right) = -\frac{1}{\sigma^2} (I - A) \left(\sum_t Y_t Y_t^T \right) $$

(keep in mind that this should be read as an equation for the non-zero entries of $A$ only). Now we are left with the determinant part, for which we can take the derivative as follows:

$$ \frac{\partial}{\partial \theta} \log |\Sigma| = -\text{tr}(\frac{\partial \Sigma^{-1}}{\partial \theta} \Sigma) $$

Plugging in the derivative of $\Sigma^{-1}$ (the $\sigma^2$ factors cancel against those in $\Sigma$), we get

$$ \frac{\partial}{\partial \theta} \log |\Sigma| = \text{tr}\left(\frac{dA^T}{d\theta}(I-A^T)^{-1} + (I-A)^{-1} \frac{dA}{d\theta}\right) = 2\,\text{tr}\left( \frac{dA^T}{d\theta}(I-A^T)^{-1} \right) $$

and as before we can write it as

$$ \frac{\partial}{\partial A} \log |\Sigma| = 2(I-A^T)^{-1} $$

Putting everything together:

$$ \sigma^2 \frac{\partial}{\partial A} \log \mathcal{L} = (I - A) \left(\sum_t Y_t Y_t^T \right) - B \left(\sum_t Y_{t-1} Y_t^T \right) - N\sigma^2(I-A^T)^{-1}.$$
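A numpy sketch of this gradient, restricted to the free entries of $A$ through a 0/1 mask (the helper name, the mask and the crude update loop are illustrative choices, not the only way to set this up):

```python
import numpy as np

def grad_A(A, B, sigma2, D, mask):
    """Gradient of the log-likelihood with respect to the free entries of A."""
    n = A.shape[0]
    I = np.eye(n)
    Y, Ylag = D[:, 1:], D[:, :-1]
    N = Y.shape[1]
    S00 = Y @ Y.T                   # sum_t Y_t Y_t^T
    S10 = Ylag @ Y.T                # sum_t Y_{t-1} Y_t^T
    G = ((I - A) @ S00 - B @ S10) / sigma2 - N * np.linalg.inv(I - A).T
    return G * mask                 # zero out the entries of A that are fixed at 0

# locations of the free parameters: (theta, lambda), (w), (alpha), (phi_1, phi_2)
mask = np.array([[0., 0., 1., 1.],
                 [1., 0., 0., 0.],
                 [0., 0., 0., 1.],
                 [0., 1., 1., 0.]])

# crude gradient ascent, for illustration only, using B, D, sigma from the question's
# simulation sketch; in practice hand this gradient to an off-the-shelf optimizer
A_hat = np.zeros((4, 4))
for _ in range(2000):
    A_hat += 1e-5 * grad_A(A_hat, B, sigma**2, D, mask)
```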

Equating this to zero gives the maximum-likelihood equation for $A$, but the equation is nonlinear and therefore can't be solved directly (in fact it can be written as a quadratic, but I don't think that helps much).

Nevertheless it can easily be solved iteratively with gradient-based methods, using the above gradient. Note that the dependence on the data $Y$ is only via the two cross-product matrices defined above, which you only have to calculate once. Also note that $\sigma^2$ enters only through the determinant term; if it is unknown it can be estimated jointly with $A$, for example by alternating between updating $A$ and setting $\sigma^2$ to the mean squared structural residual.

[Note: I can't guarantee that every $(-)$ sign and $^T$ is correct; if you are going to use a gradient-based optimization tool you should verify the result against a numerical gradient, or use a symbolic program to calculate the derivatives. If you want to work out the details of the derivation yourself, the Matrix Cookbook is a useful reference.]
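For the numerical check suggested in the note, a simple central-difference comparison against the analytic gradient could look like the sketch below (it reuses the hypothetical `log_likelihood` and `grad_A` helpers and the `mask`, `B`, `D`, `sigma` objects from the earlier sketches):

```python
import numpy as np

def numerical_grad_A(A, B, sigma2, D, mask, h=1e-6):
    """Central-difference gradient of log_likelihood w.r.t. the free entries of A."""
    G = np.zeros_like(A)
    for i, j in zip(*np.nonzero(mask)):
        Ap, Am = A.copy(), A.copy()
        Ap[i, j] += h
        Am[i, j] -= h
        G[i, j] = (log_likelihood(Ap, B, sigma2, D)
                   - log_likelihood(Am, B, sigma2, D)) / (2 * h)
    return G

# compare the two gradients at an arbitrary trial point
A0 = 0.1 * mask
num = numerical_grad_A(A0, B, sigma**2, D, mask)
ana = grad_A(A0, B, sigma**2, D, mask)
print(np.abs(num - ana).max())      # should be small relative to the gradient's magnitude
```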
