GARCH – Maximum Likelihood in the GJR-GARCH(1,1) Model Explained

Tags: estimation, garch, maximum-likelihood

In the standard GARCH(1,1) model with normal innovations

$\sigma^2_t=\omega+\alpha\epsilon^2_{t-1}+\beta\sigma^2_{t-1} $

the log-likelihood of $m$ observations, in the order in which they are observed and up to constants that do not affect the maximizer, is

$\sum_{t=1}^m\left[-\ln(\sigma^2_{t})-\frac{\epsilon^2_{t}}{\sigma^2_{t}}\right] $

This expression, with the usual caveats of optimization, allows us to obtain the MLE estimates of the GARCH(1,1) parameters.
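The recursion and the objective above can be sketched in a few lines. This is an illustrative Python sketch, not a production estimator: initialising $\sigma^2_0$ at the sample variance is a common convention rather than the only choice, and the function name and parameter values are assumptions.

```python
import numpy as np

def garch11_loglik(params, eps):
    """Gaussian (conditional) log-likelihood of a GARCH(1,1).

    params = (omega, alpha, beta); eps is the residual series.
    sigma2[0] is initialised at the sample variance (a common,
    but not unique, convention)."""
    omega, alpha, beta = params
    T = len(eps)
    sigma2 = np.empty(T)
    sigma2[0] = np.var(eps)
    for t in range(1, T):
        sigma2[t] = omega + alpha * eps[t - 1] ** 2 + beta * sigma2[t - 1]
    # Full Gaussian log-likelihood; dropping the constants gives the
    # expression in the question, with the same maximizer.
    return -0.5 * np.sum(np.log(2 * np.pi) + np.log(sigma2) + eps ** 2 / sigma2)
```

Maximizing this function over $(\omega,\alpha,\beta)$, subject to positivity constraints, yields the MLE.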

However, in the GJR-GARCH(1,1) model by Glosten et al. (1993), the conditional variance is

$ \sigma^2_t=\omega+(\alpha+\gamma I_{t-1})\epsilon^2_{t-1}+\beta\sigma^2_{t-1} $

where $ I_{t-1} $ is the indicator function:
$I_{t-1}(\epsilon_{t-1})=1 $ when $\epsilon_{t-1}<0$ and
$I_{t-1}(\epsilon_{t-1})=0$ otherwise.
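The GJR recursion differs from the plain GARCH one only in the leverage term. A minimal sketch, assuming the indicator convention above (all names and parameter values are illustrative):

```python
import numpy as np

def gjr_garch_variance(eps, omega, alpha, gamma, beta, sigma2_0):
    """Filter the GJR-GARCH(1,1) conditional variances given residuals eps.

    The indicator adds gamma to alpha whenever the previous shock
    is negative, capturing the leverage effect."""
    T = len(eps)
    sigma2 = np.empty(T)
    sigma2[0] = sigma2_0
    for t in range(1, T):
        indicator = 1.0 if eps[t - 1] < 0 else 0.0
        sigma2[t] = (omega
                     + (alpha + gamma * indicator) * eps[t - 1] ** 2
                     + beta * sigma2[t - 1])
    return sigma2
```

With $\gamma>0$, a negative shock of a given size raises next-period variance more than a positive shock of the same size.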

Question: Is there a closed-form expression for the likelihood function in the GJR-GARCH(1,1) with normal innovations?

EDIT: Per comments, the likelihood function in the GJR-GARCH(1,1) model has the same form as in the standard GARCH(1,1):

  1. Can someone provide a reference/explanation to justify this?
  2. If we use empirical innovations instead of normal ones (e.g., a Filtered Historical Simulation/FHS approach), would this change the functional form of the likelihood function? (My guess is that empirical innovations do not affect the likelihood function, but any reference or explanation would be highly welcome.)

Best Answer

A conditional volatility model such as the GARCH model is defined by the mean equation

\begin{equation} r_t = \mu + \sigma_t z_t = \mu + \varepsilon_t \end{equation}

and the GARCH equation (this is for the simple GARCH)

\begin{equation} \sigma^2_t = \omega + \alpha \varepsilon_{t-1}^2 + \beta \sigma_{t-1}^2 \end{equation}

To perform maximum-likelihood estimation, we must make distributional assumptions on $z_t$. It is typically assumed to be i.i.d. $N(0,1)$.

Conditional on the information set at time $t-1$ (which determines $\sigma_t^2$), we have that

\begin{equation} r_t \sim N(\mu, \sigma_t^2) \end{equation}

or

\begin{equation} \varepsilon_t = r_t - \mu \sim N(0, \sigma_t^2) \end{equation}

However, when we perform maximum-likelihood estimation, we are interested in the joint density

\begin{equation} f(\varepsilon_0,...,\varepsilon_T; \theta) \end{equation}

where $\theta$ is the parameter vector. Using iteratively that a joint density equals the product of a conditional and a marginal density, we obtain

\begin{eqnarray} f(\varepsilon_0,...,\varepsilon_T; \theta) &=& f(\varepsilon_0;\theta)f(\varepsilon_1,...,\varepsilon_T\vert \varepsilon_0 ;\theta) \\ &=& f(\varepsilon_0;\theta) \prod_{t=1}^T f(\varepsilon_t \vert \varepsilon_{t-1},...,\varepsilon_{0} ;\theta) \\ &=& f(\varepsilon_0;\theta) \prod_{t=1}^T f(\varepsilon_t \vert \varepsilon_{t-1};\theta) \\ &=& f(\varepsilon_0;\theta) \prod_{t=1}^T \frac{1}{\sqrt{2\pi \sigma_t^2}}\exp\left(-\frac{\varepsilon_t^2}{2\sigma_t^2}\right) \end{eqnarray}

Dropping $f(\varepsilon_0;\theta)$ and taking logs, we obtain the (conditional) log-likelihood function

\begin{equation} L(\theta) = \sum_{t=1}^T \frac{1}{2} \left[-\log2\pi-\log(\sigma_t^2) -\frac{\varepsilon_t^2}{\sigma_t^2}\right] \end{equation}

To question 1): The exact same steps go through for the GJR-GARCH model. The Gaussian log-likelihood has the identical functional form; the two likelihoods differ only through the recursion generating $\sigma_t^2$.
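This point can be made concrete: the sketch below reuses the Gaussian log-likelihood verbatim and only swaps in the GJR variance recursion. The starting values, bounds, and simulated data are illustrative assumptions, not recommendations.

```python
import numpy as np
from scipy.optimize import minimize

def neg_loglik_gjr(params, eps):
    """Negative Gaussian log-likelihood of GJR-GARCH(1,1); only the
    variance recursion differs from plain GARCH(1,1)."""
    omega, alpha, gamma, beta = params
    T = len(eps)
    sigma2 = np.empty(T)
    sigma2[0] = np.var(eps)  # common initialisation convention
    for t in range(1, T):
        I = 1.0 if eps[t - 1] < 0 else 0.0
        sigma2[t] = omega + (alpha + gamma * I) * eps[t - 1] ** 2 + beta * sigma2[t - 1]
    if np.any(sigma2 <= 0):
        return np.inf  # guard against infeasible parameter values
    return 0.5 * np.sum(np.log(2 * np.pi) + np.log(sigma2) + eps ** 2 / sigma2)

# Fit to simulated residuals (placeholder data, illustrative x0 and bounds)
rng = np.random.default_rng(0)
eps = rng.standard_normal(2000)
res = minimize(neg_loglik_gjr, x0=[0.05, 0.05, 0.05, 0.85], args=(eps,),
               bounds=[(1e-8, None), (0.0, 1.0), (0.0, 1.0), (0.0, 1.0)],
               method="L-BFGS-B")
```

Replacing the recursion inside `neg_loglik_gjr` with the plain GARCH one recovers the GARCH(1,1) estimator, which is the sense in which the two likelihoods are "the same".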

To question 2): One is free to make any assumption about the distribution of the innovations, but the calculations may become more tedious. As far as I know, Filtered Historical Simulation is used for, e.g., VaR forecasting: the fitted standardized innovations are bootstrapped to better match the empirical distribution, while estimation is still performed by Gaussian (quasi-)maximum likelihood.
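The division of labour can be sketched as follows: the model is fitted by Gaussian QML first, and FHS then resamples the standardized residuals $z_t = \varepsilon_t/\sigma_t$ through the fitted recursion. The function name and all parameters below are hypothetical; this is an illustrative sketch, not a full VaR engine.

```python
import numpy as np

def fhs_paths(eps, sigma2, omega, alpha, gamma, beta, horizon, n_paths, rng):
    """Simulate shock paths by Filtered Historical Simulation (sketch).

    eps and sigma2 come from a GJR-GARCH(1,1) fitted by Gaussian QML;
    the empirical standardized residuals z_t are bootstrapped and the
    variance is filtered forward through the fitted recursion."""
    z = eps / np.sqrt(sigma2)  # empirical standardized innovations
    paths = np.empty((n_paths, horizon))
    for i in range(n_paths):
        e_prev, s2_prev = eps[-1], sigma2[-1]
        for h in range(horizon):
            s2 = (omega + (alpha + gamma * (e_prev < 0)) * e_prev ** 2
                  + beta * s2_prev)
            e = np.sqrt(s2) * rng.choice(z)  # draw a bootstrapped innovation
            paths[i, h] = e
            e_prev, s2_prev = e, s2
    return paths
```

The resampling step changes the simulated distribution of future shocks, not the estimation objective, which is consistent with the guess in the question.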