I found an answer in the "vignette" to the "rugarch" package in R. Here is a quote from pages 7-8 (emphasis is mine):
Because of the presence of the indicator function, the persistence of
the model now crucially depends on the asymmetry of the conditional
distribution used. The persistence of the model $\hat P$ is,
$$ \hat P = \sum_{j=1}^q \alpha_j + \sum_{j=1}^p \beta_j + \sum_{j=1}^q \gamma_j\kappa $$
where $\kappa$ is the expected value of the standardized residuals
$z_t$ below zero (effectively the probability of being below zero),
$$ \kappa = \mathbb{E}(\mathbb{I}_{t-j} z_{t-j}^2) = \int_{-\infty}^0 f(z,1,0,\dotsc) dz $$
where $f$ is the standardized conditional density with any additional
skew and shape parameters $(\dotsc)$. In the case of symmetric
distributions the value of $\kappa$ is simply equal to 0.5.
A conditional volatility model such as the GARCH model is defined by the mean equation
\begin{equation}
r_t = \mu + \sigma_t z_t = \mu + \varepsilon_t
\end{equation}
and the GARCH equation (this is for the simple GARCH)
\begin{equation}
\sigma^2_t = \omega + \alpha \varepsilon_{t-1}^2 + \beta \sigma_{t-1}^2
\end{equation}
To perform maximum-likelihood estimation, we must make distributional assumptions on $z_t$. It is typically assumed to be i.i.d. $N(0,1)$.
Conditional on the informationset at time t, we have that
\begin{equation}
r_t \sim N(\mu, \sigma_t^2)
\end{equation}
or
\begin{equation}
\varepsilon_t = r_t - \mu \sim N(0, \sigma_t^2)
\end{equation}
However when we perform maximum-likelihood estimation, we are interested in the joint distribution
\begin{equation}
f(\varepsilon_1,...,\varepsilon_T; \theta)
\end{equation}
where $\theta$ is the parameter vector. Using iteratively that the joint distribution is equal to the product of the conditional and the marginal density, we obtain
\begin{eqnarray}
f(\varepsilon_0,...,\varepsilon_T; \theta) &=& f(\varepsilon_0;\theta)f(\varepsilon_1,...,\varepsilon_T\vert \varepsilon_0 ;\theta) \\
&=& f(\varepsilon_0;\theta) \prod_{t=1}^T f(\varepsilon_t \vert \varepsilon_{t-1},...,\varepsilon_{0} ;\theta) \\
&=& f(\varepsilon_0;\theta) \prod_{t=1}^T f(\varepsilon_t \vert \varepsilon_{t-1};\theta) \\
&=& f(\varepsilon_0;\theta) \prod_{t=1}^T \frac{1}{\sqrt{2\pi \sigma_t^2}}\exp\left(-\frac{\varepsilon_t^2}{2\sigma_t^2}\right)
\end{eqnarray}
Dropping $f(\varepsilon_0;\theta)$ and taking logs, we obtain the (conditional) log-likelihood function
\begin{equation}
L(\theta) = \sum_{t=1}^T \frac{1}{2} \left[-\log2\pi-\log(\sigma_t^2) -\frac{\varepsilon_t^2}{\sigma_t^2}\right]
\end{equation}
To question 1): The exact same steps can be followed for the GJR-GARCH model. The log-likelihood functions are similar but not the same due to the different specification for $\sigma_t^2$.
To question 2): One is free to use whatever assumption about the distribution of the innovations, but the calculations will become more tedious. As far as I know, Filtered Historical Simulation is used to performe e.g. VaR forecast. The "fitted" innovations are bootstrapped to better fit the actual empirical distribution. Estimation is still performed using Gaussian (quasi) maximum-likelihood.
Best Answer
In your case, the (annualized) long run variance can be calculated as $$ LRV=E(\sigma_t^2) \cdot 252 $$ or in terms of standard deviations as $$ LRV=\sqrt{E(\sigma_t^2)}\sqrt{252} $$ assuming that there are $252$ trading days within a year and that you estimate your model on daily returns. Now, depending on your model, the expression for $E(\sigma_t^2)$ differs.
For instance, for the GARCH(1,1) you have: $$ E(\sigma_t)^2=\frac{\omega}{1-\alpha-\beta} $$ For the GJR-GARCH(1,1) you get $$ E(\sigma_t^2)=\frac{\omega}{1-\alpha-\frac{\theta}2{-\beta}} $$ if you assume a symmetric distribution for the error term.