A conditional volatility model such as the GARCH model is defined by the mean equation
\begin{equation}
r_t = \mu + \sigma_t z_t = \mu + \varepsilon_t
\end{equation}
and the GARCH equation (here for the simple GARCH(1,1))
\begin{equation}
\sigma^2_t = \omega + \alpha \varepsilon_{t-1}^2 + \beta \sigma_{t-1}^2
\end{equation}
To perform maximum-likelihood estimation, we must make distributional assumptions on $z_t$. It is typically assumed to be i.i.d. $N(0,1)$.
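As an illustration, here is a minimal sketch of simulating this data-generating process in Python (the parameter values are arbitrary choices for the example):

```python
import numpy as np

def simulate_garch(T, mu=0.0, omega=0.1, alpha=0.05, beta=0.9, seed=0):
    """Simulate r_t = mu + sigma_t z_t with a GARCH(1,1) variance recursion."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(T)                     # i.i.d. N(0,1) innovations
    sigma2 = np.empty(T)
    r = np.empty(T)
    sigma2[0] = omega / (1.0 - alpha - beta)       # start at the unconditional variance
    r[0] = mu + np.sqrt(sigma2[0]) * z[0]
    for t in range(1, T):
        eps_prev = r[t - 1] - mu
        sigma2[t] = omega + alpha * eps_prev**2 + beta * sigma2[t - 1]
        r[t] = mu + np.sqrt(sigma2[t]) * z[t]
    return r, sigma2

r, s2 = simulate_garch(1000)
```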
Conditional on the information set at time $t-1$, we have that
\begin{equation}
r_t \sim N(\mu, \sigma_t^2)
\end{equation}
or
\begin{equation}
\varepsilon_t = r_t - \mu \sim N(0, \sigma_t^2)
\end{equation}
However, when we perform maximum-likelihood estimation, we are interested in the joint distribution
\begin{equation}
f(\varepsilon_0,...,\varepsilon_T; \theta)
\end{equation}
where $\theta$ is the parameter vector. Using iteratively that the joint distribution is equal to the product of the conditional and the marginal density, we obtain
\begin{eqnarray}
f(\varepsilon_0,...,\varepsilon_T; \theta) &=& f(\varepsilon_0;\theta)f(\varepsilon_1,...,\varepsilon_T\vert \varepsilon_0 ;\theta) \\
&=& f(\varepsilon_0;\theta) \prod_{t=1}^T f(\varepsilon_t \vert \varepsilon_{t-1},...,\varepsilon_{0} ;\theta) \\
&=& f(\varepsilon_0;\theta) \prod_{t=1}^T \frac{1}{\sqrt{2\pi \sigma_t^2}}\exp\left(-\frac{\varepsilon_t^2}{2\sigma_t^2}\right)
\end{eqnarray}
The last step uses that $\sigma_t^2$ is a deterministic function of the history $\varepsilon_{t-1},...,\varepsilon_0$ (given initial conditions), so $\varepsilon_t \vert \varepsilon_{t-1},...,\varepsilon_0 \sim N(0,\sigma_t^2)$. Dropping $f(\varepsilon_0;\theta)$ and taking logs, we obtain the (conditional) log-likelihood function
\begin{equation}
L(\theta) = \sum_{t=1}^T \frac{1}{2} \left[-\log2\pi-\log(\sigma_t^2) -\frac{\varepsilon_t^2}{\sigma_t^2}\right]
\end{equation}
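A minimal sketch of maximizing this log-likelihood numerically in Python (the variance initialization $\sigma_1^2 = \widehat{\operatorname{Var}}(r)$, the starting values, and the use of simulated data are assumptions for the example, not part of the derivation):

```python
import numpy as np
from scipy.optimize import minimize

def neg_loglik(params, r):
    """Negative Gaussian (conditional) log-likelihood of a GARCH(1,1)."""
    mu, omega, alpha, beta = params
    eps = r - mu
    T = len(r)
    sigma2 = np.empty(T)
    sigma2[0] = np.var(r)                       # common initialization choice
    for t in range(1, T):
        sigma2[t] = omega + alpha * eps[t - 1]**2 + beta * sigma2[t - 1]
    if np.any(sigma2 <= 0):                     # rule out invalid parameter values
        return np.inf
    ll = 0.5 * np.sum(-np.log(2 * np.pi) - np.log(sigma2) - eps**2 / sigma2)
    return -ll

# Illustrative call on simulated noise, just to show the optimizer interface.
rng = np.random.default_rng(1)
r = rng.standard_normal(500) * 0.01
res = minimize(neg_loglik, x0=[0.0, 1e-5, 0.05, 0.9], args=(r,),
               method="Nelder-Mead")
```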
To question 1): The exact same steps can be followed for the GJR-GARCH model. The log-likelihood functions are similar but not identical, owing to the different specification of $\sigma_t^2$.
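For concreteness, the standard GJR-GARCH(1,1) replaces the variance equation above by
\begin{equation}
\sigma_t^2 = \omega + \left(\alpha + \gamma \mathbf{1}\{\varepsilon_{t-1} < 0\}\right)\varepsilon_{t-1}^2 + \beta \sigma_{t-1}^2
\end{equation}
so in the log-likelihood only the recursion generating $\sigma_t^2$ changes; the Gaussian density part stays the same.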
To question 2): One is free to make any distributional assumption about the innovations, but the calculations will become more tedious. As far as I know, Filtered Historical Simulation is used to produce e.g. VaR forecasts: the fitted (standardized) innovations are bootstrapped so as to better match the actual empirical distribution, while estimation is still performed by Gaussian (quasi) maximum likelihood.
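A sketch of that Filtered Historical Simulation idea for a one-step-ahead VaR, assuming a fitted GARCH(1,1) (the function name, inputs, and parameter values are illustrative, not from any particular package):

```python
import numpy as np

def fhs_var_one_step(eps, sigma2, omega, alpha, beta, level=0.01,
                     n_boot=10000, seed=0):
    """One-step-ahead VaR via Filtered Historical Simulation (sketch).

    eps, sigma2: fitted residuals and conditional variances from a GARCH(1,1).
    """
    rng = np.random.default_rng(seed)
    z_hat = eps / np.sqrt(sigma2)                 # standardized "fitted" innovations
    # Variance forecast for the next period from the GARCH recursion.
    sigma2_next = omega + alpha * eps[-1]**2 + beta * sigma2[-1]
    # Bootstrap innovations from their empirical distribution.
    z_star = rng.choice(z_hat, size=n_boot, replace=True)
    r_star = np.sqrt(sigma2_next) * z_star        # simulated returns (zero mean)
    return -np.quantile(r_star, level)            # VaR at the given level

# Illustrative call with dummy residuals and a constant variance path.
eps = np.random.default_rng(1).standard_normal(500) * 0.01
sig2 = np.full(500, 1e-4)
var99 = fhs_var_one_step(eps, sig2, omega=1e-6, alpha=0.05, beta=0.9)
```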
The distributional assumption for a DCC-GARCH model concerns the standardized model residuals (a multivariate time series). Standardization is done by scaling the raw residuals (from the conditional mean model, if any): for each time period they are premultiplied by the square root of the inverse conditional covariance matrix, $\hat\Sigma_t^{-1/2}$, so as to make the residuals roughly uncorrelated with approximately unit variances.
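That premultiplication can be sketched with a symmetric inverse square root computed from an eigendecomposition (the matrix values below are made up for the example):

```python
import numpy as np

def standardize_residuals(e, Sigma):
    """Premultiply a raw residual vector e by Sigma^{-1/2} (symmetric inverse root)."""
    w, V = np.linalg.eigh(Sigma)                          # Sigma = V diag(w) V'
    Sigma_inv_sqrt = V @ np.diag(1.0 / np.sqrt(w)) @ V.T  # Sigma^{-1/2}
    return Sigma_inv_sqrt @ e

Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])                            # example covariance matrix
e = np.array([1.0, -0.5])                                 # example raw residuals
z = standardize_residuals(e, Sigma)
```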
In theory, each univariate series could have a different marginal distribution, and different copulas could be used to obtain different joint distributions. In practice, software implementations usually have a more limited choice of distributions; for example, you may choose between multivariate normal, multivariate skew normal and multivariate $t$-distribution. The main thing is to achieve an empirical distribution that is not too far from the assumed one. However, perhaps the idea of quasi maximum likelihood estimation could be used to defend a mismatch if the assumed distribution is multivariate normal.
The DCC-GARCH model is estimated in two stages: first the univariate GARCH models, then the DCC part. In the first stage, the relevant distributional assumptions are those of the marginal distributions. So if the assumed multivariate distribution is multivariate skew normal, I guess the marginal distributions will be univariate skew normal, and the skew parameter(s) will be important in the first stage. I am not sure how one builds a multivariate skew normal distribution from univariate skew normal distributions; if the skew parameter(s) play a role there, they will also matter in the second stage of the estimation (the DCC part). I do not have full command of the details here, but the idea should be clear.
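For the second stage, a sketch of the standard DCC correlation recursion of Engle (2002) applied to already-standardized residuals (parameter values and input data are illustrative):

```python
import numpy as np

def dcc_correlations(Z, a=0.05, b=0.9):
    """DCC recursion on standardized residuals Z (T x k), illustrative sketch.

    Q_t = (1-a-b)*Qbar + a*z_{t-1}z_{t-1}' + b*Q_{t-1};
    R_t rescales Q_t into a correlation matrix.
    """
    T, k = Z.shape
    Qbar = (Z.T @ Z) / T                  # sample covariance of standardized residuals
    Q = Qbar.copy()
    R = np.empty((T, k, k))
    for t in range(T):
        if t > 0:
            z = Z[t - 1][:, None]
            Q = (1 - a - b) * Qbar + a * (z @ z.T) + b * Q
        d = 1.0 / np.sqrt(np.diag(Q))     # rescale to unit diagonal
        R[t] = Q * np.outer(d, d)
    return R

Z = np.random.default_rng(2).standard_normal((200, 2))
R = dcc_correlations(Z)
```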
Best Answer
I found an answer in the "vignette" to the "rugarch" package in R. Here is a quote from pages 7-8 (emphasis is mine):