You can set the starting values for the conditional variance/mean to their theoretical unconditional values. The errors you can set to zero (since their mean is 0). For the other AR/MA parameters you can solve the Yule-Walker equations; for the GARCH parameters, to the best of my knowledge, you have to fix them at some arbitrary values, BUT those values should respect the theoretical constraints (positivity constraints, stationarity constraints, ...). E.g. for a GARCH(1,1), the covariance-stationarity constraint is alpha + beta < 1.
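As a minimal sketch of what this looks like for a GARCH(1,1), $\sigma^2_t = \omega + \alpha\,\epsilon^2_{t-1} + \beta\,\sigma^2_{t-1}$ (the parameter values below are made up for illustration):

```python
import numpy as np

# Hypothetical starting values for a GARCH(1,1):
#   sigma2_t = omega + alpha * eps_{t-1}^2 + beta * sigma2_{t-1}
omega, alpha, beta = 0.1, 0.05, 0.90   # illustrative values, not estimates

# Check the theoretical constraints before using these as starting values
assert omega > 0 and alpha >= 0 and beta >= 0   # positivity
assert alpha + beta < 1                         # covariance stationarity

# Initialise the conditional variance recursion at the unconditional variance
sigma2_0 = omega / (1.0 - alpha - beta)

# Initialise the errors at their unconditional mean
eps_0 = 0.0
print(sigma2_0, eps_0)
```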
EDIT: For initial values of the ARMA part you can have a look at the following article (paragraph "16.2 Initial values"):
A Package for Estimating, Forecasting and Simulating Arfima Models: Arfima package 1.0 for Ox, by Jurgen A. Doornik and Marius Ooms
You are seeking an encoding of the design matrix called "sum-to-zero" coding. With this coding we estimate effects as deviations from the grand mean, i.e. how the groups differ from the grand mean. This is in contrast with the usual "dummy" coding (the coding in the Wikipedia link), whose interpretation is effects relative to a base treatment group.
$$
\begin{bmatrix}
y_{1,1} \\
y_{1,2} \\
y_{2,1} \\
y_{2,2} \\
y_{3,1} \\
y_{3,2} \\
y_{4,1} \\
y_{4,2}
\end{bmatrix} =
\begin{bmatrix}
1 & 1 & 0 & 0 \\
1 & 1 & 0 & 0 \\
1 & 0 & 1 & 0 \\
1 & 0 & 1 & 0 \\
1 & 0 & 0 & 1 \\
1 & 0 & 0 & 1 \\
1 & -1 & -1 & -1 \\
1 & -1 & -1 & -1
\end{bmatrix}
\times
\begin{bmatrix}
\mu \\
\alpha_1 \\
\alpha_2 \\
\alpha_3
\end{bmatrix}
+
\begin{bmatrix}
\epsilon_{1,1} \\
\epsilon_{1,2} \\
\epsilon_{2,1} \\
\epsilon_{2,2} \\
\epsilon_{3,1} \\
\epsilon_{3,2} \\
\epsilon_{4,1} \\
\epsilon_{4,2}
\end{bmatrix}
$$
The constraint $\sum^k_{j=1} \alpha_j=0$ plays the same role as $\alpha_{\text{base}}=0$ in "dummy" coding. Under it, the fourth group's mean is $\mu - \alpha_1 - \alpha_2 - \alpha_3 = \mu + \alpha_4$, which is exactly what the rows of $-1$'s encode. The sum constraint is imposed for identifiability of the coefficient estimates: without some such constraint, the model $y_{ij} = \mu + \alpha_i + \epsilon_{ij}$ has infinitely many solutions for $\mu$, $\alpha_1$, $\alpha_2$, $\alpha_3$ and $\alpha_4$. See the link Sum-to-zero constraint in one-way ANOVA for a clear mathematical explanation by example. A different constraint leads to a different encoding.
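As a minimal numerical sketch (the grand mean, group effects, and noise level below are made up for illustration), the encoding above can be built and fit by ordinary least squares; with the constraint baked into the design matrix the solution is unique:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative truth: grand mean and group effects that sum to zero
mu = 5.0
alphas = np.array([1.0, -0.5, 2.0, -2.5])   # alpha_1..alpha_4, sum = 0

# Two observations per group, matching the layout above
groups = np.repeat([0, 1, 2, 3], 2)
y = mu + alphas[groups] + rng.normal(scale=0.1, size=8)

# Sum-to-zero design matrix: intercept plus three contrast columns;
# the last group is encoded as -1 in every contrast column
X = np.column_stack([
    np.ones(8),
    np.where(groups == 3, -1, (groups == 0).astype(float)),
    np.where(groups == 3, -1, (groups == 1).astype(float)),
    np.where(groups == 3, -1, (groups == 2).astype(float)),
])

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)              # approximately [mu, alpha_1, alpha_2, alpha_3]
print(-beta_hat[1:].sum())   # recovers alpha_4 from the constraint
```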
Let's define a general ARMA model of orders $(p,q)$ as follows:
$$ \psi_t \equiv \sum_{i=0}^p \alpha_i\, y_{t-i} = \sum_{i=0}^q \theta_i\, \epsilon_{t-i} \,, \mbox{ with } \epsilon_t \sim NID\,(0, \sigma^2_\epsilon) \,. $$
where $\alpha_0$ and $\theta_0$ are normalised to $1$.
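For example, an ARMA(1,1) written in the more common textbook form $y_t = \phi\, y_{t-1} + \epsilon_t + \theta_1\, \epsilon_{t-1}$ reads in this notation as
$$ y_t + \alpha_1\, y_{t-1} = \epsilon_t + \theta_1\, \epsilon_{t-1} \,, \qquad \alpha_1 = -\phi \,. $$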
You can check that multiplying $\psi_t$ by $\psi_{t-\tau}$ and taking expectations on both sides of the equation yields:
\begin{equation} \sum_{i=0}^p \sum_{j=0}^p \alpha_i \alpha_j \gamma_{\tau+j-i} = \sigma^2_\epsilon \sum_{j=0}^{q-\tau} \theta_j \theta_{j+\tau} \,, \end{equation}
where $\gamma_i$ is the autocovariance of order $i$.
The mapping between the autocovariances and the parameters is less tractable for an ARMA model than for a pure AR model. The above equation is quadratic in the coefficients, so it does not yield a system of equations that can be easily solved to estimate the parameters by the method of moments. The Yule-Walker equations, by contrast, are easy to solve and return an estimate of the AR coefficients.
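For reference, in the pure AR case the Yule-Walker equations follow from multiplying the model equation $\sum_{i=0}^p \alpha_i\, y_{t-i} = \epsilon_t$ by $y_{t-\tau}$ (rather than by $\psi_{t-\tau}$) and taking expectations, which for $\tau > 0$ gives
$$ \sum_{i=0}^p \alpha_i\, \gamma_{\tau-i} = 0 \,, \qquad \tau = 1, \dots, p \,, $$
a system that is linear in the AR coefficients. For an ARMA model the same relation holds for $\tau > q$, which is what the first step of the procedure described next exploits.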
Although it is not straightforward, the method of moments can still be applied to an ARMA model by means of a two-step procedure: the first step uses the Yule-Walker equations and the second step is based on the equation given above. If your question goes in this direction, I can give you further details about it.
Edit
The following is an extract from pp. 545-546 of D.S.G. Pollock (1999), A Handbook of Time Series Analysis, Signal Processing and Dynamics, Academic Press (notation changed: $\theta$ here is $\mu$ in the original source):
1)
\begin{eqnarray} \begin{array}{lcl} E(\psi_t\psi_{t-\tau}) &=& E\left\{ \left( \sum_i \theta_i \epsilon_{t-i} \right) \left( \sum_j \theta_j \epsilon_{t-\tau-j} \right) \right\} \\ &=& \sum_i \sum_j \theta_i \theta_j E(\epsilon_{t-i} \epsilon_{t-\tau-j}) \\ &=& \sigma^2_\epsilon \sum_j \theta_j \theta_{j+\tau} \,. \end{array} \end{eqnarray}
2)
\begin{eqnarray} \begin{array}{lcl} E(\psi_t\psi_{t-\tau}) &=& E\left\{ \left( \sum_i \alpha_i y_{t-i} \right) \left( \sum_j \alpha_j y_{t-\tau-j} \right) \right\} \\ &=& \sum_i \sum_j \alpha_i \alpha_j E(y_{t-i} y_{t-\tau-j}) \\ &=& \sum_i \sum_j \alpha_i \alpha_j \gamma_{\tau+j-i} \,. \end{array} \end{eqnarray}
Putting (1) and (2) together:
$$ \sum_i\sum_j \alpha_i\alpha_j\gamma_{\tau+j-i} = \sigma^2_\epsilon \sum_j \theta_j \theta_{j+\tau} \,. $$
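As a quick numerical check of this identity (the ARMA(1,1) parameter values below are illustrative, not from the source), both sides can be compared against the exact autocovariances computed by statsmodels' `arma_acovf`, whose `ar`/`ma` arguments include the zero-lag coefficient and so match the normalised polynomials above:

```python
import numpy as np
from statsmodels.tsa.arima_process import arma_acovf

# Illustrative ARMA(1,1) in the notation above: alpha_0 = theta_0 = 1
alpha = np.array([1.0, -0.6])   # alpha_1 = -0.6, i.e. AR coefficient phi = 0.6
theta = np.array([1.0, 0.3])    # theta_1 = 0.3
sigma2 = 2.0
p, q = len(alpha) - 1, len(theta) - 1

# Exact autocovariances gamma_0, gamma_1, ...; gamma_{-k} = gamma_k
gamma = arma_acovf(alpha, theta, nobs=10, sigma2=sigma2)
gam = lambda k: gamma[abs(k)]

for tau in range(q + 1):
    lhs = sum(alpha[i] * alpha[j] * gam(tau + j - i)
              for i in range(p + 1) for j in range(p + 1))
    rhs = sigma2 * sum(theta[j] * theta[j + tau] for j in range(q - tau + 1))
    print(tau, lhs, rhs)   # the two sides agree for every tau
```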