First of all, it is important to note that stationarity is a property of a process, not of a time series. You consider the ensemble of all time series generated by a process. If the statistical properties¹ of this ensemble (mean, variance, …) are constant over time, the process is called stationary. Strictly speaking, it is impossible to say whether a given time series was generated by a stationary process (however, with some assumptions, we can take a good guess).
More intuitively, stationarity means that there are no distinguished points in time for your process (influencing the statistical properties of your observation). Whether this applies to a given process depends crucially on what you consider as fixed or variable for your process, i.e., what is contained in your ensemble.
A typical cause of non-stationarity is time-dependent parameters, which allow one to distinguish points in time by the values of the parameters. Another cause is a fixed initial condition.
Consider the following examples:
The noise reaching my house from a single car passing at a given time is not a stationary process. E.g., the average amplitude² is highest when the car is directly next to my house.
The noise reaching my house from street traffic in general is a stationary process, if we ignore the time dependency of the traffic intensity (e.g., less traffic at night or on weekends). There are no distinguished points in time anymore. While there may be strong fluctuations of individual time series, these vanish when I consider the ensemble of all realisations of the process.
If we include known impacts on traffic intensity, e.g., that there is less traffic at night, the process is non-stationary again: the average amplitude² varies with a daily rhythm. Every point in time is distinguished by the time of day.
The position of a single peppercorn in a pot of boiling water is a stationary process (ignoring the loss of water due to evaporation). There are no distinguished points in time.
The position of a single peppercorn in a pot of boiling water dropped in the exact middle at $t=0$ is not a stationary process, as $t=0$ is a distinguished point in time. The average position of the peppercorn is always in the middle (assuming a symmetric pot without distinguished directions), but at $t=ε$ (with $ε$ small), we can be sure that the peppercorn is somewhere near the middle for every realisation of the process, while at a later time, it can also be closer to the border of the pot.
So, the distribution of positions changes over time. To give a specific example, the standard deviation grows. The distribution quickly converges to the respective distributions of the previous example and if we only take a look at this process for $t>T$ with a sufficiently high $T$, we can neglect the non-stationarity and approximate it as a stationary process for all purposes – the impact of the initial condition has faded away.
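The fading of the initial condition can be illustrated with a quick simulation. Below is a minimal sketch: a hypothetical one-dimensional "pot" in which each peppercorn performs a Gaussian random walk confined to $[-1, 1]$ by clipping; all names and parameter values are my own choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-D "pot": positions confined to [-1, 1] via clipping.
n_realisations = 2000
n_steps = 500
step_std = 0.05

pos = np.zeros(n_realisations)  # all peppercorns dropped at the centre at t = 0
std_at = {}
for t in range(1, n_steps + 1):
    pos = np.clip(pos + rng.normal(0.0, step_std, n_realisations), -1.0, 1.0)
    if t in (5, 500):
        std_at[t] = pos.std()

# Early on, the ensemble is still concentrated near the centre; much later,
# the initial condition has faded and the spread across realisations is larger.
print(std_at[5], std_at[500])
```

The ensemble standard deviation at $t=5$ comes out much smaller than at $t=500$, which is exactly the time-dependence of the distribution described above.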
¹ For practical purposes, this is sometimes reduced to the mean and the variance (weak stationarity), but I do not consider this helpful for understanding the concept. Just ignore weak stationarity until you have understood stationarity.
² Which is the mean of the volume, but the standard deviation of the actual sound signal (do not worry too much about this here).
You have arrived at the stationary form of the local level model:
$$
\Delta y_t \equiv x_t = \underbrace{\Delta \alpha_t}_{\eta_{t-1}} + \Delta \epsilon_t \,,
$$
where $\Delta$ is the difference operator such that $\Delta y_t = y_t - y_{t-1}$.
Now, I think it is easier to first check the statistical properties (mean, covariances, autocorrelations) of this stationary form.
For example, the mean of this process is given by:
$$
E[x_t] = E[\eta_{t-1}] + E[\epsilon_t] - E[\epsilon_{t-1}] = 0 + 0 - 0 = 0 \,.
$$
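If you want to sanity-check this by simulation, here is a minimal sketch of the local level model ($y_t = \alpha_t + \epsilon_t$, $\alpha_t = \alpha_{t-1} + \eta_{t-1}$); the parameter values are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
sigma_eta, sigma_eps = 1.0, 0.5

# Local level model: y_t = alpha_t + eps_t, alpha_t = alpha_{t-1} + eta_{t-1}.
eta = rng.normal(0, sigma_eta, n)
eps = rng.normal(0, sigma_eps, n + 1)
alpha = np.cumsum(eta)                           # alpha_t for t = 1..n (alpha_0 = 0)
y = np.concatenate(([eps[0]], alpha + eps[1:]))  # y_0 = alpha_0 + eps_0

x = np.diff(y)   # x_t = y_t - y_{t-1} = eta_{t-1} + eps_t - eps_{t-1}
print(x.mean())  # sample mean close to 0, as the derivation says
```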
You can do the same to obtain the covariances of order $k$, $\gamma(k)$:
$$
\begin{aligned}
\gamma(0) &= E\left[(\eta_{t-1} + \epsilon_t - \epsilon_{t-1})^2\right] = \dots \\
\gamma(1) &= E\left[(\eta_{t-1} + \epsilon_t - \epsilon_{t-1})(\eta_{t-2} + \epsilon_{t-1} - \epsilon_{t-2})\right] = \dots \\
\gamma(2) &= \dots \\
\gamma(k) &= \dots \quad (k > 2)
\end{aligned}
$$
You just need to take the expectation of the cross-products of all terms, bearing in mind that $\eta_t$ and $\epsilon_t$ are each serially independent, independent of each other, and have variances $\sigma^2_\eta$ and $\sigma^2_\epsilon$ respectively.
Then, it will be straightforward to get the expression of the autocorrelations of order $k>0$, $\rho(k) = \frac{\gamma(k)}{\gamma(0)}$. This will have the form that is characteristic of a moving average of order 1, MA(1) (the autocorrelations are zero for $k>1$) and, hence, $x_t$ can be represented as an MA(1) process and $y_t$ as an ARIMA(0,1,1) process.
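The MA(1) cutoff in the autocorrelations can also be verified numerically by simulating $x_t$ directly from its stationary form. This sketch assumes the results of the derivation above, namely $\gamma(0) = \sigma^2_\eta + 2\sigma^2_\epsilon$ and $\gamma(1) = -\sigma^2_\epsilon$; variable names and parameter values are my own choices.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
sigma_eta, sigma_eps = 1.0, 1.0

eta = rng.normal(0, sigma_eta, n)
eps = rng.normal(0, sigma_eps, n + 1)
x = eta + eps[1:] - eps[:-1]  # x_t = eta_{t-1} + eps_t - eps_{t-1}

def acf(x, k):
    """Sample autocorrelation of order k."""
    xc = x - x.mean()
    return (xc[k:] * xc[:-k]).sum() / (xc ** 2).sum() if k else 1.0

# Theory: rho(1) = gamma(1)/gamma(0) = -sigma_eps^2/(sigma_eta^2 + 2*sigma_eps^2)
# (here -1/3), and rho(k) = 0 for k > 1 -- the MA(1) signature.
print([round(acf(x, k), 3) for k in (1, 2, 3)])
```

The sample autocorrelations come out near $-1/3$, $0$, $0$, matching the MA(1) pattern.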
In order to find out the relationship between the parameters of the local level model and the MA coefficient, you can equate the expression of the first-order autocorrelation obtained before with the expression of the first-order autocorrelation of an MA(1). Following the same strategy as above, you can find that $\rho(1)$ for an MA(1) with coefficient $\theta$ is given by $\rho(1) = \theta/(1 + \theta^2)$. The expression that you get by doing this will also reveal that the local level model is a restricted ARIMA(0,1,1) model where the MA coefficient $\theta$ can take only non-positive values.
Edit
Equation (c.5) is okay. You can get the relationship between the parameters of the local level model and the MA coefficient by solving equation (c.5) for $\theta$: rewrite it as a quadratic equation in $\theta$. One of the two solutions can be discarded because it implies a non-invertible MA, $|\theta|>1$.
When solving this equation, it helps to define $q=\sigma^2_\eta/\sigma^2_\epsilon$. Also, check that $\frac{\sqrt{\sigma^4_\eta + 4\sigma^2_\eta\sigma^2_\epsilon}}{2\sigma^2_\epsilon} = \frac{\sqrt{q^2 + 4q}}{2}$; this way you will get a neater expression. Then, given that $0 < q < \infty$, you can check that $\theta$ can only take zero or negative values.
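As a numerical cross-check of this algebra: equating $\rho(1) = \theta/(1+\theta^2)$ with $\rho(1) = -1/(q+2)$ (which follows from $\gamma(0) = \sigma^2_\eta + 2\sigma^2_\epsilon$ and $\gamma(1) = -\sigma^2_\epsilon$) gives the quadratic $\theta^2 + (q+2)\theta + 1 = 0$. The helper name `theta_from_q` below is introduced here for illustration; it implements the invertible root.

```python
import numpy as np

def theta_from_q(q):
    """Invertible MA coefficient of the reduced-form ARIMA(0,1,1),
    with q = sigma_eta^2 / sigma_eps^2.
    Root of theta^2 + (q + 2)*theta + 1 = 0 that satisfies |theta| < 1."""
    return (np.sqrt(q ** 2 + 4 * q) - q - 2) / 2

for q in (0.01, 1.0, 100.0):
    theta = theta_from_q(q)
    # Consistency: rho(1) of the MA(1) equals rho(1) of the local level model.
    assert abs(theta / (1 + theta ** 2) - (-1 / (q + 2))) < 1e-12
    assert -1 < theta < 0  # invertible and non-positive, as stated above
    print(q, theta)
```

As $q \to 0$ the root approaches $-1$, and as $q \to \infty$ it approaches $0$ from below, which is the restricted range claimed above.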
Best Answer
I think you're making life hard for yourself there. You just need to use a few elementary properties of variances and covariances.
Here's one approach:
1. Start with the algebraic definition of your random walk process.
2. Derive $\text{Var}(y_t)$ in terms of $\text{Var}(y_{t-1})$ and the variance of the error term.
3. Show that $\text{Cov}(y_t,y_{t-1}) = \text{Var}(y_{t-1})$.
4. Argue that $\text{Cov}(y_s,y_{s-1})\neq \text{Cov}(y_t,y_{t-1})$ if $s\neq t$.
... though, frankly, I think even just going to the second step (writing $\text{Var}(y_t)$ in terms of $\text{Var}(y_{t-1})$ and the variance of the error term) is sufficient to establish it's not covariance stationary.
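The steps above can be illustrated with a small ensemble simulation of a pure random walk with $y_0 = 0$ and unit-variance errors (all parameter choices here are mine):

```python
import numpy as np

rng = np.random.default_rng(3)
n_paths, n_steps, sigma = 50_000, 40, 1.0

# y_t = y_{t-1} + e_t with y_0 = 0, so Var(y_t) = t * sigma^2.
y = np.cumsum(rng.normal(0, sigma, (n_paths, n_steps)), axis=1)

var_10, var_20 = y[:, 9].var(), y[:, 19].var()   # Var(y_10), Var(y_20)
cov_19_20 = np.cov(y[:, 18], y[:, 19])[0, 1]     # Cov(y_19, y_20)

# Var grows with t (~10 vs ~20), and Cov(y_20, y_19) ~ Var(y_19) ~ 19:
# the second moments depend on t, so the process is not covariance stationary.
print(var_10, var_20, cov_19_20)
```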