You have arrived at the stationary form of the local level model:
$$
\Delta y_t \equiv x_t = \underbrace{\Delta \alpha_t}_{\eta_{t-1}} + \Delta \epsilon_t \,,
$$
where $\Delta$ is the difference operator such that $\Delta y_t = y_t - y_{t-1}$.
Now, I think it is easier to first check the statistical properties (mean, covariances, autocorrelations) of this stationary form.
For example, the mean of this process is given by:
$$
\hbox{E}[x_t] = \hbox{E}[\eta_{t-1}] + \hbox{E}[\epsilon_t] - \hbox{E}[\epsilon_{t-1}] = 0 + 0 - 0 = 0 \,.
$$
You can do the same to obtain the covariances of order $k$, $\gamma(k)$:
$$
\begin{array}{lcl}
\gamma(0) &=& \hbox{E}\left[(\eta_{t-1} + \epsilon_t - \epsilon_{t-1})^2\right] = \dots \\
\gamma(1) &=& \hbox{E}\left[(\eta_{t-1} + \epsilon_t - \epsilon_{t-1})(\eta_{t-2} + \epsilon_{t-1} - \epsilon_{t-2})\right] = \dots \\
\gamma(2) &=& \cdots \\
\gamma(k) &=& \cdots \quad \hbox{for } k > 2
\end{array}
$$
You just need to take the expectation of the cross-products of all terms, bearing in mind that $\eta_t$ and $\epsilon_t$ are each serially independent and independent of each other, with variances $\sigma^2_\eta$ and $\sigma^2_\epsilon$, respectively.
Then it is straightforward to get the expression of the autocorrelations of order $k>0$, $\rho(k) = \frac{\gamma(k)}{\gamma(0)}$. This has the form characteristic of a moving average of order 1, MA(1) (the autocorrelations are zero for $k>1$); hence, $x_t$ can be represented as an MA(1) process and $y_t$ as an ARIMA(0,1,1) process.
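To see this numerically, here is a minimal sketch (assuming NumPy is available, with illustrative values $\sigma_\eta = 0.5$, $\sigma_\epsilon = 1$; the helper `acf` is mine, not a library function) that simulates a local level model, differences it, and checks that the sample autocorrelations die out after lag 1:

```python
import numpy as np

rng = np.random.default_rng(42)
n, sig_eta, sig_eps = 100_000, 0.5, 1.0  # illustrative values

# Local level model: y_t = alpha_t + eps_t, with alpha_t = alpha_{t-1} + eta_{t-1}
alpha = np.cumsum(sig_eta * rng.standard_normal(n))
y = alpha + sig_eps * rng.standard_normal(n)
x = np.diff(y)  # stationary form: x_t = eta_{t-1} + eps_t - eps_{t-1}

def acf(x, k):
    """Sample autocorrelation at lag k >= 1."""
    xc = x - x.mean()
    return xc[k:] @ xc[:-k] / (xc @ xc)

# Theory: rho(1) = -sig_eps^2 / (sig_eta^2 + 2 * sig_eps^2), rho(k) = 0 for k > 1
print(-sig_eps**2 / (sig_eta**2 + 2 * sig_eps**2))  # theoretical rho(1)
print([round(acf(x, k), 3) for k in (1, 2, 3)])
```

With a long enough sample, $\hat\rho(1)$ should sit near the theoretical value and $\hat\rho(2), \hat\rho(3)$ near zero, matching the MA(1) signature.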
To find the relationship between the parameters of the local level model and the MA coefficient, you can equate the expression of the first-order autocorrelation obtained above with the first-order autocorrelation of an MA(1). Following the same strategy as above, you can find that $\rho(1)$ for an MA(1) with coefficient $\theta$ is given by $\rho(1) = \theta/(1 + \theta^2)$. The resulting expression will also reveal that the local level model is a restricted ARIMA(0,1,1) model in which the MA coefficient $\theta$ can take only negative values.
Edit
Equation (c.5) is okay. You can get the relationship between the parameters of the local level model and the MA coefficient by solving equation (c.5) for $\theta$: rewrite it as a quadratic equation in $\theta$. One of the two roots can be discarded because it implies a non-invertible MA, $|\theta|>1$.
When solving this equation, it helps to define $q=\sigma^2_\eta/\sigma^2_\epsilon$. Also, check that $\frac{\sqrt{\sigma^4_\eta + 4\sigma^2_\eta\sigma^2_\epsilon}}{2\sigma^2_\epsilon} = \frac{\sqrt{q^2 + 4q}}{2}$; this way you get a neater expression. Then, given that $0 < q < \infty$, you can check that $\theta$ is confined to $-1 < \theta < 0$: it approaches $-1$ as $q \to 0$ and $0$ as $q \to \infty$.
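A quick numerical check of this solution (a sketch; `theta_from_q` is a hypothetical helper name, not standard) confirms that the retained root is the invertible one and that it reproduces $\rho(1) = \theta/(1+\theta^2) = -1/(q+2)$:

```python
import numpy as np

def theta_from_q(q):
    """Invertible MA(1) coefficient implied by the local level model,
    where q = sigma_eta^2 / sigma_eps^2 (hypothetical helper name)."""
    return (np.sqrt(q**2 + 4 * q) - (q + 2)) / 2  # root with |theta| <= 1

for q in (0.1, 1.0, 10.0):
    th = theta_from_q(q)
    # The root must satisfy rho(1): theta / (1 + theta^2) = -1 / (q + 2)
    print(q, round(th, 4), np.isclose(th / (1 + th**2), -1 / (q + 2)))
```

Evaluating over a grid of $q$ values also makes the limits visible: $\theta \to -1$ as $q \to 0$ and $\theta \to 0$ as $q$ grows.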
Why create a whole new method, i.e., time series (ARIMA), instead of using multiple linear regression and adding lagged variables to it (with the order of lags determined using ACF and PACF)?
One immediate point is that a linear regression only works with observed variables, while ARIMA incorporates unobserved variables in the moving average part; thus, ARIMA is more flexible, or more general, in a way. An AR model can be seen as a linear regression model, and its coefficients can be estimated using OLS: $\hat\beta_{OLS}=(X'X)^{-1}X'y$, where $X$ consists of observed lags of the dependent variable. Meanwhile, MA or ARMA models do not fit into the OLS framework, since some of the regressors, namely the lagged error terms, are unobserved; hence the OLS estimator is infeasible.
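As a sketch of the first point (simulated data, illustrative coefficient), an AR(1) fits directly into the OLS formula because the regressor is the observed lag of $y$:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate an AR(1): y_t = phi * y_{t-1} + e_t
phi = 0.7  # illustrative value
y = np.zeros(10_000)
for t in range(1, len(y)):
    y[t] = phi * y[t - 1] + rng.standard_normal()

# OLS with the observed lag as regressor: beta_hat = (X'X)^{-1} X'y
X = np.column_stack([np.ones(len(y) - 1), y[:-1]])
beta = np.linalg.solve(X.T @ X, X.T @ y[1:])
print(beta.round(3))  # intercept near 0, slope near phi
```

No such regression can be written down for an MA(1), because the would-be regressor $e_{t-1}$ is never observed.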
Is one G-M assumption that the independent variables should be normally distributed? Or just the dependent variable conditional on the independent ones?
The normality assumption is sometimes invoked for the model errors, not for the independent variables. However, normality is required neither for the consistency and efficiency of the OLS estimator nor for the Gauss-Markov theorem to hold. The Wikipedia article on the Gauss-Markov theorem states explicitly that "The errors do not need to be normal".
Multicollinearity between variables may (obviously) arise, so the estimates would be wrong.
A high degree of multicollinearity means an inflated variance of the OLS estimator. However, the OLS estimator is still BLUE as long as the multicollinearity is not perfect. Thus your statement does not look right.
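A small Monte Carlo sketch (illustrative sample size, correlation, and coefficients) shows exactly this: under high but imperfect multicollinearity the OLS estimates stay centered on the truth, and only their variance inflates:

```python
import numpy as np

rng = np.random.default_rng(7)
n, reps = 200, 2000  # illustrative settings

def slope_estimates(rho):
    """Monte Carlo draws of the first OLS slope when the two regressors
    have correlation rho; the true coefficients are both 1."""
    est = np.empty(reps)
    for i in range(reps):
        x1 = rng.standard_normal(n)
        x2 = rho * x1 + np.sqrt(1 - rho**2) * rng.standard_normal(n)
        y = x1 + x2 + rng.standard_normal(n)
        X = np.column_stack([x1, x2])
        est[i] = np.linalg.solve(X.T @ X, X.T @ y)[0]
    return est

low, high = slope_estimates(0.0), slope_estimates(0.95)
# Both means are close to 1 (estimator still unbiased); only the spread differs
print(low.mean().round(3), high.mean().round(3))
print(low.std().round(3), high.std().round(3))  # the second is much larger
```

The estimates are not "wrong" in the correlated case, just noisier, which is the distinction the answer draws.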
It is obvious that even with lagged variables OLS problems arise, and it is neither efficient nor correct; but when using maximum likelihood, do these problems persist?
An AR model can be estimated using both OLS and ML; both of these methods give consistent estimators. MA and ARMA models cannot be estimated by OLS, so ML is the main choice; again, it is consistent. The other interesting property is efficiency, and here I am not completely sure (but clearly the information should be available somewhere as the question is pretty standard). I would try commenting on "correctness", but I am not sure what you mean by that.
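As a sketch of how ML-style estimation handles the unobserved errors of an MA(1) (here via the conditional sum of squares, which coincides with Gaussian ML up to the treatment of the initial innovation; simulated data, illustrative $\theta$), the innovations are recovered recursively rather than used as regressors:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate an MA(1): x_t = e_t + theta * e_{t-1}
theta_true = -0.5  # illustrative value
e = rng.standard_normal(5_001)
x = e[1:] + theta_true * e[:-1]

def css(theta, x):
    """Conditional sum of squares: back out the innovations recursively
    (starting from e_0 = 0) and return their sum of squares."""
    s, prev = 0.0, 0.0
    for xt in x:
        prev = xt - theta * prev  # e_t = x_t - theta * e_{t-1}
        s += prev * prev
    return s

# Minimize over the invertible region |theta| < 1 by grid search
grid = np.linspace(-0.99, 0.99, 397)
theta_hat = grid[np.argmin([css(th, x) for th in grid])]
print(round(theta_hat, 3))  # close to theta_true
```

The grid search is only for transparency; in practice a numerical optimizer or a packaged ARIMA routine would minimize the same objective.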
Best Answer
The time resolution of measurement has to do with the data acquisition or sampling rate: how often are you taking a sample? If you're taking one sample every minute, but the causal influence is on the order of milliseconds, then you can expect the causal influence to occur within the same time chunk. On the other hand, if you're sampling every nanosecond and the causal influence occurs on a time scale of seconds, the causal influence will assuredly not be in the same time chunk.