MA Model Estimation:
Let us assume a series with 100 time points, and say it is characterized by an MA(1) model with no intercept. The model is then given by
$$y_t=\varepsilon_t-\theta\varepsilon_{t-1},\quad t=1,2,\cdots,100\quad (1)$$
The error term here is not observed. To obtain it, Box et al., Time Series Analysis: Forecasting and Control (3rd Edition), page 228, suggest computing the error term recursively by
$$\varepsilon_t=y_t+\theta\varepsilon_{t-1}$$
So the error term for $t=1$ is,
$$\varepsilon_{1}=y_{1}+\theta\varepsilon_{0}$$
We cannot compute this without knowing the value of $\theta$. To obtain it, we first need an initial (preliminary) estimate of the model parameter. Box et al., Section 6.3.2, page 202 of the same book, state that:
It has been shown that the first $q$ autocorrelations of an MA($q$) process are nonzero and can be written in terms of the parameters of the model as
$$\rho_k=\frac{-\theta_{k}+\theta_1\theta_{k+1}+\theta_2\theta_{k+2}+\cdots+\theta_{q-k}\theta_q}{1+\theta_1^2+\theta_2^2+\cdots+\theta_q^2},\quad k=1,2,\cdots,q$$
The expression above for $\rho_1,\rho_2,\cdots,\rho_q$ in terms of $\theta_1,\theta_2,\cdots,\theta_q$ supplies $q$ equations in $q$ unknowns. Preliminary estimates of the $\theta$s can be obtained by substituting estimates $r_k$ for $\rho_k$ in the above equation.
Note that $r_k$ is the estimated autocorrelation. There is more discussion in Section 6.3, Initial Estimates for the Parameters, which is worth reading. Now, assume we obtain the initial estimate $\theta=0.5$. Then,
$$\varepsilon_{1}=y_{1}+0.5\varepsilon_{0}$$
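As a concrete sketch of this preliminary step for an MA(1), the single equation $\rho_1=-\theta/(1+\theta^2)$ can be solved for $\theta$ after substituting the sample autocorrelation $r_1$. The R snippet below is only an illustration on a simulated series (the seed, the series, and the use of `arima.sim` and `acf` are my assumptions, not part of the original discussion); it keeps the invertible root with $|\theta|<1$.

```r
# Illustration (assumed simulated data): preliminary MA(1) estimate from the
# lag-1 sample autocorrelation r1, using rho_1 = -theta / (1 + theta^2).
# Note: arima.sim() uses the y_t = e_t + theta*e_{t-1} convention, so ma = -0.5
# corresponds to theta = 0.5 in the sign convention of model (1).
set.seed(1)
y  <- arima.sim(model = list(ma = -0.5), n = 100)
r1 <- acf(y, lag.max = 1, plot = FALSE)$acf[2]

# Solve r1*theta^2 + theta + r1 = 0; a real root exists only when |r1| <= 0.5,
# and the invertible root (|theta| < 1) is:
theta0 <- (-1 + sqrt(1 - 4 * r1^2)) / (2 * r1)
theta0
```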
Now, another problem is that we don't have a value for $\varepsilon_0$, because $t$ starts at 1, and so we cannot compute $\varepsilon_1$. Luckily, there are two methods to obtain this:
- Conditional Likelihood
- Unconditional Likelihood
According to Box et al., Section 7.1.3, page 227, $\varepsilon_0$ can be set to zero as an approximation if $n$ is moderate or large; this is the conditional likelihood approach. Otherwise, the unconditional likelihood is used, wherein the value of $\varepsilon_0$ is obtained by back-forecasting; Box et al. recommend this method. Read more about back-forecasting in Section 7.1.4, page 231.
After obtaining the initial estimate and a value for $\varepsilon_0$, we can finally proceed with the recursive calculation of the error terms. The final stage is then to estimate the parameter of model $(1)$; note that this is no longer the preliminary estimate.
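Here is a minimal sketch of that recursion under the conditional approach ($\varepsilon_0=0$), with $\theta$ held fixed at the preliminary estimate; the variable names and the simulated series `y` from the earlier snippet are my own assumptions.

```r
# Sketch: recursive error terms for model (1), conditional on eps_0 = 0,
# with theta fixed at the preliminary estimate theta0 from the earlier snippet.
theta    <- theta0
eps      <- numeric(length(y))
eps_prev <- 0                             # conditional likelihood: eps_0 = 0
for (t in seq_along(y)) {
  eps[t]   <- y[t] + theta * eps_prev     # eps_t = y_t + theta * eps_{t-1}
  eps_prev <- eps[t]
}
sum(eps^2)                                # conditional sum of squares S(theta)
```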
In estimating the parameter $\theta$, I use a nonlinear estimation procedure, specifically the Levenberg-Marquardt algorithm, since MA models are nonlinear in their parameters.
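One way to carry this out in R, assuming the minpack.lm package is available (my assumption; the original discussion does not mention R), is to hand Levenberg-Marquardt the vector of recursively computed errors and let it minimize their sum of squares. `arima()` is shown only as a rough cross-check, since it uses a different estimation method and the opposite sign convention.

```r
# Sketch of the final estimation stage: minimize the conditional sum of squares
# over theta with Levenberg-Marquardt (minpack.lm is an assumption here).
resid_ma1 <- function(theta, y) {
  eps      <- numeric(length(y))
  eps_prev <- 0
  for (t in seq_along(y)) {
    eps[t]   <- y[t] + theta * eps_prev
    eps_prev <- eps[t]
  }
  eps                                # nls.lm minimizes sum(eps^2)
}

library(minpack.lm)
fit <- nls.lm(par = c(theta = theta0), fn = resid_ma1, y = y)
fit$par                              # final (non-preliminary) estimate of theta

# Rough cross-check with R's built-in estimator; arima() parameterizes the model
# as y_t = eps_t + theta*eps_{t-1}, so its ma1 coefficient is -theta here.
arima(y, order = c(0, 0, 1), include.mean = FALSE)
```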
Overall, I would highly recommend you to read Box et al. Time Series Analysis: Forecasting and Control (3rd Edition).
Best Answer
I really think this is a good question and it deserves an answer. The linked page is written by a psychologist who claims that some home-brew method is a better way of doing time series analysis than Box-Jenkins. I hope that my attempt at an answer will encourage others who are more knowledgeable about time series to contribute.
From his introduction, it looks like Darlington is championing the approach of just fitting an AR model by least squares. That is, if you want to fit the model $$z_t = \alpha_1 z_{t-1} + \cdots + \alpha_k z_{t-k} + \varepsilon_t$$ to the time series $z_t$, you can just regress the series $z_t$ on the series at lag $1$, lag $2$, and so on up to lag $k$, using an ordinary multiple regression. This is certainly allowed; in R, it's even an option in the `ar` function. I tested it out, and it tends to give similar answers to the default method for fitting an AR model in R.

He also advocates regressing $z_t$ on things like $t$ or powers of $t$ to find trends. Again, this is absolutely fine. Lots of time series books discuss this, for example Shumway-Stoffer and Cowpertwait-Metcalfe. Typically, a time series analysis might proceed along the following lines: you find a trend, remove it, then fit a model to the residuals.
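A small sketch of that comparison (the simulated data and parameter values below are my own, chosen purely for illustration): fit an AR model by ordinary least squares on the lagged series and compare it with R's default fitting method.

```r
# Sketch: fit an AR(2) by least squares on the lagged values, and compare with
# R's default method in ar() (Yule-Walker). Simulated data, chosen arbitrarily.
set.seed(1)
z <- arima.sim(model = list(ar = c(0.6, -0.3)), n = 200)

fit_ols <- ar(z, order.max = 2, aic = FALSE, method = "ols")  # least-squares fit
fit_yw  <- ar(z, order.max = 2, aic = FALSE)                  # default: Yule-Walker
fit_ols$ar
fit_yw$ar

# The same least-squares idea written as an ordinary multiple regression on the lags.
lagged <- as.data.frame(embed(z, 3))
names(lagged) <- c("z", "z_lag1", "z_lag2")
coef(lm(z ~ z_lag1 + z_lag2, data = lagged))
```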
But it seems like he is also advocating over-fitting, and then using the reduction in mean-squared error between the fitted series and the data as evidence that his method is better.
This is not a good idea because the test of a model is supposed to be how well it can forecast, not how well it fits the existing data. In his three examples, he uses "adjusted root mean-squared error" as his criterion for the quality of the fit. Of course, over-fitting a model is going to make an in-sample estimate of error smaller, so his claim that his models are "better" because they have smaller RMSE is wrong.
In a nutshell, since he is using the wrong criterion for assessing how good a model is, he reaches the wrong conclusions about regression vs. ARIMA. I'd wager that, if he had tested the predictive ability of the models instead, ARIMA would have come out on top. Perhaps someone can try it if they have access to the books he mentions here.
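Lacking his data, here is a hedged sketch of how such a predictive comparison could be run on a simulated series: compare in-sample RMSE with RMSE on a held-out period, for an over-fitted trend regression and for a simple ARIMA fit. All modeling choices below are arbitrary illustrations, not a reproduction of his examples.

```r
# Sketch: in-sample vs. out-of-sample RMSE on a held-out test period,
# for an over-fitted polynomial trend regression and a simple ARIMA fit.
set.seed(1)
z     <- arima.sim(model = list(ar = 0.7), n = 120)
train <- window(z, end = 100)
test  <- window(z, start = 101)

# Over-fitted regression: a high-order polynomial in time.
t_tr <- seq_along(train)
reg  <- lm(train ~ poly(t_tr, 10))
rmse_in_reg  <- sqrt(mean(resid(reg)^2))
rmse_out_reg <- sqrt(mean((test - predict(reg, newdata = data.frame(t_tr = 101:120)))^2))

# ARIMA fit on the same training data.
fit <- arima(train, order = c(1, 0, 0))
rmse_in_arima  <- sqrt(mean(resid(fit)^2))
rmse_out_arima <- sqrt(mean((test - predict(fit, n.ahead = 20)$pred)^2))

c(in_reg = rmse_in_reg, out_reg = rmse_out_reg,
  in_arima = rmse_in_arima, out_arima = rmse_out_arima)
```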
[Supplemental: for more on the regression idea, you might want to check out older time series books written before ARIMA became the most popular approach. For example, Kendall, Time-Series (1973), devotes a whole chapter (Chapter 11) to this method and to comparisons with ARIMA.]