Is there a reason why there seem to be many implementations of a moving average but relatively few implementations of what Wikipedia defines as a moving-average model?
A definition from the Wikipedia page:
The moving-average model specifies that the output variable depends
linearly on the current and various past values of a stochastic
(imperfectly predictable) term.
And further
Thus, a moving-average model is conceptually a linear regression of
the current value of the series against current and previous
(observed) white noise error terms or random shocks. The random shocks
at each point are assumed to be mutually independent and to come from
the same distribution, typically a normal distribution, with location
at zero and constant scale.
Rephrasing this definition: the $MA(q)$ time series model means that the value $X_t$ of the random variable $X$ is a linear combination of the current stochastic shock and the shocks lagged at times $1$ through $q$ (in practice $q$ is rarely more than 2). The mean of $X$ can be added to the model if it is significantly different from zero.
In the context of fitting ARIMA models, the $MA(q)$ part means that lagged residuals of the $AR(p)$ model of the process are added to the estimation if their presence significantly decreases the residual sum of squares.
If the $AR$ part is not present, the process is assumed to be stationary (and usually normally distributed) with zero mean and constant variance.
The $q$ term in the formula means taking a weighted sum of residuals from previous steps. Suppose you want to get $X_t$:
$X_t = \mu + \epsilon_t + \sum^{q}_{j = 1}{\theta_j \, \epsilon_{t - j}}$
where the $\theta_j$ are estimates of the model's coefficients and the $\epsilon_{t-j}$ are past white noise shocks.
This approach helps when your time series $X$ is stationary (look up what that means if you are unfamiliar with it) and thus fluctuates around its mean, while the residuals of $X$ from that mean are correlated with $X_t$.
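To make the formula concrete, here is a minimal hand-rolled simulation of an MA(2) process in R; the coefficient values are illustrative assumptions, not estimates from any data:

set.seed(1)            # assumed seed, for reproducibility
n     <- 200
theta <- c(0.6, 0.3)   # assumed theta_1, theta_2
eps   <- rnorm(n)      # white noise shocks epsilon_t
x_ma2 <- numeric(n)
for (t in 3:n) {
  # X_t = eps_t + theta_1 * eps_{t-1} + theta_2 * eps_{t-2}
  x_ma2[t] <- eps[t] + theta[1] * eps[t - 1] + theta[2] * eps[t - 2]
}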
Practical example using R
set.seed(1)                               # assumed seed, for reproducibility
x <- rnorm(100)                           # pure white noise
acf(x)                                    # no significant autocorrelation expected
arrr <- arima(x, order = c(0L, 0L, 0L))   # ARIMA(0,0,0): mean-only model
Note that x is independent and identically distributed, which in particular makes it a stationary process. So we don't need the $q$ terms at all, since the process does not autocorrelate.
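If you want a formal check rather than eyeballing the ACF, a Ljung-Box test on x should fail to reject the null of no autocorrelation (the lag choice here is an arbitrary assumption):

Box.test(x, lag = 10, type = "Ljung-Box")   # expect a large p-value for white noise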
x2 <- arima.sim(list(ma = 0.5), 100)        # simulate 100 points of an MA(1) process
acf(x2)                                     # expect a significant spike at lag 1
arrr2 <- arima(x2, order = c(0L, 0L, 1L))   # fit ARIMA(0,0,1), i.e. MA(1)
arrr2                                       # inspect the estimated coefficients
I simulated an MA(1) process. You can check the output of arrr2 to find that the s.e. of ma1 is very low compared to the estimate, i.e. the coefficient is significant.
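For example, to pull the estimates and their standard errors out directly (var.coef is the coefficient covariance matrix returned by arima):

coef(arrr2)                  # point estimates: ma1 and intercept
sqrt(diag(arrr2$var.coef))   # corresponding standard errors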
The model order can be selected by manual exploration of possible models or, for example, with the forecast::auto.arima function.
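For instance, a minimal sketch on the simulated series from above (auto.arima selects the order automatically, using AICc by default):

library(forecast)
fit <- auto.arima(x2)   # should recover something close to ARIMA(0,0,1)
fit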
I would also use auto.arima. The fact that the selected models change frequently from one window to the next may be due not only to frequent structural changes (which is probably unlikely) but to the fact that several models approximate the patterns in the data about equally well, so their AICs are very close. Then changing two data points out of 1000 (dropping the oldest point and adding one new point) can make auto.arima switch between these competing models. I would not worry too much about that, as each of these models likely implies very similar time series patterns. They are probably almost equivalent representations of the same thing. (Such a hypothesis could also be assessed by looking at the different models' impulse-response functions or the implied ACFs.)
If the AICs of the few best models are very close, then it should also not matter much which one of them you choose. As far as we know, they are all almost equally good approximations of reality. So you could just pick the one you like and stick to it. That would make the results look much cleaner than having a constantly changing model. For that, consider obtaining not only the best model from auto.arima but, say, the top 5 together with their AICs, as sketched below. Do that in each window and see how different the AICs are. If they are very close, you could just pick one of these models and use it in all the windows.
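A sketch of one way to do that with the base arima function; the order grid and its bounds are illustrative assumptions, not something auto.arima provides directly:

# Hypothetical helper: fit a small grid of ARMA(p, q) models to one window
# and return the candidates ranked by AIC.
rank_arma_models <- function(y, max_p = 2, max_q = 2) {
  grid <- expand.grid(p = 0:max_p, q = 0:max_q)
  grid$aic <- apply(grid, 1, function(o) {
    fit <- try(arima(y, order = c(o["p"], 0L, o["q"])), silent = TRUE)
    if (inherits(fit, "try-error")) NA else AIC(fit)
  })
  grid[order(grid$aic), ]   # best (lowest AIC) first
}
rank_arma_models(x2)        # compare how close the top few AICs are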
Or you could decide that you will only change the model from one window to the next if the difference in AICs between the model from the past window and the best one in the current window is sufficiently large. This should give you more stability from window to window and should not be difficult to program.
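A hedged sketch of such a switching rule; the threshold of 2 AIC units is a common rule of thumb, not a recommendation from any of the functions above:

# Keep the previous window's model unless the current best model improves
# AIC by more than `threshold`. Both AICs must be computed on the same window.
keep_previous_model <- function(aic_prev, aic_best, threshold = 2) {
  (aic_prev - aic_best) < threshold
}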
Best Answer
A weighted average is what an ARIMA model is (see Seeking certain type of ARIMA explanation). It is the answer to a double question: 1) how many values should I include, AND 2) how do I weight/leverage them in order to get a "representative value"?
Thus it is BOTH a smoother and a forecaster.
When you specify a 3-period equally weighted average for either smoothing or forecasting, you are specifying an ARIMA (3,0,0)(0,0,0) model with coefficients 1/3, 1/3 and 1/3, WITHOUT a constant. Obviously one might instead specify a 3-period weighted average where the weights are optimized, as is often the case.
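A minimal sketch of that equivalence in R, pinning the AR(3) coefficients at 1/3 each (the input series y is made up for illustration):

set.seed(1)
y <- as.numeric(arima.sim(list(ar = 0.5), n = 100))
# ARIMA(3,0,0) without a constant, with all three coefficients fixed at 1/3:
fit_fixed <- arima(y, order = c(3L, 0L, 0L), include.mean = FALSE,
                   fixed = c(1/3, 1/3, 1/3), transform.pars = FALSE)
# Letting arima optimize the weights instead gives the usual AR(3) fit:
fit_free <- arima(y, order = c(3L, 0L, 0L), include.mean = FALSE)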
What you may be looking for is a way for your software to determine both "how many" and "how important" the previous values are, rather than just assuming some fixed form of model, as ets does.
Also see Identifying Early Indicators Time Series Analysis for an example of a 3-period autoregressive model (really a 3-period moving average in common parlance) in practice.
One final distinction is that smoothing "centers" the result by using the value 1 period before, the current value and the value 1 period after, whereas forecasting uses the values 3, 2 and 1 periods before to predict the next value.
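Here is a minimal R sketch of that distinction using stats::filter (the series y is illustrative):

set.seed(1)
y <- as.numeric(arima.sim(list(ar = 0.5), n = 100))
# Smoothing: centered 3-point average of y[t-1], y[t], y[t+1].
smooth3 <- stats::filter(y, rep(1/3, 3), sides = 2)
# Forecasting: the average of y[t-2], y[t-1], y[t] predicts y[t+1],
# so shift the one-sided average forward by one period.
past3 <- stats::filter(y, rep(1/3, 3), sides = 1)
yhat  <- c(NA, head(past3, -1))   # yhat[t] uses only y[t-3], y[t-2], y[t-1]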