Solved – a correct implementation of the moving average model

forecasting, moving average, moving-average-model, time series

I would like to implement a moving average model in Python. When I try to use the statsmodels library, specifically the ARMA(p, q) function with $p=0$, I get a lot of convergence errors in the underlying libraries, and I don't really understand why a moving average model would cause such problems, so I would like to implement it myself.

From wikipedia a moving average model is defined as:

$$X_t=\mu+\epsilon_t+\theta_1\epsilon_{t-1}+\cdots+\theta_q\epsilon_{t-q},$$

where $\mu$ is the mean of the series, the $\theta_1,\cdots,\theta_q$ are the parameters and the $\epsilon_t,\cdots,\epsilon_{t-q}$ are white noise error terms. The value of $q$ is the order of the MA model. The article even makes a point of stating that a moving average model should not be confused with the moving average.

In looking for resources on how to implement this in python I came across many articles and blog posts which simply implement what I understand to be a moving average whereby a sliding window of size $n$ is defined and the value at the next step is simply the mean of samples within that sliding window.
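For concreteness, the sliding-window technique those posts implement can be sketched in a few lines of Python (the function name is my own):

```python
def sliding_window_forecast(series, n):
    """Forecast the next value as the mean of the last n observations.

    This is the sliding-window 'moving average' smoother described above,
    NOT the MA(q) moving-average model.
    """
    window = series[-n:]
    return sum(window) / len(window)

# Forecast the next point of a short series with a window of size 3
print(sliding_window_forecast([1.0, 2.0, 3.0, 4.0, 5.0], 3))  # mean of 3, 4, 5 -> 4.0
```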

If I wish to produce forecasts for time series data, are both techniques valid but inherently different? That is, is the moving average model as defined here different from the moving average which many people and resources implement and call a moving average model, despite the definition and their implementations not matching? Is there a reason why there seem to be many implementations of a moving average but relatively few implementations of what Wikipedia defines to be a moving average model?

Best Answer

Is there a reason why there seem to be many implementations of a moving average but relatively few implementations of what Wikipedia defines to be a moving average model?

A definition from the Wikipedia page:

The moving-average model specifies that the output variable depends linearly on the current and various past values of a stochastic (imperfectly predictable) term.

And further

Thus, a moving-average model is conceptually a linear regression of the current value of the series against current and previous (observed) white noise error terms or random shocks. The random shocks at each point are assumed to be mutually independent and to come from the same distribution, typically a normal distribution, with location at zero and constant scale.

Rephrasing this definition: in an $MA(q)$ time series model, the value $X_t$ of the random variable $X$ is a linear combination of the current and past white-noise shocks at lags $0, 1, \dots, q$ (in practice $q$ is rarely more than 2). The mean of $X$ is added to the model if it is significantly different from zero.

In the context of fitting ARIMA models, the $MA(q)$ part means that the residuals of the $AR(p)$ model of the process are added to the model estimation if their presence significantly decreases the residual sum of squares.

If the $AR$ part is not present, the process is assumed to be stationary (and usually normally distributed) with zero mean and constant variance.

The $q$ term in the formula means taking a weighted sum of the current and $q$ previous shocks. Suppose you want to get $X_t$:

$$X_t = \mu + \sum_{j=0}^{q} \theta_j \epsilon_{t-j}, \qquad \theta_0 = 1,$$

where the $\theta_j$ are estimates of the model's coefficients.
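Given that formula, simulating an $MA(q)$ process is straightforward. A minimal Python sketch (function and variable names are my own) that draws i.i.d. standard normal shocks and applies the definition directly:

```python
import random

def simulate_ma(theta, mu, n, seed=0):
    """Simulate an MA(q) process X_t = mu + eps_t + theta_1*eps_{t-1} + ...

    theta is the list [theta_1, ..., theta_q]; the eps are i.i.d. N(0, 1) shocks.
    """
    rng = random.Random(seed)
    q = len(theta)
    # Draw q extra shocks so the first observation already has a full history
    eps = [rng.gauss(0.0, 1.0) for _ in range(n + q)]
    x = []
    for t in range(q, n + q):
        x_t = mu + eps[t] + sum(theta[j] * eps[t - 1 - j] for j in range(q))
        x.append(x_t)
    return x

# 500 observations of an MA(1) process with theta_1 = 0.5 and zero mean
xs = simulate_ma([0.5], mu=0.0, n=500)
```

For an MA(1) with $\theta_1 = 0.5$ and unit-variance shocks, the series should fluctuate around zero with variance close to $1 + \theta_1^2 = 1.25$.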

This approach helps when your time series $X$ is stationary (a property worth reading up on) and thus fluctuates around its mean, while past deviations of $X$ from that mean are correlated with $X_t$.

Practical example using R

```r
x <- rnorm(100)                          # 100 i.i.d. standard normal draws
acf(x)                                   # no significant autocorrelation expected
arrr <- arima(x, order = c(0L, 0L, 0L))  # fit a pure-mean ARIMA(0, 0, 0)
```

Note that x is independent and identically distributed, which in particular makes it a stationary process.

So we don't need the $q$ terms at all since the process does not autocorrelate.
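Since the original question asks about Python, the same white-noise check can be sketched there with a hand-rolled sample autocorrelation (a stand-in for R's acf; the function name is my own):

```python
import random

def autocorr(x, lag):
    """Sample autocorrelation of x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    cov = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag))
    return cov / var

rng = random.Random(1)
white = [rng.gauss(0.0, 1.0) for _ in range(1000)]
# For i.i.d. noise, autocorrelations at lag >= 1 should be near zero
# (roughly within +/- 2/sqrt(n) of zero)
print(autocorr(white, 1))
```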

```r
x2 <- arima.sim(list(ma = 0.5), 100)       # simulate an MA(1) process
acf(x2)                                    # significant lag-1 autocorrelation expected
arrr2 <- arima(x2, order = c(0L, 0L, 1L))  # fit an MA(1) model
arrr2
```

I simulated an MA(1) process. If you check the output of arrr2, you will find that the standard error of the ma1 coefficient is very low compared to the estimate itself, i.e. the coefficient is clearly significant.
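To connect this back to the original question of implementing the model yourself in Python: one simple estimation strategy is conditional sum of squares, which recovers the shocks recursively (assuming $\epsilon_0 = 0$) and picks the $\theta$ that minimizes their sum of squares. A sketch for MA(1) using a crude grid search instead of a proper optimizer (all names are my own):

```python
import random

def simulate_ma1(theta, n, seed=42):
    """Simulate x_t = eps_t + theta * eps_{t-1} with N(0, 1) shocks."""
    rng = random.Random(seed)
    eps = [rng.gauss(0.0, 1.0) for _ in range(n + 1)]
    return [eps[t] + theta * eps[t - 1] for t in range(1, n + 1)]

def css_ma1(x, theta):
    """Conditional sum of squares for MA(1).

    Invert the model recursively: eps_t = x_t - theta * eps_{t-1}, eps_0 = 0.
    """
    eps_prev, total = 0.0, 0.0
    for x_t in x:
        eps_t = x_t - theta * eps_prev
        total += eps_t * eps_t
        eps_prev = eps_t
    return total

x = simulate_ma1(0.5, 2000)
# Grid search over the invertible region (-1, 1) for the theta
# that minimizes the conditional sum of squares
grid = [i / 100 for i in range(-99, 100)]
theta_hat = min(grid, key=lambda th: css_ma1(x, th))
print(theta_hat)  # should land close to the true value 0.5
```

The grid search keeps the sketch dependency-free; in practice you would hand css_ma1 to a numerical optimizer, which is essentially what arima and statsmodels do (with maximum likelihood refinements on top).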

The model order can be chosen by manually exploring candidate models or, for example, with the forecast::auto.arima function.
