Solved – the lag associated with Moving Average smoothing

exponential-smoothing, moving-average, smoothing, time-series

In a tutorial I came across this:

"Recall that the forecast value is: $\hat{y}_{t+1} = \frac{y_t + y_{t-1} + … + y_{t-m+1}}{m}$

It's worth pondering that formula for a minute. While easy to understand, one of its properties may not be obvious. What's the lag associated with this technique? Think it through. The answer is $\frac{m+1}{2}$. For example, say you're averaging the past 5 values to make the next prediction. Then local changes will yield a lag of $\frac{5+1}{2} = 3$ periods. Clearly, the lag increases as you increase the window size for averaging."

Before going through this, I thought I had the correct intuition of a moving average model: if I choose a window of $m$, then the prediction $\hat{y}_{t+1}$ is based on the $m$ previous values, so I assumed the lag is $m$. How does this tutorial arrive at that formula for the lag? As an example, how does a window size of 5 in a moving average smoothing model correspond to a lag of 3?
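One way to see where $\frac{m+1}{2}$ comes from is to smooth a pure linear trend, where the lag is exact. On a trend $y_t = t$, the average of the last $m$ values is $t - \frac{m-1}{2}$, so the forecast for $t+1$ equals the true value from $\frac{m+1}{2}$ periods earlier. A minimal sketch (the trend and window size are illustrative assumptions, not from the tutorial):

```python
import numpy as np

# A noiseless linear trend makes the lag exactly visible.
y = np.arange(50, dtype=float)

m = 5
# One-step-ahead forecast: average of the last m observations,
# i.e. the forecast for time t+1 uses y[t-m+1 .. t].
forecasts = np.array([y[t - m + 1 : t + 1].mean() for t in range(m - 1, len(y) - 1)])
targets = y[m:]  # the actual values at t+1

# On a trend, each forecast equals the true value from (m+1)/2 periods ago,
# so the forecast trails the target by a constant (5+1)/2 = 3.
print(targets - forecasts)  # → every entry is 3.0
```

With noisy data the gap is no longer constant, but the smoothed series still trails turning points by roughly $\frac{m+1}{2}$ periods on average.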

Best Answer

Wikipedia has good commentary on the interpretation of a moving average (MA) model, to quote:

The moving-average model is essentially a finite impulse response filter applied to white noise, with some additional interpretation placed on it. The role of the random shocks in the MA model differs from their role in the autoregressive (AR) model in two ways. First, they are propagated to future values of the time series directly: for example, ${\varepsilon _{t-1}}$ appears directly on the right side of the equation for ${X_{t}}$. In contrast, in an AR model ${\varepsilon _{t-1}}$ does not appear on the right side of the ${ X_{t}}$ equation, but it does appear on the right side of the ${X_{t-1}}$ equation, and ${ X_{t-1}}$ appears on the right side of the ${X_{t}}$ equation, giving only an indirect effect of ${\varepsilon _{t-1}}$ on ${X_{t}}$. Second, in the MA model a shock affects ${X}$ values only for the current period and q periods into the future; in contrast, in the AR model a shock affects ${X}$ values infinitely far into the future, because ${\varepsilon_{t}}$ affects ${X_{t}}$, which affects ${X_{t+1}}$, which affects ${X_{t+2}}$, and so on forever (see Vector autoregression#Impulse response).
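The difference the quote describes can be checked numerically: feed a single unit shock into an MA($q$) process and an AR(1) process and watch how far it propagates. A small sketch, where the coefficients ($\theta_1 = 0.5$, $\theta_2 = 0.25$, AR coefficient $0.5$) and shock timing are illustrative assumptions:

```python
import numpy as np

n = 12
eps = np.zeros(n)
eps[3] = 1.0  # a single unit shock at t = 3, zero noise elsewhere

# MA(2): X_t = eps_t + 0.5*eps_{t-1} + 0.25*eps_{t-2}
theta = [0.5, 0.25]
X_ma = eps.copy()
for j, th in enumerate(theta, start=1):
    X_ma[j:] += th * eps[:-j]

# AR(1): X_t = 0.5*X_{t-1} + eps_t
X_ar = np.zeros(n)
for t in range(n):
    X_ar[t] = (0.5 * X_ar[t - 1] if t > 0 else 0.0) + eps[t]

print(X_ma)  # nonzero only at t = 3, 4, 5: the shock dies after q = 2 periods
print(X_ar)  # nonzero at every t >= 3, decaying geometrically forever
```

This is the quoted distinction in miniature: the MA shock is truncated after $q$ periods, while the AR shock echoes indefinitely through the recursion.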

In essence, it is about the observed mechanics of how a random shock propagates across time. As an analogy, drop a stone in a pool of water and observe how the generated wave moves and changes as a function of time. If you alter the medium (for example, use molasses), the wave propagation changes (is truncated).

[EDIT] Per a comment below, my understanding is that the MA smoothing formula is a mechanically applied, naive rendition of a possibly more general MA time series model. It is often used to display a smoother graph of noisy data for which, given a longer time series, a more precise MA time series model might actually be indicated. MA smoothing is a simple, convenient tool and should not, in my opinion, be viewed as mathematically precise or as carrying deeper meaning. See the Wikipedia commentary, which is in agreement with my general sentiment.
