In a tutorial I came across this:
"Recall that the forecast value is: $\hat{y}_{t+1} = \frac{y_t + y_{t-1} + … + y_{t-m+1}}{m}$
"It's worth pondering that formula for a minute. While easy to understand, one of its properties may not be obvious. What's the lag associated with this technique? Think it through. The answer is $\frac{(m+1)}{2}$. For example, say you're averaging the past 5 values to make the next prediction. Then local changes will yield a lag of $\frac{5+1}{2} = 3$ periods. Clearly, the lag increases as you increase the window size for averaging."
Before reading this, I thought I had the correct intuition about a moving average model. If I choose a window of size $m$, then the prediction $\hat{y}_{t+1}$ is based on the $m$ previous values, and thus the lag is $m$. How does this tutorial arrive at this formula for the lag? As an example, how does a window size of 5 in a moving average smoothing model correspond to a lag of 3?
Best Answer
Wikipedia has good commentary on the interpretation of a moving average (MA) model, to quote:
In essence, it is about the observed mechanics of how a random shock is propagated across time. Relatedly, drop a stone in a pool of water and observe the movement and change in the generated shock wave as a function of time. If you alter the medium (for example, use molasses), it changes (truncates) the wave propagation.
[EDIT] Per a comment below, my understanding is that an MA smoothing formula is a mechanically applied, naive rendition of a possibly more general MA time series model. It is often used to display a smoother graph of random data for which, with a longer time series, a more precise MA time series model might actually be indicated. MA smoothing is a simple, convenient tool and should not be viewed as mathematically precise or as carrying a deeper meaning, in my opinion. See the Wikipedia commentary, which is in agreement with my general sentiments.
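As for the $\frac{(m+1)}{2}$ figure itself: an equally weighted average of the last $m$ observations is "centered" $\frac{(m-1)}{2}$ periods behind the most recent observation $y_t$, and the forecast target $y_{t+1}$ is one further period ahead, giving a total lag of $\frac{(m-1)}{2} + 1 = \frac{(m+1)}{2}$. This can be checked numerically on a pure linear trend, where the lag shows up exactly. A minimal sketch (my own illustration, not from the tutorial):

```python
import numpy as np

m = 5                       # window size
y = np.arange(50, dtype=float)  # linear trend: y_t = t

# Forecast for t+1 is the mean of the last m observations y_{t-m+1} .. y_t
forecasts = np.array([y[t - m + 1 : t + 1].mean() for t in range(m - 1, len(y) - 1)])
actuals = y[m:]             # the corresponding realized values y_{t+1}

# On a linear trend the forecast trails the actual by a constant amount
lag = (actuals - forecasts).mean()
print(lag)                  # (m+1)/2 = 3.0 for m = 5
```

Larger windows shift the window's center of mass further into the past, which is exactly why the lag grows with $m$.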