Solved – Holt-Winters Forecasting – Why do we use most recent estimate for all projections going forward

exponential-smoothingforecastingtime series

I've been doing some research on using the Holt-Winters method for forecasting and understand all but one aspect.

Why do we use the most recent estimate for the base and trend components for all projections going forward? Does the data lose some integrity staying locked on only one week, and only using seasonality to account for change?

I've googled relentlessly but cant find the rationale.
I want to be able to explain this if questions arise, so any insight would be great!

This video at 10:00 is what my question is geared at…
https://youtu.be/qpiWJaeJPtA?t=10m

Excel Snapshot

Best Answer

I assume we're not dealing with the multiplicative form.

The reason we use the most recent estimates for the level and trend is because of the way the model is set up -- in effect, it corresponds to an assumption that's part of the model.

It's easiest to see if you look at the model in error-correction form (see, for example, Sec 7.5 of Hyndman & Athanasopoulos, Forecasting Principles and Practice):

\begin{align*} \ell_{t} &= \ell_{t-1} + b_{t-1}+\alpha e_{t}\\ b_{t} &= b_{t-1} + \alpha \beta^*e_{t}\\ s_{t} &= s_{t-m}+\gamma e_t. \end{align*} where $e_t=y_{t} - (\ell_{t-1} + b_{t-1}+s_{t-m})=y_{t} - \hat{y}_{t|t-1}$

The components there being what I'll call current level (where we'd expect the seasonally-corrected series to be), slope (the expected change in that level from time period to time period) and seasonal.

(It sounds like your source is working in terms of a baseline level rather than the current level, but you should be able to translate the intuition from this formulation back to that one.)

Starting with the line for $b$, imagine we're sitting at time $t-1$, trying to forecast $t$. The model for how the slope ($b$) (rate of change of current level) works tells us that the next $b$ is the current $b$ plus a "jump". The slope term moves as a random walk (making the trend more or less linear in the short term, at least if the innovation variance is small, but not in the long term -- we might call that locally linear).

Note that the future forecast error term $e_t$ is unknown, but is $0$ on average (if the model is correct).

So if our estimates are unbiased, the expected value for $b_t$ is our estimate of $b_{t-1}$.

Similarly, if we're sitting at time $t-1$, our most recent estimate for the current level is $l_{t-1}$. The next one will be the previous one, plus the slope, plus a "jump", (a multiple of the same innovation as moved the trend).

If we remove the effect of the trend over time, to get level in terms of a "baseline" level, by defining baseline level as $d_t=\ell_t-\sum^t b_t$ then the expected baseline level is again the previous one - $E(d_t)=d_{t-1}$.

The expected seasonal component moves in a similar way (but 'seasonally') - it's value is the value from the previous season-cycle plus a "jump".

So it's really just down to the model assumption of Holt-Winters. They're the way they are because the model actually assumes the next one is the relevant 'previous' one plus a zero-mean innovation.

Does the data lose some integrity staying locked on only one week and only using seasonality to account for change?

i) The model has a component to account for the changes over time; those jumps I keep mentioning. They don't affect the expected value until they happen (the model says we don't know which direction those jumps will go, only something about their typical size), but going out into the future they do affect its uncertainty.

ii) as with any model, if the model is badly wrong, the forecasts can't be expected to be right -- but if this model is a reasonable description of the data (and for many series it performs pretty well), then those forecasts are "correct" for that model and so should also work well. No model is likely to be actually "right", but the components of this model are close to how we understand a number of processes to work.

So if in the process for your data the direction of the changes in level and trend could be determined (at least above 'it's completely random'), it wouldn't make sense to use a model that assumed they couldn't.