How is the decay rate in exponential smoothing optimized?

exponential-smoothing, mse, optimization, time-series

For the sake of simplicity, I just want to focus on simple (single/level) exponential smoothing. When $\alpha$, the decay rate, is near 1, the most recent observation has the highest weight and the influence of older observations decays rapidly, yielding a high-variance model. I'm curious how $\alpha$ can be optimized so that the result is a smooth approximation that still follows the observed data reasonably closely.

I can see two perspectives:

What seems to be working for me is minimizing the sum of the mean squared error and the mean volatility (the mean change between observations), though I'm not sure this is principled.
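For concreteness, here is a minimal sketch of that objective; the interpretation of volatility as the mean absolute change of the *smoothed* series, the equal weighting of the two terms, and the crude grid search are all just my illustrative choices:

```python
import numpy as np

def exp_smooth(x, alpha):
    """Simple exponential smoothing: y[n] = alpha*x[n] + (1 - alpha)*y[n-1]."""
    y = np.empty(len(x))
    y[0] = x[0]  # initialize the level at the first observation
    for n in range(1, len(x)):
        y[n] = alpha * x[n] + (1 - alpha) * y[n - 1]
    return y

def objective(alpha, x):
    """MSE plus mean volatility; the equal weighting is an arbitrary choice."""
    y = exp_smooth(x, alpha)
    mse = np.mean((x - y) ** 2)               # fidelity to the data
    volatility = np.mean(np.abs(np.diff(y)))  # smoothness of the output
    return mse + volatility

x = np.cumsum(np.random.default_rng(0).normal(size=200))  # toy random-walk data
alphas = np.linspace(0.01, 0.99, 99)
alpha_best = min(alphas, key=lambda a: objective(a, x))
```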

Regardless, how is $\alpha$ optimized in practice?

Best Answer

If your previous smoothed number was $y_{n-1}$ and your new observation is $x_n$

then the new smoothed number $y_n$ which minimises $\alpha(x_n-y_n)^2+(1-\alpha)(y_n-y_{n-1})^2$

is $y_n = \alpha x_n + (1-\alpha)y_{n-1}$ (set the derivative with respect to $y_n$ to zero, giving $\alpha(x_n-y_n) = (1-\alpha)(y_n-y_{n-1})$, then solve for $y_n$),

which is essentially exponential smoothing.
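As a quick numerical sanity check that the minimiser of this weighted quadratic coincides with the smoothing update (the particular numbers below are arbitrary):

```python
from scipy.optimize import minimize_scalar

# argmin over y of  alpha*(x_n - y)^2 + (1 - alpha)*(y - y_prev)^2
alpha, x_n, y_prev = 0.3, 5.0, 2.0
res = minimize_scalar(lambda y: alpha * (x_n - y) ** 2
                                + (1 - alpha) * (y - y_prev) ** 2)
print(res.x)                               # ~2.9, the numerical minimiser
print(alpha * x_n + (1 - alpha) * y_prev)  # 2.9, the smoothing update
```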

With your particular suggestion of equal weights on $(x_n-y_n)^2$ and $(y_n-y_{n-1})^2$ in the minimisation, you get $\alpha=\frac12$. You can make $\alpha$ larger if you want to stay closer to the new observation, and smaller if you prefer to stay closer to the previous smoothed value, but you should always have $0 \lt \alpha \lt 1$. Your choice of $\alpha$ should reflect your preferred balance between these two, and it may be guided by your perception of the data: choose a higher $\alpha$ when you think the trend dominates the noise, and a lower $\alpha$ when you think the noise is excessive.
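As for the "in practice" part of the question: a standard approach is to choose $\alpha$ to minimise the sum of squared one-step-ahead forecast errors over the training data (this is, for instance, what common forecasting libraries do when asked to fit $\alpha$). A minimal sketch, assuming SciPy's bounded scalar minimiser and initialising the level at the first observation:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def sse_one_step(alpha, x):
    """Sum of squared one-step-ahead forecast errors for a given alpha."""
    y = x[0]                   # level initialised at the first observation
    sse = 0.0
    for x_n in x[1:]:
        sse += (x_n - y) ** 2  # forecast error, measured before updating
        y = alpha * x_n + (1 - alpha) * y
    return sse

x = np.cumsum(np.random.default_rng(1).normal(size=200))  # toy random-walk data
res = minimize_scalar(sse_one_step, bounds=(1e-4, 1 - 1e-4),
                      method='bounded', args=(x,))
alpha_hat = res.x
```

The bounds keep $\alpha$ strictly inside $(0, 1)$, matching the constraint above.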
