Solved – Measuring Strength of Trend and Seasonalities for Time-Series presenting Multi-Seasonal Patterns

Tags: multiple-seasonalities, python, statsmodels, time series, trend

My goal is to cluster time-series which may present daily/weekly/yearly seasonalities. To do so, I plan to use different variables and among them, the following ones: $F_{s_{daily}}$, $F_{s_{weekly}}$ and $F_{s_{yearly}}$ should measure strength of daily, weekly and yearly seasonalities, and $F_{t}$ should measure the strength of the trend.

To quantify strength, I am using the definition provided by Hyndman in his book Forecasting: Principles and Practice (ref: https://otexts.com/fpp2/seasonal-strength.html):

Strength of trend:
$F_t=max(0,1-\frac{Var(R_t)}{Var(T_t+R_t)})$ where $R_t$ is the time-series' remainder component and $T_t$ its trend component (the closer to 1, the higher the strength of trend).

Strength of seasonality:
$F_s=max(0,1-\frac{Var(R_t)}{Var(S_t+R_t)})$ where $R_t$ is the time-series' remainder component and $S_t$ its seasonal component (the closer to 1, the higher the strength of seasonality).
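Both definitions have the same shape, so one helper can cover them. Here is a minimal sketch using only the standard library; the function name `strength`, the zero-variance guard, and the toy numbers are my own assumptions, not something taken from the book:

```python
from statistics import pvariance


def strength(component, remainder):
    """Return max(0, 1 - Var(R) / Var(component + R)).

    Pass the trend T_t as `component` to get F_t,
    or the seasonal S_t to get F_s.
    """
    combined = [c + r for c, r in zip(component, remainder)]
    denom = pvariance(combined)
    if denom == 0:
        # Assumption: a constant (component + remainder) has no measurable strength.
        return 0.0
    return max(0.0, 1.0 - pvariance(remainder) / denom)


# Toy, made-up numbers: a large alternating component and a small remainder.
seasonal = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
remainder = [0.1, 0.1, -0.1, -0.1, 0.1, -0.1]
print(strength(seasonal, remainder))
```

Population versus sample variance does not matter here, since the same normalisation appears in the numerator and the denominator.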

To test these coefficients in the case of a multi-seasonal time-series, I made an artificial time-series with 3 seasonal components and no trend:

$y(t)=y_{daily}(t)+y_{weekly}(t)+y_{yearly}(t)$

where

$y_{daily}(t)=cos(\frac{2\pi*t}{24})$,
$y_{weekly}(t)=cos(\frac{2\pi*t}{24*7})$,
$y_{yearly}(t)=cos(\frac{2\pi*t}{24*365})$

See this artificial data set as a two-year time-series sampled every hour (thus 24 points per day).
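For reference, the series described above can be generated like this. It is only a sketch with the standard library, and the constant and variable names are mine:

```python
import math

HOURS_PER_DAY = 24
HOURS_PER_WEEK = 24 * 7
HOURS_PER_YEAR = 24 * 365
N_POINTS = 2 * HOURS_PER_YEAR  # two years of hourly samples


def cosine(t, period):
    """Unit-amplitude cosine with the given period (in hours)."""
    return math.cos(2 * math.pi * t / period)


y_daily = [cosine(t, HOURS_PER_DAY) for t in range(N_POINTS)]
y_weekly = [cosine(t, HOURS_PER_WEEK) for t in range(N_POINTS)]
y_yearly = [cosine(t, HOURS_PER_YEAR) for t in range(N_POINTS)]

# y(t) = y_daily(t) + y_weekly(t) + y_yearly(t), with no trend term
y = [d + w + a for d, w, a in zip(y_daily, y_weekly, y_yearly)]
print(len(y))
```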

To decompose the time-series into its different seasonal components using an additive model, I used the seasonal_decompose function from the Python statsmodels library.

After doing a daily decomposition (i.e. period=24) of y(t), I found $F_{t_{daily}} = 0.999$ and $F_{s_{daily}} = 0.999$

After doing a weekly decomposition (i.e. period=24*7) of y(t), I found $F_{t_{weekly}} = 1$ and $F_{s_{weekly}} = 1$

After doing a yearly decomposition (i.e. period=24*365) of y(t), I found $F_{t_{yearly}} = 0$ and $F_{s_{yearly}} = 0.763$

But if I first remove two of the three seasonal components and then compute $F_{s}$ and $F_{t}$ for the remaining one, I get different results:

For the daily case ($y(t) - y(t).seasonal_{weekly} - y(t).seasonal_{yearly}$), $F_{t_{daily}} = 0$ and $F_{s_{daily}} = 0.59$

For the weekly case ($y(t) - y(t).seasonal_{daily} - y(t).seasonal_{yearly}$), $F_{t_{weekly}} = 0$ and $F_{s_{weekly}} = 1$

For the yearly case ($y(t) - y(t).seasonal_{daily} - y(t).seasonal_{weekly}$), $F_{t_{yearly}} = 0$ and $F_{s_{yearly}} = 1$

I am confused, as these results are not the ones I expected (expected: $F_{s_{daily}}$, $F_{s_{weekly}}$ and $F_{s_{yearly}}$ close to 1, and $F_{t}$ close to 0).
Can anyone tell me what is wrong and needs to be changed in my approach, and recommend a way to measure $F_{t}$? Thank you.

Best Answer

I think the problem lies in the choice of seasonal_decompose from statsmodels. It is a very naive implementation (the documentation even says so!). The trend component is a moving average, which typically absorbs a lot of extra signal, and the function doesn't handle multiple seasonal signals. I would look into something that handles multiple seasonalities natively, like fbProphet or some other GAM setup.
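To see how a moving-average trend can swallow the slower cycles, here is a rough hand-rolled classical additive decomposition with period 24. It is a sketch of the general moving-average approach, not the statsmodels code itself; the edge handling (dropping half a window at each end) and all names are my own assumptions:

```python
import math
from statistics import fmean, pvariance

PERIOD = 24            # daily period, in hourly samples
N = 2 * 24 * 365       # two years of hourly data

# Same synthetic series as in the question: three cosines, no trend.
y = [math.cos(2 * math.pi * t / 24)
     + math.cos(2 * math.pi * t / (24 * 7))
     + math.cos(2 * math.pi * t / (24 * 365))
     for t in range(N)]

half = PERIOD // 2
valid = range(half, N - half)  # positions where the full window fits

# Centered 2x24 moving average: half weight on the two end points.
weights = [0.5] + [1.0] * (PERIOD - 1) + [0.5]
trend = [sum(w * y[t - half + i] for i, w in enumerate(weights)) / PERIOD
         for t in valid]

# Seasonal component: mean of the detrended values per hour of day, then centered.
detrended = [y[t] - tr for t, tr in zip(valid, trend)]
by_phase = [[] for _ in range(PERIOD)]
for t, d in zip(valid, detrended):
    by_phase[t % PERIOD].append(d)
phase_mean = [fmean(values) for values in by_phase]
offset = fmean(phase_mean)
seasonal = [phase_mean[t % PERIOD] - offset for t in valid]
remainder = [d - s for d, s in zip(detrended, seasonal)]


def strength(component, remainder):
    """max(0, 1 - Var(R) / Var(component + R))."""
    combined = [c + r for c, r in zip(component, remainder)]
    return max(0.0, 1.0 - pvariance(remainder) / pvariance(combined))


f_trend = strength(trend, remainder)
f_seasonal = strength(seasonal, remainder)
print(f_trend, f_seasonal)
```

On this trendless series the trend strength should still come out close to 1, in line with the 0.999 reported in the question. The 24-hour window cancels only the daily cosine, so the weekly and yearly cosines pass almost unchanged into the "trend".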

For general-purpose time series clustering I probably wouldn't reinvent the wheel: there are time series feature extraction libraries out there (like tsfresh for Python), and a lot of them come with clustering as an additional feature.