Solved – How to cluster non-aligned time series of different lengths

clustering, time series

I am trying to cluster dozens of time series sampled every 30 min, covering the period from mid-2016 to mid-2020. Most of them have very nice "patterns"; others have missing values over a given period (e.g. a whole year or several months) or are more "chaotic" (sudden variations).

Here I display some of the time-series I am handling:
[Four plots of example time series]

At a finer scale (e.g. one week), some seasonal patterns become visible, as the plots below show (2020/1/1 to 2020/1/8):

[Four plots of time series from 2020/1/1 to 2020/1/8]

Ideally, I would like to build clusters of time series that share similar "shapes in time" (e.g. peaks in the morning and evening, near-zero values on weekends or holidays, etc.) and also, if possible, similar yearly seasonality when enough data is available.

I tried the commonly used DTW measure with hierarchical clustering (Ward linkage), but because of the number of points per time series (even after resampling to 1 h), it took too long, and I was quite disappointed with the results (though I applied it to data with very little preprocessing).

So what I am facing is:

  • I would like to extract the "nicest" part of each time series, but if I do so, the extracts will be misaligned (they will not start at the same time point) and of different lengths.
    Thus, I am quite unsure which preprocessing steps I should use.

I would be glad of any advice on the preprocessing, distance measure, and clustering algorithm I should apply to cluster these time series.

Best Answer

Preprocessing that leaves the series misaligned or of different lengths is not necessarily a problem. Have you considered "Time Series Clustering - a decade review" (Information Systems 53, 2015), in which Aghabozorgi et al. review 38 algorithms for clustering whole time series? See the rightmost column of Table 4 (pages 27-28) for their notes on the attributes of each approach.
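
To illustrate why unequal lengths and different start points need not be a blocker, here is a minimal sketch (not taken from the review above) using plain DTW and a naive average-linkage agglomerative clustering in pure Python. The function names dtw_distance and average_linkage and the four toy series are made up for this example; average linkage is my own choice here, since Ward linkage is formulated for Euclidean distances. With thousands of points per series you would want an optimized library and probably a warping-window constraint rather than this quadratic loop.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences.

    Runs in O(len(a) * len(b)) time; the sequences may differ in length.
    """
    inf = float("inf")
    prev = [0.0] + [inf] * len(b)  # row 0 of the cumulative cost table
    for x in a:
        curr = [inf] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            # local cost plus the cheapest of the three allowed predecessors
            curr[j] = abs(x - y) + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[-1]


def average_linkage(dist, k):
    """Naive agglomerative clustering on a precomputed distance matrix.

    Repeatedly merges the two clusters with the smallest mean pairwise
    distance until only k clusters remain. Returns lists of indices.
    """
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                total = sum(dist[i][j] for i in clusters[p] for j in clusters[q])
                d = total / (len(clusters[p]) * len(clusters[q]))
                if best is None or d < best[0]:
                    best = (d, p, q)
        _, p, q = best
        merged = clusters.pop(q)  # q > p, so index p is still valid
        clusters[p].extend(merged)
    return clusters


# Toy data (hypothetical): two "peak" shapes and two "dip" shapes,
# with different lengths and different start offsets.
series = [
    [0, 1, 2, 3, 2, 1, 0],           # peak
    [0, 0, 0, 0, 1, 2, 3, 2, 1, 0],  # same peak, starts later, longer
    [3, 2, 1, 0, 1, 2, 3],           # dip
    [3, 3, 3, 2, 1, 0, 1, 2, 3, 3],  # same dip, padded on both sides
]

# Pairwise DTW distance matrix
dist = [[dtw_distance(s, t) for t in series] for s in series]

clusters = average_linkage(dist, k=2)
print(clusters)
```

On this toy data the two peaks end up in one cluster and the two dips in the other, even though no two series in a cluster have the same length or start point. Real data would usually also need per-series normalization before computing distances.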