Solved – Clustering of time series

classification, clustering, data mining, spss, time series

I have a set of almost 1600 time series covering 2 years, which I want to group into clusters. Do you think this is possible using k-means? Which method would you advise me to use? Is this possible at all using SPSS?

Best Answer

k-means cannot use arbitrary distance functions. It is designed for Euclidean distance.
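One way to see why: the k-means update step replaces each center with the cluster mean, and the mean is the minimizer of the sum of squared Euclidean distances, not of other distances. A minimal one-dimensional sketch (the numbers and function names are made up for illustration):

```python
# Toy 1-D cluster; the values are invented for illustration.
points = [0.0, 0.0, 10.0]

def sum_squared(center):
    """Sum of squared Euclidean distances to the center (the k-means objective)."""
    return sum((p - center) ** 2 for p in points)

def sum_absolute(center):
    """Sum of absolute (Manhattan) distances to the center."""
    return sum(abs(p - center) for p in points)

mean = sum(points) / len(points)
median = sorted(points)[len(points) // 2]

# The mean is the better center under squared Euclidean distance...
print(sum_squared(mean) < sum_squared(median))
# ...but the worse one under absolute distance.
print(sum_absolute(mean) > sum_absolute(median))
```

Both comparisons print True. So if you plug a different distance into the assignment step but keep averaging in the update step, the two steps no longer optimize the same objective.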

Euclidean distance, however, does not work well for high-dimensional data such as your time series (unless you have a really low sampling rate, say monthly values, which would give only 24 points per series).

For time series, you will probably want to use a time series distance. There are quite a lot of them, designed specifically for different kinds of time series. You really should look at these.
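To make this concrete, here is a rough sketch comparing plain Euclidean distance with dynamic time warping (DTW). DTW is just one well-known time series distance, picked here for illustration; it may or may not be the right one for your data. The toy series and function names are invented.

```python
import math

def euclidean(a, b):
    """Point-by-point Euclidean distance; assumes equal lengths."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw(a, b):
    """Classic dynamic time warping with absolute difference as local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch b
                                 cost[i][j - 1],      # stretch a
                                 cost[i - 1][j - 1])  # step both
    return cost[n][m]

# The same spike, shifted by one time step (invented data).
s1 = [0, 0, 1, 0, 0, 0]
s2 = [0, 0, 0, 1, 0, 0]
flat = [0, 0, 0, 0, 0, 0]

print(euclidean(s1, s2), euclidean(s1, flat))
print(dtw(s1, s2), dtw(s1, flat))
```

Euclidean distance rates the flat series as closer to `s1` than the shifted copy of `s1` is, while DTW gives 0.0 for the shifted copy and 1.0 for the flat series. Whether ignoring shifts like this is desirable depends on what "similar" should mean for your series.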

They won't work with k-means, but there are various distance- and density-based clustering algorithms (where density is usually defined by distance!) that you should try. However, I have no idea what SPSS supports. I don't know if it has any time series distances, either.
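As an illustration of the density-based route, below is a minimal DBSCAN sketch in plain Python that only needs a precomputed distance matrix, so any time series distance can be plugged in. This says nothing about SPSS. `toy_distance`, the sample series, and the `eps` / `min_pts` values are placeholders chosen for the example, not recommendations.

```python
def dbscan(dist, eps, min_pts):
    """Minimal DBSCAN over a precomputed distance matrix.

    Returns one label per item: 0, 1, ... for clusters, -1 for noise.
    """
    n = len(dist)
    labels = [None] * n              # None = not visited yet
    cluster = -1

    def neighbors(i):
        # Includes i itself, since dist[i][i] == 0.
        return [j for j in range(n) if dist[i][j] <= eps]

    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1           # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbors(j)
            if len(nb) >= min_pts:   # j is a core point, so keep expanding
                queue.extend(nb)
    return labels

def toy_distance(a, b):
    # Stand-in only: swap in whatever time series distance suits the data.
    return sum(abs(x - y) for x, y in zip(a, b))

series = [
    [0, 1, 2, 3], [0, 1, 2, 4], [1, 1, 2, 3],   # rising
    [5, 5, 5, 5], [5, 5, 5, 6], [5, 4, 5, 5],   # flat
    [9, 0, 9, 0],                               # outlier
]
dist = [[toy_distance(a, b) for b in series] for a in series]
labels = dbscan(dist, eps=2, min_pts=2)
print(labels)  # [0, 0, 0, 1, 1, 1, -1]
```

With 1600 series the full matrix has about 2.5 million entries, which is still manageable, but an expensive distance function will dominate the run time. Unlike k-means, this approach never computes a mean, which is why it works with arbitrary distances.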