Solved – Clustering time series when each object has multiple time series

clustering, time series

I'm trying to cluster my customers in terms of their buying pattern throughout 7 years. For each customer I have the quantity bought in each quarter over those 7 years, for each of 10 products, so there are multiple time series per customer. I would like to cluster customers based on their buying pattern (I also want the algorithm to consider the correlation between a customer's buying patterns for different products). Which clustering method should I use? Is it possible to use SAS, SPSS or R to compute this?

Thanks!

Best Answer

There is no algorithm which just clusters groups of multidimensional time series for you.

First you need to define features for each data point (here: each individual customer), then you can choose a clustering method in the feature space. These are two separate tasks.

For now, you have 7 * 4 * 10 = 280 scalars for each individual customer, which you can treat as 10 time series (one per product), each of length 28 (4 quarters over 7 years).
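To make that layout concrete, here is a minimal sketch in Python with NumPy. The data is synthetic and purely hypothetical (Poisson-distributed quantities, 100 customers); the array name `purchases` and its axis order (customer, product, quarter) are assumptions for illustration, not something given in the question.

```python
import numpy as np

# Hypothetical synthetic data: quantity bought per customer, product and quarter.
rng = np.random.default_rng(0)
n_customers, n_products, n_quarters = 100, 10, 7 * 4
purchases = rng.poisson(lam=5.0, size=(n_customers, n_products, n_quarters))

# One customer is 10 time series of length 28, i.e. 280 scalars.
one_customer = purchases[0]
print(one_customer.shape)  # (10, 28)
print(one_customer.size)   # 280
```

Keeping the product and quarter axes separate makes it easy to summarise over time (axis 2) or across products (axis 1) later on.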

simple start: mean interest in a product over the years

You could now compute the mean of each product's series over time, which leaves 10 values per customer representing that customer's average interest in each product. Then you could use any clustering algorithm (see here for examples) to cluster customers by these interests.
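A rough sketch of this simple start, again on hypothetical synthetic data. The `kmeans` helper below is a minimal hand-written Lloyd's k-means, used only to keep the example self-contained; it is one possible choice of clustering algorithm, and the number of clusters (3) is an arbitrary assumption. In practice a library implementation would normally be preferable.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Minimal Lloyd's k-means (illustrative helper); returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids with k distinct data points.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)

    def assign(points, centers):
        # Squared Euclidean distance from every point to every centroid.
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        return dists.argmin(axis=1)

    for _ in range(n_iter):
        labels = assign(X, centroids)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return assign(X, centroids), centroids

# Hypothetical data: 100 customers, 10 products, 28 quarters.
rng = np.random.default_rng(0)
purchases = rng.poisson(lam=5.0, size=(100, 10, 28)).astype(float)

# Average over the time axis: 10 features per customer.
mean_features = purchases.mean(axis=2)
labels, centroids = kmeans(mean_features, k=3)
print(mean_features.shape, labels.shape)  # (100, 10) (100,)
```

If products differ a lot in typical quantity, it may be worth standardising each feature column before clustering so that high-volume products do not dominate the distance.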

correlation

Yet you specifically mentioned correlation between products as relevant. In that case you can compute, for each customer, the correlation matrix between all 10 time series, which yields a symmetric 10 x 10 matrix with a diagonal of 1s. Only the entries above the diagonal carry information, so you are left with 10 * 9 / 2 = 45 features per customer. Clustering in a high-dimensional space is problematic, even more so if you do not have a lot of data. See here for good explanations. You could potentially compute further features based on the correlation matrix.
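The correlation features could be extracted roughly as follows. This is a sketch on hypothetical synthetic data; it uses Pearson correlation via `np.corrcoef`, which is an assumption, since the choice of correlation measure is left open above.

```python
import numpy as np

# Hypothetical data: 100 customers, 10 products, 28 quarters.
rng = np.random.default_rng(0)
purchases = rng.poisson(lam=5.0, size=(100, 10, 28)).astype(float)

# Indices of the strict upper triangle of a 10 x 10 matrix: 45 entries.
iu = np.triu_indices(10, k=1)

# For each customer, correlate the 10 product series (rows) and keep
# only the entries above the diagonal.
corr_features = np.stack([np.corrcoef(customer)[iu] for customer in purchases])
print(corr_features.shape)  # (100, 45)
```

One caveat with real data: a product a customer never buys (or always buys in the same quantity) gives a constant series, for which the Pearson correlation is undefined and `np.corrcoef` returns NaN. Such entries need to be handled explicitly before clustering.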

The quality of your clustering depends on the fit between your actual interest in certain customer attributes and the features you use for clustering.

tools

It is definitely possible to do this in R, and probably also in SPSS. I don't know SAS. I recommend Python.