I have a dataset of customer orders. Example:
Customer_1 07.06.2017 Order_1 Product_1
Customer_1 15.06.2017 Order_2 Product_2
Customer_1 01.09.2017 Order_2 Product_1
Customer_2 07.05.2017 Order_3 Product_3
Customer_2 07.06.2017 Order_4 Product_2
Customer_2 25.09.2017 Order_5 Product_3
Customer_2 05.12.2017 Order_5 Product_1
....
Customer_N
How can I cluster these customers' behavior? The dataset looks like a time series, but I find it difficult to choose the right approach. Each customer's history has a different length, so I can't use simple clustering algorithms.
My main aim is to distinguish different customer behaviors: to find people who have started buying more frequently, people who have changed their product preferences (started buying other products), and people who tried products that were new to them but then went back to their previous products. How can I cluster these patterns of behavior?
Best Answer
Your data are timestamped event sequences. One way to cluster your customers is to compute the pairwise dissimilarities between the sequences and then feed the resulting matrix into any clustering procedure that accepts a dissimilarity matrix as input.
You can compute the pairwise dissimilarities with the optimal matching method for event sequences, OME (see Ritschard et al., 2013), which is implemented in the TraMineRextras R package, a companion of the TraMineR package.
I illustrate below how to get the dissimilarity matrix for your two example sequences. We first need to create a TraMineR event sequence object, which requires numeric ids and dates as integers, so we make these transformations first. I also use Product as the event and ignore Order (I do not understand what it represents).
We now compute the dissimilarities between the sequences with OME.
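As a rough illustration outside R, the sketch below (Python, standard library only) applies the same transformations to the two example customers: numeric ids, dates as integer day offsets, and Product as the event. It is not the TraMineR code and it does not implement OME. The dissimilarity is a plain unit-cost edit distance on the order of products, which ignores the time gaps and serves only as a stand-in. The function names `build_sequences` and `edit_distance` are made up for this example.

```python
from datetime import datetime

# Example rows from the question: customer, date, order, product.
RAW = """\
Customer_1 07.06.2017 Order_1 Product_1
Customer_1 15.06.2017 Order_2 Product_2
Customer_1 01.09.2017 Order_2 Product_1
Customer_2 07.05.2017 Order_3 Product_3
Customer_2 07.06.2017 Order_4 Product_2
Customer_2 25.09.2017 Order_5 Product_3
Customer_2 05.12.2017 Order_5 Product_1
"""

def build_sequences(text):
    """Return {customer_id: [(days_since_earliest_date, product), ...]}."""
    rows = []
    for line in text.splitlines():
        customer, date, _order, product = line.split()  # Order is ignored
        cust_id = int(customer.split("_")[1])           # numeric id
        day = datetime.strptime(date, "%d.%m.%Y").date().toordinal()
        rows.append((cust_id, day, product))
    origin = min(day for _, day, _ in rows)             # earliest date -> 0
    seqs = {}
    for cust_id, day, product in sorted(rows):          # sort by id, then date
        seqs.setdefault(cust_id, []).append((day - origin, product))
    return seqs

def edit_distance(a, b):
    """Unit-cost Levenshtein distance between two lists of events.

    Assumption: a simplified stand-in for OME that ignores timestamps.
    """
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # delete x
                           cur[j - 1] + 1,            # insert y
                           prev[j - 1] + (x != y)))   # substitute x by y
        prev = cur
    return prev[-1]

sequences = build_sequences(RAW)
products = {c: [p for _, p in seq] for c, seq in sequences.items()}
d12 = edit_distance(products[1], products[2])
print(sequences[1])
print(d12)
```

A measure that also accounts for the time elapsed between events, as OME can, would separate customers who buy the same products at different rhythms. This stand-in cannot do that.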
You can then cluster your sequences by passing the diss matrix to a hierarchical clustering method (e.g. the hclust function) or to a partitioning around medoids method (see e.g. the WeightedCluster package, which is specifically designed for sequences). Note that you may have to pass diss as a distance object, as.dist(diss).
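To show what clustering from a dissimilarity matrix involves, here is a minimal pure-Python sketch of agglomerative clustering. It is not hclust or WeightedCluster. The average-linkage rule, the function name `average_linkage`, and the 4x4 matrix (four hypothetical customers) are all assumptions made for the example.

```python
from itertools import combinations

def average_linkage(diss, k):
    """Merge clusters until k remain, using average linkage.

    diss is a symmetric matrix (list of lists) of pairwise dissimilarities.
    Returns a list of clusters, each a sorted list of row indices.
    """
    clusters = [[i] for i in range(len(diss))]

    def avg(pair):
        a, b = pair
        total = sum(diss[i][j] for i in clusters[a] for j in clusters[b])
        return total / (len(clusters[a]) * len(clusters[b]))

    while len(clusters) > k:
        # Pick the two clusters with the smallest average dissimilarity.
        a, b = min(combinations(range(len(clusters)), 2), key=avg)
        clusters[a] += clusters[b]
        del clusters[b]  # safe because a < b
    return [sorted(c) for c in clusters]

# Hypothetical dissimilarities between four customers (illustrative values).
diss = [
    [0, 2, 1, 3],
    [2, 0, 2, 1],
    [1, 2, 0, 3],
    [3, 1, 3, 0],
]

groups = average_linkage(diss, k=2)
print(groups)  # [[0, 2], [1, 3]]
```

The algorithm only ever reads the pairwise dissimilarities, never the sequences themselves. That is why sequences of different lengths pose no problem once the matrix has been computed.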