Solved – Market Basket Analysis using Clustering to discover *new* product combinations

apriori, association-rules, clustering, k-means

I have transaction data from a Quick Service Restaurant (QSR) client. Each record in this data set represents a transaction. My objective is to discover products that are the best candidates to be combined together and offered to customers.

Traditionally, I would have performed a Market Basket Analysis on this data, using metrics like confidence, lift, and support to reveal items that are most frequently bought together. This is commonly known as Association Rules or Affinity analysis. [See the apriori algorithm that utilizes these principles.]
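For concreteness, here is a minimal sketch of how support, confidence, and lift are computed for item pairs. The transactions and item names are made up for illustration and are not the client's data; `pair_metrics` is a hypothetical helper, not part of any library.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy transactions (illustrative only, not the client's data).
transactions = [
    {"burger", "fries", "cola"},
    {"burger", "fries"},
    {"burger", "cola"},
    {"salad", "water"},
    {"fries", "cola"},
]

def pair_metrics(transactions):
    """Return support, confidence (a -> b), and lift for every co-occurring pair."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in t)
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(t), 2)
    )
    metrics = {}
    for (a, b), count in pair_counts.items():
        support = count / n                       # P(a and b)
        confidence = count / item_counts[a]       # P(b | a)
        lift = confidence / (item_counts[b] / n)  # P(b | a) / P(b)
        metrics[(a, b)] = {
            "support": support,
            "confidence": confidence,
            "lift": lift,
        }
    return metrics

metrics = pair_metrics(transactions)
for pair, m in sorted(metrics.items()):
    print(pair, {k: round(v, 3) for k, v in m.items()})
```

Note that all three metrics are built from observed co-occurrence counts, so a pair that never appears together in the data gets no entry at all.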

But, here’s the rub: the QSR already offers combo meals to their customers. Due to this, the analysis “discovers” product groups that were historically offered together in combo meals. The analysis reveals nothing that is novel, i.e., products that were offered together were also most-frequently purchased together. This is expected because the way MBA is done traditionally is an empirical method – you will find products that were purchased together in the historical data; there’s no “projection” (so to speak) in this method.

So now I am considering an alternative approach: use cluster analysis (e.g., k-means) to identify products that cluster together. In order to prepare the data for this analysis, I can create a dummy indicator (binary flag) for each item that exists in the data set. The cluster analysis would be performed based on (all or some of) those dummy indicators to group transactions. Let’s say the cluster analysis yields five clusters. I would then look into each cluster and identify variables (dummy indicators) that are predominant in each cluster. One way to do this would be to look at the ratio of between-cluster variance to within-cluster variance ($R^2$ to $(1-R^2)$) for each variable.
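A rough sketch of that procedure on made-up data, using only the standard library. The toy transactions, the two-cluster setup, the hand-picked initial centers, and the helper names `kmeans` and `variance_ratio` are all assumptions for illustration; in practice one would more likely use a library implementation of k-means.

```python
# Hypothetical toy transactions; item names are illustrative only.
transactions = [
    {"burger", "fries"},
    {"burger", "fries", "cola"},
    {"burger", "fries"},
    {"salad", "water"},
    {"salad", "water", "cola"},
    {"salad", "water"},
]

items = sorted(set().union(*transactions))
# One binary indicator per item, one row per transaction.
X = [[1.0 if item in t else 0.0 for item in items] for t in transactions]

def kmeans(X, init_centers, n_iter=20):
    """Plain Lloyd's algorithm with caller-supplied initial centers."""
    centers = [list(c) for c in init_centers]
    labels = [0] * len(X)
    for _ in range(n_iter):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(
                range(len(centers)),
                key=lambda k: sum((x - c) ** 2 for x, c in zip(row, centers[k])),
            )
            for row in X
        ]
        # Update step: each center becomes the mean of its members.
        for k in range(len(centers)):
            members = [row for row, lab in zip(X, labels) if lab == k]
            if members:
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def variance_ratio(X, labels, j):
    """R^2 / (1 - R^2) for column j, i.e. between-cluster over within-cluster sum of squares."""
    col = [row[j] for row in X]
    grand_mean = sum(col) / len(col)
    between = within = 0.0
    for k in set(labels):
        vals = [v for v, lab in zip(col, labels) if lab == k]
        mean_k = sum(vals) / len(vals)
        between += len(vals) * (mean_k - grand_mean) ** 2
        within += sum((v - mean_k) ** 2 for v in vals)
    # An indicator that is constant inside every cluster has zero within-cluster spread.
    return float("inf") if within == 0 else between / within

labels = kmeans(X, init_centers=[X[0], X[3]])
ratios = {item: variance_ratio(X, labels, j) for j, item in enumerate(items)}
for item, ratio in ratios.items():
    print(item, ratio)
```

In this toy data the indicators that separate the clusters perfectly get an infinite ratio, while an item spread evenly across clusters (here `cola`) gets a ratio of zero. Whether a high ratio points to a genuinely new bundle or just to an existing combo is exactly the open question.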

I briefly tested this approach, and it looks like this method is able to look beyond what was offered together historically and yield new product-bundle ideas (based on items that cluster together).

My questions are as follows: Does this approach make sense? Are there any references (books/papers) about this method of doing market basket analysis (aka product bundle analysis) using clustering?

Update: Please note that I am not asking how to solve the above stated problem. I have a specific solution in mind, and I am asking whether this approach has any theoretical, practical, empirical, or intuitive backing.

PS: There's a related question about this (link), but the answer (that you can use frequent itemset mining) is hardly satisfying, because this appears to be just another name for association rule mining. All of the following approaches/techniques would yield products that were purchased together in the past, rather than products that might not have been bought together in the past but are good candidates for bundling: association rules, affinity analysis, the apriori algorithm, (and more generally) market basket analysis, and product bundle analysis.

Best Answer

k-means or clustering won't get you anywhere.

Frequent itemset mining is most appropriate for this data type.

Yes, it will discover combos you have been offering before. But the solution is simple: clean your data.

Option 1) remove known combos

Option 2) treat known combos as a single item (i.e. customer bought combo-1, not burger and fries separately)

Option 3) ignore frequent patterns / association rules that you already use(d).
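As a rough illustration of Options 2 and 3, here is a sketch in plain Python. The combo definition, item names, and the helpers `collapse_combos` and `drop_known_pairs` are hypothetical. It also assumes that a transaction containing every item of a combo was bought as that combo, which the raw data may or may not let you verify.

```python
# Hypothetical combo definitions; the real menu is not given in the question.
known_combos = {
    "combo-1": frozenset({"burger", "fries", "cola"}),
}

def collapse_combos(transaction, combos):
    """Option 2: replace the items of each known combo with a single combo item."""
    remaining = set(transaction)
    for name, combo_items in combos.items():
        # Assumption: all combo items present means the combo itself was bought.
        if combo_items <= remaining:
            remaining -= combo_items
            remaining.add(name)
    return remaining

def drop_known_pairs(pairs, combos):
    """Option 3: discard mined pairs whose items both belong to one known combo."""
    return [
        pair for pair in pairs
        if not any(set(pair) <= combo_items for combo_items in combos.values())
    ]

cleaned = collapse_combos({"burger", "fries", "cola", "pie"}, known_combos)
print(cleaned)  # the three combo items are replaced by "combo-1"; "pie" is kept

mined_pairs = [("burger", "fries"), ("cola", "pie")]
print(drop_known_pairs(mined_pairs, known_combos))
```

After collapsing, frequent itemset mining would be run on the cleaned transactions, so that patterns such as combo-1 plus an add-on item can surface instead of the combo's own components.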

The ability to discover the combos that you had just demonstrates that it worked! Did you get anything remotely useful from k-means?!?