Market Basket Analysis – Interpretation of Mirrored Association Rules

Tags: association-measure, association-rules

I have performed Market Basket Analysis on sales data that comes from non-sequential receipts. I've read up on the meaning of the different metrics, but there is one thing I don't understand. When looking at the strongest association rules, I see a lot of mirrored rules. For example:

[Image: example table of mirrored association rules]

I'm not sure how to interpret this. Looking at confidence, it seems that people are more likely to buy pie and then buy a cup of coffee. But how is this determined? We're talking about the same products, so I would have assumed identical values for all metrics for both rules. How can it be calculated that pie is more likely to lead to coffee, and less so the other way around? The support and lift are equal, and no order is maintained when analyzing the products on receipts.

Best Answer

The existing answer explains how the table is calculated. If you are still confused, one way to look at it is to start with the number of people who bought things.

Say 100 people visited the cafe, and 36 bought coffee, 18 bought pie, and 8 bought both. Then this is how the numbers in your table are calculated, using the formulas given by b-r-oleary:

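| Rule (A → C) | P(A) | P(C) | P(A,C) | P(C\|A) | P(A,C)/(P(A)P(C)) | P(A,C)-P(A)P(C) | (1-P(C))/(1-P(C\|A)) |
|---|---|---|---|---|---|---|---|
| coffee → pie | 36/100 | 18/100 | 8/100 | 8/36 | 100 x 8/(18 x 36) | 8/100 - (18/100)(36/100) | (1-18/100)/(1-8/36) |
| pie → coffee | 18/100 | 36/100 | 8/100 | 8/18 | 100 x 8/(18 x 36) | 8/100 - (36/100)(18/100) | (1-36/100)/(1-8/18) |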

Out of 18 people who bought pie, 8 also bought coffee, so the confidence is 8/18. But out of 36 people who bought coffee, only 8 also bought pie, so the confidence is 8/36.

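The confidence P(C|A) and the conviction (1-P(C))/(1-P(C|A)) are the ones which aren't necessarily equal, because both depend on which item is taken as the antecedent A; P(A) and P(C) simply swap, while support, lift and leverage are symmetric in A and C. This is just a consequence of how they are defined. The names "support", "lift" etc. are just names, which hopefully hint at how the numbers should be interpreted.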
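If it helps to see the arithmetic run, here is a minimal Python sketch that recomputes the numbers for both directions from the four counts in the cafe example. The function name `rule_metrics` and the dictionary keys are my own choices for illustration and are not taken from any particular library. The metric names follow the usual definitions for the formulas in the table.

```python
from fractions import Fraction


def rule_metrics(n_total, n_antecedent, n_consequent, n_both):
    """Metrics for the rule A -> C, computed from raw receipt counts.

    Hypothetical helper for illustration only. Conviction is undefined
    (division by zero) when the confidence is exactly 1.
    """
    p_a = Fraction(n_antecedent, n_total)   # P(A)
    p_c = Fraction(n_consequent, n_total)   # P(C)
    p_ac = Fraction(n_both, n_total)        # P(A,C)
    confidence = p_ac / p_a                 # P(C|A)
    return {
        "support": p_ac,
        "confidence": confidence,
        "lift": p_ac / (p_a * p_c),
        "leverage": p_ac - p_a * p_c,
        "conviction": (1 - p_c) / (1 - confidence),
    }


# 100 receipts: 36 with coffee, 18 with pie, 8 with both.
coffee_to_pie = rule_metrics(100, n_antecedent=36, n_consequent=18, n_both=8)
pie_to_coffee = rule_metrics(100, n_antecedent=18, n_consequent=36, n_both=8)

# Print the two rules side by side, one metric per line.
for name in coffee_to_pie:
    print(f"{name:<11} {str(coffee_to_pie[name]):>9} {str(pie_to_coffee[name]):>9}")
```

Swapping the antecedent and consequent counts leaves support, lift and leverage unchanged, but changes the denominator of the confidence (36 versus 18), and the conviction changes with it. No purchase order is involved anywhere: the asymmetry comes only from which group of customers you condition on.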
