Association rules – support, confidence and lift


I am trying to mine association rules from my transaction dataset and I have questions regarding the support, confidence and lift of a rule.

Assume we have a rule like {X} -> {Y}.

I know that support is P(XY), confidence is P(XY)/P(X), and lift is P(XY)/(P(X)P(Y)), where P(XY) is the probability that a transaction contains both X and Y. Lift measures how far X and Y are from being independent: a lift of 1 means they are independent.
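The three definitions above can be sketched directly over a list of transactions. This is a minimal illustration, assuming each transaction is a Python set of item names; the helper names `support`, `confidence`, `lift` and the basket contents are hypothetical, chosen only for the example.

```python
from typing import List, Set

def support(transactions: List[Set[str]], itemset: Set[str]) -> float:
    """Fraction of transactions containing every item of `itemset`, i.e. P(itemset)."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(transactions: List[Set[str]], x: Set[str], y: Set[str]) -> float:
    """P(XY) / P(X): share of transactions containing X that also contain Y."""
    return support(transactions, x | y) / support(transactions, x)

def lift(transactions: List[Set[str]], x: Set[str], y: Set[str]) -> float:
    """P(XY) / (P(X) * P(Y)); 1.0 means X and Y co-occur exactly as often as under independence."""
    return support(transactions, x | y) / (support(transactions, x) * support(transactions, y))

# Hypothetical baskets, for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
x, y = {"bread"}, {"milk"}
print(support(baskets, x | y))     # 0.5
print(confidence(baskets, x, y))   # about 0.667
print(lift(baskets, x, y))         # about 0.889
```

In this toy data the rule {bread} -> {milk} has fairly high support and confidence but a lift below 1, which is exactly the "high support, high confidence, low lift" situation the question asks about.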

However, I don't know how to interpret rules using these measures. I have rules with high support, high confidence and low lift. Is that a good rule?

High confidence represents a strong association, and high support shows how convincing that association is. So is high confidence plus high support a good rule, and can we ignore lift?

If I want to rank my rules and pick, say, the best 10 to examine, which measure should I use as the ranking variable?

Best Answer

It depends on your task. But usually you want all three to be high.

  • high support: should apply to a large number of cases
  • high confidence: should be correct often
  • high lift: indicates it is not just a coincidence
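One way to act on "you want all three to be high" when picking, say, the top 10 rules is to require a minimum for each measure and only then rank the survivors. The sketch below is one possible policy, not the only one: the `Rule` class, the `top_rules` helper, the threshold defaults and the rule numbers are all hypothetical, and ranking by lift first is an assumption you may want to change for your task.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Rule:
    antecedent: str
    consequent: str
    support: float
    confidence: float
    lift: float

def top_rules(rules: List[Rule], min_support: float = 0.01,
              min_confidence: float = 0.5, min_lift: float = 1.0,
              k: int = 10) -> List[Rule]:
    """Keep rules that clear all three (hypothetical) thresholds, then rank the
    survivors by lift, breaking ties by confidence and then by support."""
    kept = [
        r for r in rules
        if r.support >= min_support
        and r.confidence >= min_confidence
        and r.lift > min_lift  # strict: a lift of exactly 1 means no association
    ]
    kept.sort(key=lambda r: (r.lift, r.confidence, r.support), reverse=True)
    return kept[:k]

# Made-up rules, for illustration only.
rules = [
    Rule("rain", "day", 0.25, 0.50, 1.00),      # high support and confidence, but no lift
    Rule("diapers", "wipes", 0.02, 0.60, 3.00),
    Rule("chips", "salsa", 0.05, 0.40, 2.50),   # fails the confidence threshold
    Rule("bread", "milk", 0.10, 0.55, 1.20),
]
for r in top_rules(rules):
    print(r.antecedent, "->", r.consequent, r.lift)
```

With these numbers only diapers -> wipes and bread -> milk survive; the rain -> day rule is dropped despite its large support because its lift is exactly 1.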

Consider, e.g., "rain" and "day". Assume we live in a very unfortunate place on the Equator, where it rains 50% of the time and it is day 50% of the time, and these are independent of each other. So 25% of the time it is both raining and day.

For the rule {rain} -> {day} we then have a support of 25%, which is pretty high for most data sets. We also have a confidence of 50%, which is also pretty good: if 50% of my visitors bought a product I recommended, I would be a billionaire. But the lift is just 0.25 / (0.5 × 0.5) = 1, i.e. no improvement over what independence alone would predict.
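The rain/day numbers can be checked with a small sketch, assuming the four combinations of rain/dry and day/night are equally likely (which is what "50% each, independent" implies). The outcome labels "dry" and "night" are introduced here only for illustration.

```python
from itertools import product

# Four equally likely (weather, time) outcomes: rain and day are independent, 50% each.
outcomes = [{w, t} for w, t in product(["rain", "dry"], ["day", "night"])]

def frac(pred) -> float:
    """Share of outcomes satisfying `pred`."""
    return sum(1 for o in outcomes if pred(o)) / len(outcomes)

p_rain = frac(lambda o: "rain" in o)
p_day = frac(lambda o: "day" in o)
p_both = frac(lambda o: {"rain", "day"} <= o)

support = p_both                    # P(rain, day)
confidence = p_both / p_rain        # P(rain, day) / P(rain)
lift = p_both / (p_rain * p_day)    # P(rain, day) / (P(rain) * P(day))
print(support, confidence, lift)    # 0.25 0.5 1.0
```

Support and confidence look respectable, yet the lift of exactly 1 shows that knowing it rains tells you nothing about whether it is day.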

Beware that on other data sets, you won't get anywhere near 25% support. Consider a supermarket with diverse products: what percentage of customers do you think buy toilet paper?
