Solved – Evaluating Association Rules Using Kulczynski and Imbalance Ratio

association-measure, association-rules

I have a dataset containing information about movies and their genres.

From the dataset I have generated association rules from the frequent itemsets that I have mined using the Apriori algorithm.

From that I have found some interesting association rules and now I want to evaluate how useful they are.

As an example, I have found the following rules:

  • Rule A: Romance, War -> Drama (support: 0.006, confidence: 0.863)
  • Rule B: Drama -> Romance, War (support: 0.006, confidence: 0.012)

From this I calculate the Kulczynski measure to be 0.4375.

Furthermore, using the following itemsets I can calculate the IR:

  • Itemset A: Romance, War (support: 0.007)
  • Itemset B: Drama (support: 0.489)
  • Itemset A ∪ B = Drama, Romance, War (support: 0.006)

IR(A,B) = 0.9836
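For anyone who wants to reproduce these two numbers, here is a minimal Python sketch. The function names `kulczynski` and `imbalance_ratio` are hypothetical helpers of my own, not part of any library. The inputs are the rounded confidences and supports listed above, so the last digit may differ slightly from values computed on the raw data.

```python
def kulczynski(conf_a_to_b: float, conf_b_to_a: float) -> float:
    """Average of the two rule confidences, P(B|A) and P(A|B)."""
    return 0.5 * (conf_a_to_b + conf_b_to_a)


def imbalance_ratio(sup_a: float, sup_b: float, sup_ab: float) -> float:
    """|sup(A) - sup(B)| / (sup(A) + sup(B) - sup(A u B))."""
    return abs(sup_a - sup_b) / (sup_a + sup_b - sup_ab)


# Rounded values taken from the rules and itemsets listed above.
kulc = kulczynski(0.863, 0.012)            # confidences of Rule A and Rule B
ir = imbalance_ratio(0.007, 0.489, 0.006)  # supports of A, B, and A u B

print(f"Kulczynski = {kulc:.4f}")
print(f"IR = {ir:.3f}")
```

Note that the Kulczynski value is computed from the reported confidences rather than re-derived from the rounded supports, since dividing the rounded supports gives slightly different conditional probabilities.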

All in all this shows that the data is heavily skewed (which is to be expected, since Drama is a very common genre compared to Romance and War) and that the itemsets are neutral or maybe slightly negatively associated.

The question then comes down to: What does this tell me about the "value" of the rules? How do the two measures go hand in hand to evaluate the rules?

Best Answer

The two are combined to help find "interesting" rules. As you know,

$$ \newcommand{\Kulczynski}{{\rm Kulczynski}} \newcommand{\support}{{\rm support}} \Kulczynski = \frac{1}{2}\big(P(A|B) + P(B|A)\big) $$

If Kulczynski is near 0 or 1, then we have an interesting rule whose itemsets are negatively or positively associated, respectively. If Kulczynski is near 0.5, then we may or may not have an interesting rule. Both conditional probabilities can be moderate, giving

$$ \Kulczynski = \frac{1}{2}\big(0.5 + 0.5\big) = 0.5 $$

Or, as in your case, the two can be very different:

$$ \Kulczynski = \frac{1}{2}\big(0.863 + 0.012\big) = 0.4375 $$

While some people might consider both cases uninteresting, others might want to know about the second one. To differentiate between the two situations, we can look at the Imbalance Ratio (IR), where 0 means perfectly balanced and values near 1 mean very skewed.

$$ IR = \frac{\big|\support(A) - \support(B)\big|}{\support(A) + \support(B) - \support(A \cup B)} $$
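As a worked check, plugging the rounded itemset supports from the question (0.007 for Romance, War; 0.489 for Drama; 0.006 for all three together) into this formula gives approximately

$$ IR = \frac{\big|0.007 - 0.489\big|}{0.007 + 0.489 - 0.006} = \frac{0.482}{0.490} \approx 0.98 $$

which puts this pair near the skewed end of the scale, unlike the balanced $0.5 + 0.5$ case.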

So completely uninteresting rules would have both $\Kulczynski=0.5$ and $IR=0$.