Solved – Difference in rules via apriori() with target “frequent itemsets” and ruleInduction() and via apriori() with target “rules”

association-rules, r

Regarding R package arules:

To my understanding, the Apriori algorithm works in two steps: it first finds all frequent itemsets that meet the support threshold, and then generates strong association rules from those frequent itemsets that also meet the minimum confidence.
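To make those two steps concrete, here is a minimal base-R sketch on a made-up toy dataset. It enumerates candidates by brute force (itemsets of size 1 and 2 only) rather than using Apriori's candidate pruning, and the data, thresholds, and the `support()` helper are all assumptions for illustration, not part of arules.

```r
# Hypothetical toy data: each list element is one transaction.
txs <- list(c("a", "b"), c("a", "b"), c("a", "b"), "a",
            c("c", "d"), c("c", "d"), c("c", "d"), c("c", "d"),
            "d", "d")
items <- sort(unique(unlist(txs)))

# Illustrative helper: fraction of transactions containing every item in `set`.
support <- function(set) {
  mean(vapply(txs, function(t) all(set %in% t), logical(1)))
}

min_support <- 0.2
min_confidence <- 0.7

# Step 1: keep the candidate itemsets that meet the support threshold.
candidates <- c(as.list(items), combn(items, 2, simplify = FALSE))
frequent <- Filter(function(s) support(s) >= min_support, candidates)

# Step 2: derive single-consequent rules from each frequent pair and
# keep those that meet the confidence threshold.
rules <- character(0)
for (s in Filter(function(s) length(s) == 2, frequent)) {
  for (rhs in s) {
    lhs <- setdiff(s, rhs)
    conf <- support(s) / support(lhs)
    if (conf >= min_confidence) {
      rules <- c(rules, sprintf("{%s} => {%s}", lhs, rhs))
    }
  }
}
print(rules)
```

Note that confidence only enters in step 2; step 1 depends on support alone.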

Hence I would expect that

txs <- as(inputDataTable,"transactions")
itemsets <- apriori(txs, parameter = list(support = 0.05, confidence = 0.7, target="frequent itemsets"))
rules <- ruleInduction(itemsets)

and

txs <- as(inputDataTable,"transactions")
rules <- apriori(txs, parameter = list(support = 0.05, confidence = 0.7, target="rules"))

would lead to the same rules. However, the second example finds more rules, and I can't understand why.

Can anybody explain why this is? I've been trying to get my head around it all morning.

Of course I'm happy to provide my specific data, but I reckoned that this isn't necessary since it is a generic question.

Best Answer

OK, this is pretty straightforward now that I know what the problem was.

For anybody who encounters a similar problem: confidence should (of course) be set at the ruleInduction() step, not when finding the frequent itemsets, where only support is relevant. Because I didn't pass a confidence value to ruleInduction(), its default of 0.8 was used, and so fewer rules were found.

So doing:

txs <- as(inputDataTable,"transactions")
itemsets <- apriori(txs, parameter = list(support = 0.05, target="frequent itemsets"))
rules <- ruleInduction(itemsets, confidence = 0.7)

and

txs <- as(inputDataTable,"transactions")
rules <- apriori(txs, parameter = list(support = 0.05, confidence = 0.7, target="rules"))

does lead to the same result. :)
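If you want to check this yourself without your own data, something like the following sketch should do. The toy baskets and the thresholds (support 0.2 instead of 0.05) are made up for illustration, and I pass the transactions to ruleInduction() explicitly, which is my choice here rather than something the calls above require.

```r
library(arules)

# Hypothetical toy data: ten transactions over items a, b, c, d.
# By hand: conf(a=>b) = 3/4, conf(b=>a) = 3/3, conf(c=>d) = 4/4, conf(d=>c) = 4/6.
baskets <- list(
  c("a", "b"), c("a", "b"), c("a", "b"), "a",
  c("c", "d"), c("c", "d"), c("c", "d"), c("c", "d"),
  "d", "d"
)
txs <- as(baskets, "transactions")
quiet <- list(verbose = FALSE)  # suppress apriori's progress output

# One step: mine rules directly.
rules_direct <- apriori(txs,
  parameter = list(support = 0.2, confidence = 0.7, target = "rules"),
  control = quiet)

# Two steps: mine frequent itemsets (support only), then induce rules.
itemsets <- apriori(txs,
  parameter = list(support = 0.2, target = "frequent itemsets"),
  control = quiet)
rules_induced <- ruleInduction(itemsets, transactions = txs, confidence = 0.7)

# Same induction without a confidence argument: the 0.8 default applies.
rules_default <- ruleInduction(itemsets, transactions = txs)

# Compare the rule sets by their printed labels, e.g. "{a} => {b}".
same_rules <- setequal(labels(rules_direct), labels(rules_induced))
n_lost <- length(rules_induced) - length(rules_default)
print(same_rules)
print(n_lost)
```

Here `n_lost` counts the rules whose confidence falls between 0.7 and 0.8, which is exactly the kind of rule that went missing in my original two-step version.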