Solved – Bayesian network vs. association rules

apriori, bayesian network, data mining, machine learning

The Apriori algorithm finds implication rules.

Similar results are provided by Bayesian networks.

What is the essential difference? What are the specific advantages/disadvantages?

Edit:
The Apriori algorithm generates association rules as a kind of implication, as can be seen in the following picture (taken from this paper).
[Image: association rules generated by Apriori]

Best Answer

This question is similar to asking about the difference between parametric and non-parametric models.

  • A Bayesian network can be viewed as a parametric model: we make explicit assumptions about the random variables and the dependencies among them (assuming we only do parameter learning, not structure learning).

  • The Apriori algorithm is a "data mining" algorithm: it efficiently enumerates all the frequent patterns in the data. It is not really "machine learning" in the sense of learning or tuning parameters to optimize an objective function.

  • Which is better, and what are the pros and cons? The answer mirrors the parametric vs. non-parametric discussion. If the Bayesian network's assumptions are good, it will be "better". If the assumptions are not accurate, Apriori may be better.
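
To make the "parametric" point concrete, here is a minimal sketch of parameter learning with a fixed structure. The toy data, the variable names, and the structure bread -> butter, bread -> milk are all assumptions made up for illustration; `fit_cpt` is a hypothetical helper, not a library function.

```python
from collections import Counter

# Hypothetical toy data: each row is (bread, butter, milk), 1 = item present.
DATA = [
    (1, 1, 1),
    (1, 1, 0),
    (1, 0, 1),
    (1, 1, 1),
    (0, 0, 1),
    (0, 0, 0),
    (0, 1, 0),
    (1, 1, 0),
]
BREAD, BUTTER, MILK = 0, 1, 2  # column indices


def fit_cpt(data, child, parents):
    """Maximum-likelihood estimate of P(child=1 | parents) by counting rows."""
    totals, ones = Counter(), Counter()
    for row in data:
        key = tuple(row[p] for p in parents)  # parent configuration
        totals[key] += 1
        ones[key] += row[child]
    return {key: ones[key] / totals[key] for key in totals}


# The structure is assumed, not learned: bread -> butter, bread -> milk.
# The only things estimated from data are the table entries (the parameters).
p_bread = fit_cpt(DATA, BREAD, [])         # {(): P(bread=1)}
p_butter = fit_cpt(DATA, BUTTER, [BREAD])  # {(bread,): P(butter=1 | bread)}
p_milk = fit_cpt(DATA, MILK, [BREAD])      # {(bread,): P(milk=1 | bread)}

print(p_bread, p_butter, p_milk)
```

With this structure the model has only five free parameters, and it implicitly assumes that butter and milk are independent given bread. If that assumption is wrong for the data, the fitted model will be wrong in a way that plain counting of frequent patterns is not.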


In addition, Bayesian networks and the Apriori algorithm are used differently.

  • Bayesian networks are mainly used for inference. A typical question is "If I know A and B happened, what is the chance that C happens and D does not?" The model returns the probability for the query.

  • The Apriori algorithm is used to find the frequent itemsets that satisfy a given condition (typically a minimum support threshold). The typical question is "which items frequently occur together?", which is different from the conditional probability queries mentioned for Bayesian networks.

  • Informally, Apriori asks questions about the joint probability and stores all high-frequency combinations. A Bayesian network, on the other hand, asks questions about conditional probability: given the data, which hypothesis is more likely?
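
The difference in usage can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the transactions are made up, `apriori` is a simplified level-wise search rather than an optimized implementation, and the Bayesian network (structure bread -> butter, bread -> milk, with the probability values below) is hypothetical and queried by brute-force enumeration.

```python
from itertools import combinations, product

# Hypothetical transactions (one set of items per basket).
TRANSACTIONS = [
    {"bread", "butter", "milk"}, {"bread", "butter"}, {"bread", "milk"},
    {"bread", "butter", "milk"}, {"milk"}, set(), {"butter"},
    {"bread", "butter"},
]


def support(itemset, transactions):
    """Fraction of transactions containing every item: a joint frequency."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def apriori(transactions, min_support):
    """Level-wise search: size-k candidates come only from frequent (k-1)-sets."""
    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items]
    frequent = {}
    while level:
        scored = {c: support(c, transactions) for c in level}
        kept = {c: s for c, s in scored.items() if s >= min_support}
        frequent.update(kept)
        # Join: unions of two frequent sets that are exactly one item larger.
        candidates = {a | b for a, b in combinations(kept, 2)
                      if len(a | b) == len(a) + 1}
        # Prune: every subset one item smaller must itself be frequent.
        level = [c for c in candidates if all(c - {i} in kept for i in c)]
    return frequent


def confidence(antecedent, consequent, transactions):
    """Rule confidence = support(A and C together) / support(A)."""
    return (support(antecedent | consequent, transactions)
            / support(antecedent, transactions))


# Hypothetical Bayesian network: bread -> butter, bread -> milk.
VARS = ("bread", "butter", "milk")
P_BREAD = 0.625                  # P(bread=1), assumed value
P_BUTTER = {1: 0.8, 0: 1 / 3}    # P(butter=1 | bread), assumed values
P_MILK = {1: 0.6, 0: 1 / 3}      # P(milk=1 | bread), assumed values


def joint(w):
    """Joint probability of one full assignment via the network factorization."""
    def bern(p, x):
        return p if x else 1 - p
    return (bern(P_BREAD, w["bread"])
            * bern(P_BUTTER[w["bread"]], w["butter"])
            * bern(P_MILK[w["bread"]], w["milk"]))


def query(target, evidence):
    """P(target | evidence) by summing the joint over all consistent worlds."""
    num = den = 0.0
    for values in product((0, 1), repeat=len(VARS)):
        w = dict(zip(VARS, values))
        if all(w[k] == v for k, v in evidence.items()):
            p = joint(w)
            den += p
            if all(w[k] == v for k, v in target.items()):
                num += p
    return num / den


# Apriori: "which items frequently occur together?"
for itemset, s in apriori(TRANSACTIONS, min_support=0.3).items():
    print(sorted(itemset), s)

# Bayesian network: "given bread, what is the chance of butter and no milk?"
print(query({"butter": 1, "milk": 0}, {"bread": 1}))
```

On this toy data, Apriori returns five itemsets with support of at least 0.3 and never scores {bread, butter, milk}, because one of its subsets is infrequent. The network, in contrast, answers arbitrary conditional queries, including ones with negated events such as "no milk" or diagnostic ones such as `query({"bread": 1}, {"butter": 1})`, but only as well as its assumed structure allows.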