How to decide which decision tree classifier to use

cart, rule-of-thumb, weka

I am confused about which decision tree algorithm in Weka to use for my application. I have 5 real-valued input variables and 2 classes. In various online tutorials, J48 (C4.5) seems to be the algorithm of choice. Are there any rules of thumb, tips, or tricks for deciding which tree algorithm to use?

[Image: Weka decision tree choices]

Best Answer

Start with J4.8 (J48 in Weka), since it is among the fastest to train and generally gives good results. Its output is also human-readable, so you can check whether the tree makes sense, and Weka has tree visualizers to aid understanding. It is among the most widely used data mining algorithms.

If J4.8 does not give you good enough results, try other algorithms.

Random forests may give you better results, but the model is not human-readable and it is slower to train than J4.8 because it builds multiple trees.
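To make the cost of "multiple trees" concrete, here is a minimal sketch of bagging, one of the ideas random forests build on. It is not Weka's RandomForest: it assumes a single numeric attribute, two classes labeled 0 and 1, and one-split "stumps" instead of full trees, and it leaves out the random attribute selection that a real random forest adds. The class and method names (BaggingSketch, fitStump, fitEnsemble) are made up for illustration.

```java
import java.util.Random;

// Illustrative sketch only: bagged one-split "stumps" on one numeric
// attribute. This is NOT Weka's RandomForest implementation.
class BaggingSketch {

    // A stump predicts leftLabel when x <= threshold, otherwise rightLabel.
    static final class Stump {
        final double threshold;
        final int leftLabel;
        final int rightLabel;

        Stump(double threshold, int leftLabel, int rightLabel) {
            this.threshold = threshold;
            this.leftLabel = leftLabel;
            this.rightLabel = rightLabel;
        }

        int predict(double x) {
            return x <= threshold ? leftLabel : rightLabel;
        }
    }

    // Exhaustive search: tries every data value as threshold and every
    // pair of leaf labels, keeping the stump with the fewest training errors.
    static Stump fitStump(double[] x, int[] y) {
        Stump best = null;
        int bestErrors = Integer.MAX_VALUE;
        for (double candidate : x) {
            for (int left = 0; left <= 1; left++) {
                for (int right = 0; right <= 1; right++) {
                    Stump s = new Stump(candidate, left, right);
                    int errors = 0;
                    for (int i = 0; i < x.length; i++) {
                        if (s.predict(x[i]) != y[i]) errors++;
                    }
                    if (errors < bestErrors) {
                        bestErrors = errors;
                        best = s;
                    }
                }
            }
        }
        return best;
    }

    // Trains numTrees stumps, each on a bootstrap sample drawn with
    // replacement. Training cost grows linearly with numTrees.
    static Stump[] fitEnsemble(double[] x, int[] y, int numTrees, long seed) {
        Random rng = new Random(seed);
        Stump[] trees = new Stump[numTrees];
        for (int t = 0; t < numTrees; t++) {
            double[] bx = new double[x.length];
            int[] by = new int[y.length];
            for (int i = 0; i < x.length; i++) {
                int pick = rng.nextInt(x.length);
                bx[i] = x[pick];
                by[i] = y[pick];
            }
            trees[t] = fitStump(bx, by); // one full training pass per tree
        }
        return trees;
    }

    // Majority vote over all stumps (classes are 0 and 1).
    static int predict(Stump[] trees, double x) {
        int votesForOne = 0;
        for (Stump s : trees) votesForOne += s.predict(x);
        return 2 * votesForOne > trees.length ? 1 : 0;
    }

    public static void main(String[] args) {
        double[] x = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
        int[] y = {0, 0, 0, 0, 0, 1, 1, 1, 1, 1};
        Stump[] ensemble = fitEnsemble(x, y, 25, 42L);
        System.out.println("x=0 -> class " + predict(ensemble, 0.0));
        System.out.println("x=9 -> class " + predict(ensemble, 9.0));
    }
}
```

A single stump here can be read off as one rule, whereas the ensemble is 25 rules plus a vote, which is the readability trade-off mentioned above in miniature.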

If you want to understand tree algorithms better, I recommend the following strategy.

  1. Read about J4.8 and how it is trained. Most tree algorithms are variations of CART, ID3, C4.5, or C5.0, which are very similar conceptually.
  2. After that, read about boosting and ensemble methods.
  3. Read about random forests. They build on ideas from the methods above.
  4. Then read about other algorithms. For example, NBTree uses naive Bayes classifiers at the leaves, and LMT is described as a "Classifier for building 'logistic model trees', which are classification trees with logistic regression functions at the leaves."
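As a starting point for step 1, the core idea behind C4.5-style training can be sketched in a few lines: score each candidate split by how much it reduces class entropy, normalized by the entropy of the split itself (the gain ratio). The sketch below is a simplification and not J4.8's actual source. It assumes one numeric attribute and two classes labeled 0 and 1, and it omits pruning, missing values, and the other refinements real implementations add. The name GainRatioSketch and its methods are hypothetical.

```java
import java.util.Arrays;

// Illustrative sketch only: gain ratio for a binary split on one numeric
// attribute with two classes (0 and 1). Not Weka's J48 source.
class GainRatioSketch {

    // Shannon entropy in bits of a pair of counts.
    static double entropy(int a, int b) {
        int n = a + b;
        if (n == 0) return 0.0;
        double h = 0.0;
        for (int c : new int[] {a, b}) {
            if (c > 0) {
                double p = (double) c / n;
                h -= p * Math.log(p) / Math.log(2);
            }
        }
        return h;
    }

    // Gain ratio of the split "x <= threshold":
    // (entropy before - weighted entropy after) / entropy of the split sizes.
    static double gainRatio(double[] x, int[] y, double threshold) {
        int[] left = new int[2];
        int[] right = new int[2];
        for (int i = 0; i < x.length; i++) {
            if (x[i] <= threshold) left[y[i]]++; else right[y[i]]++;
        }
        int nL = left[0] + left[1];
        int nR = right[0] + right[1];
        int n = nL + nR;
        if (nL == 0 || nR == 0) return 0.0; // the split separates nothing
        double before = entropy(left[0] + right[0], left[1] + right[1]);
        double after = (nL * entropy(left[0], left[1])
                      + nR * entropy(right[0], right[1])) / n;
        double splitInfo = entropy(nL, nR);
        return (before - after) / splitInfo;
    }

    // Tries the midpoint between each pair of consecutive distinct values
    // and returns the threshold with the highest gain ratio.
    static double bestThreshold(double[] x, int[] y) {
        double[] sorted = x.clone();
        Arrays.sort(sorted);
        double best = Double.NaN;
        double bestScore = -1.0;
        for (int i = 0; i + 1 < sorted.length; i++) {
            if (sorted[i] == sorted[i + 1]) continue;
            double t = (sorted[i] + sorted[i + 1]) / 2.0;
            double score = gainRatio(x, y, t);
            if (score > bestScore) {
                bestScore = score;
                best = t;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] x = {1.0, 2.0, 3.0, 6.0, 7.0, 8.0};
        int[] y = {0, 0, 0, 1, 1, 1};
        double t = bestThreshold(x, y);
        System.out.println("best threshold = " + t
                + ", gain ratio = " + gainRatio(x, y, t));
    }
}
```

A full tree learner repeats this search over every attribute, splits on the winner, and recurses on each branch until a stopping rule applies.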

There are also some practical issues to consider when choosing an algorithm.

  • Some algorithms are more memory-hungry than others. I worked with the KDD99 database, which has about 4.8 million instances. With 4 GB of RAM I could train J4.8, but not random forests or, for that matter, many other algorithms (neural networks, SVMs, etc.).

  • Some tree classifiers may not handle your attribute types or your number of classes. For example, ADTree only supports two-class problems, and some algorithms do not support date attributes.
