I have a J48 decision tree model trained with WEKA. I would like to access the rules of the tree in J48 so that I can somehow use them in my code whether with if-else statements or as a decision table I can access in my code. Is this possible? If yes, how?
Solved – Converting J48 to if-then rules in Weka
Tags: cart, machine learning, weka
Related Solutions
Start with J4.8, since it is fast to train and generally gives good results. Its output is also human readable, so you can check whether the model makes sense, and there are tree visualizers to aid understanding. It is among the most widely used data mining algorithms.
If J4.8 does not give you good enough results, try other algorithms.
Random forests may give you a better solution, but the model is not human readable and training is slower than with J4.8 (because multiple trees are trained in the process).
If you want a better understanding of how tree algorithms work, I recommend the following strategy.
- Read about J4.8 and how it is trained. Most tree algorithms are variations of CART, ID3, C4.5 or C5.0, which are conceptually very similar.
- After that read about boosting and ensemble methods.
- Read about random forests. They use ideas from above methods.
- After these, read about other algorithms. For example, NBTree uses naive Bayes at the leaves, and LMT is described as a "Classifier for building 'logistic model trees', which are classification trees with logistic regression functions at the leaves."
There are also some practical issues to consider when choosing an algorithm.
I found that some algorithms are more memory hungry than others. I worked with the KDD99 database, which has 4.8 million instances. I could train J4.8 with 4 GB of RAM, but not random forests or, for that matter, many other algorithms (neural networks, SVMs, etc.).
Some tree classifiers may not handle your attribute types or your number of classes. For example, ADTree only supports two-class problems, and some algorithms do not support date attributes.
You need to discretize the continuous variables first. A very common approach is to choose the split points that minimize the total entropy of the resulting intervals (i.e. the sum of the entropies of the intervals, usually weighted by the fraction of instances that falls into each one).
See, for example, "Improved Use of Continuous Attributes in C4.5" and "Supervised and Unsupervised Discretization of Continuous Features". Weka can discretize your data, and there are a number of tutorials showing how to do it. Unfortunately, I am not familiar with Weka and cannot tell which of them is good enough.
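As a rough illustration of the entropy-based idea above, here is a minimal Python sketch that finds a single cut point for one continuous attribute by minimizing the weighted entropy of the two resulting intervals. It is not Weka's implementation; the function names `entropy` and `best_split` and the toy data are made up for this example. A full discretizer would typically apply the same search recursively and add a stopping criterion.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return sum((n / total) * math.log2(total / n) for n in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted_entropy) for the best single cut point.

    Candidate thresholds are the midpoints between consecutive distinct values.
    """
    pairs = sorted(zip(values, labels))
    best_threshold, best_score = None, float("inf")
    for i in range(1, len(pairs)):
        low, high = pairs[i - 1][0], pairs[i][0]
        if low == high:
            continue  # no cut point exists between two equal values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        # Entropy of each side, weighted by the share of instances on that side.
        score = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        if score < best_score:
            best_threshold, best_score = (low + high) / 2, score
    return best_threshold, best_score


# Hypothetical toy data: a numeric attribute and a yes/no class.
humidity = [65, 70, 70, 75, 80, 85, 90, 95]
play = ["yes", "yes", "yes", "yes", "no", "no", "no", "no"]

threshold, score = best_split(humidity, play)
print(threshold, score)
```

Instances at or below the returned threshold go into one interval and the rest into the other; on the toy data the two classes separate perfectly, so the weighted entropy of the best cut is zero.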
Best Answer
In the Weka Explorer, go to the Classify tab. Once you have chosen the J48 classifier and clicked the Start button, the Classifier output pane displays the results, including the confusion matrix. Just under the Start button is the Result list; right-click the most recent entry and choose the Visualize tree option. If the tree does not display well, right-click in the new window and select the fit to screen option.
Edit: You can also save the model and then reuse it in your code.
To get the model as a Java class: in the Weka Explorer, under the Classify tab, click the More options button and check the Output source code box. Then re-run the classifier, and the source code will be written to the Classifier output pane.
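If you would rather have plain if-then rules than generated source code, one option is to copy the text form of the tree from the Classifier output pane and flatten it yourself. The Python sketch below assumes the usual indented layout, in which each nesting level is marked by a leading "|" and each leaf looks like "condition: class (counts)". The function name `j48_text_to_rules` and the sample tree are illustrative, and attribute values or class names containing ": " or " (" would need extra handling.

```python
import re


def j48_text_to_rules(tree_text):
    """Flatten an indented J48-style text tree into a list of if-then rule strings."""
    rules = []
    path = []  # conditions on the way from the root to the current line
    for raw_line in tree_text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        # Split the line into its "|   |   " indentation prefix and the rest.
        prefix, body = re.match(r"((?:\|\s*)*)(.*)", line).groups()
        depth = prefix.count("|")  # one "|" per nesting level
        body = body.strip()
        del path[depth:]  # forget conditions that belong to deeper or sibling branches
        if ": " in body:
            # Leaf line: "condition: class (counts)".
            condition, leaf = body.split(": ", 1)
            label = leaf.rsplit(" (", 1)[0]
            rules.append("IF " + " AND ".join(path + [condition]) + " THEN " + label)
        else:
            # Inner node: remember its condition for the lines nested below it.
            path.append(body)
    return rules


# Illustrative sample in the assumed layout (not taken from a real run).
sample_tree = """
outlook = sunny
|   humidity <= 75: yes (2.0)
|   humidity > 75: no (3.0)
outlook = overcast: yes (4.0)
outlook = rainy
|   windy = TRUE: no (2.0)
|   windy = FALSE: yes (3.0)
"""

rules = j48_text_to_rules(sample_tree)
for rule in rules:
    print(rule)
```

Each leaf becomes one rule whose conditions are the tests on the path from the root, so the resulting list can be translated into if-else statements or loaded as a simple decision table.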