Solved – J48 decision trees in weka

classification, weka

I am using the J48 decision tree classifier in Weka. Under the test options I chose percentage split, with 70% of the data for training and 30% for testing. My understanding is that J48 will then use 70% of my data set to train the model and the remaining 30% to test it. Although Weka reports the classification accuracy on the 30% test set, I am confused as to why the classifier model is built using my entire data set, i.e. 100%. Shouldn't it build the classifier model on only the 70% training portion?

The classifier output window shows:
=== Classifier model (full training set) ===
When I check the decision tree, it has been built from all 100% of the data rather than 70%.
Is there a particular reason why Weka does this?

Best Answer

Weka builds more than one classifier. The model it displays is built on all of the data, while a second model, built only on the 70% training portion, is evaluated on the held-out 30% to estimate the accuracy. For this reason, in most cases the accuracy of the displayed tree does not agree with the reported accuracy figure. The reported accuracy (based on the split) is the better predictor of accuracy on unseen data.
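The two-model procedure described above can be sketched in plain Java. This is only an illustration, not Weka's actual code: the class `SplitSketch`, the synthetic data, and the one-level "stump" that stands in for J48 are all assumptions made for the example. In Weka's own Java API, the corresponding steps would go through `J48.buildClassifier` and `Evaluation.evaluateModel`.

```java
import java.util.Random;

class SplitSketch {

    // Toy stand-in for J48 (hypothetical): a one-level tree that predicts
    // "true" when x > threshold. Training returns the observed value that
    // misclassifies the fewest rows in the range [from, to).
    static double train(double[] x, boolean[] y, int from, int to) {
        double best = x[from];
        int bestErrors = Integer.MAX_VALUE;
        for (int c = from; c < to; c++) {
            int errors = 0;
            for (int i = from; i < to; i++) {
                if ((x[i] > x[c]) != y[i]) errors++;
            }
            if (errors < bestErrors) {
                bestErrors = errors;
                best = x[c];
            }
        }
        return best;
    }

    // Fraction of rows in [from, to) that the threshold classifies correctly.
    static double accuracy(double threshold, double[] x, boolean[] y, int from, int to) {
        int correct = 0;
        for (int i = from; i < to; i++) {
            if ((x[i] > threshold) == y[i]) correct++;
        }
        return (double) correct / (to - from);
    }

    public static void main(String[] args) {
        // Synthetic data (an assumption for illustration): the true rule is
        // x > 0.5, with roughly 10% of the labels flipped as noise.
        // Rows are generated independently, so they are already in random order.
        Random rng = new Random(1);
        int n = 100;
        double[] x = new double[n];
        boolean[] y = new boolean[n];
        for (int i = 0; i < n; i++) {
            x[i] = rng.nextDouble();
            y[i] = (x[i] > 0.5) ^ (rng.nextDouble() < 0.1);
        }

        int trainSize = (int) Math.round(n * 0.7);

        // Model 1: trained on the first 70%, scored on the remaining 30%.
        // This is the source of the reported accuracy.
        double evalModel = train(x, y, 0, trainSize);
        double holdoutAccuracy = accuracy(evalModel, x, y, trainSize, n);

        // Model 2: trained on all rows. This is the model that gets displayed.
        double fullModel = train(x, y, 0, n);
        double trainingAccuracy = accuracy(fullModel, x, y, 0, n);

        System.out.printf("70%% model threshold:  %.3f, accuracy on 30%% hold-out: %.3f%n",
                evalModel, holdoutAccuracy);
        System.out.printf("100%% model threshold: %.3f, accuracy on its own training data: %.3f%n",
                fullModel, trainingAccuracy);
    }
}
```

The two thresholds generally differ, because the models see different rows. The second accuracy is measured on the same rows the model was fitted to, so it tends to be optimistic. That is why the hold-out figure is the one to quote, while the full-data model is the natural one to keep.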