Solved – Can C4.5 handle continuous attributes?

cart, continuous data, r, weka

I'm trying to play with the breast cancer data available through UCI: https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/wdbc.data

When I try to classify the data in Weka using the J48 decision tree, I notice that the J48 algorithm is disabled, maybe because it can't handle continuous attributes. I can use C4.5 or C5.0 through R. Can these descendants of ID3 handle continuous attributes, or do I need to pre-process the attributes into ranges?

I'd appreciate any example that shows classifying continuous attribute data via decision trees.

Best Answer

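You need to discretize the continuous variables, that is, choose cut points that turn each one into ranges. C4.5 does this itself for numeric attributes by testing binary thresholds while it grows the tree, and you can also do it as a pre-processing step. A very common approach is to find the cut points that minimize the resulting total entropy (i.e. the sum of the entropies of the resulting partitions, each weighted by its share of the samples).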
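To make the entropy criterion concrete, here is a minimal sketch in plain Python of choosing one binary cut point for a single continuous attribute. It is an illustration under assumptions, not C4.5's exact procedure: the function names `entropy` and `best_threshold` are made up for this example, the toy values are invented rather than taken from wdbc.data, and the cut is placed at the midpoint between neighbouring values and scored by plain weighted entropy.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    probs = [count / total for count in Counter(labels).values()]
    return sum(-p * math.log2(p) for p in probs)


def best_threshold(values, labels):
    """Return (threshold, weighted_entropy) for the best binary cut.

    Hypothetical helper: tries a cut between every pair of adjacent
    distinct values and keeps the one with the lowest weighted entropy.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, float("inf"))
    for i in range(1, n):
        low, high = pairs[i - 1][0], pairs[i][0]
        if low == high:
            continue  # identical values cannot be separated by a cut
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
        if weighted < best[1]:
            best = ((low + high) / 2, weighted)
    return best


# Invented toy data: one continuous attribute and a two-class label.
radius = [11.2, 12.1, 12.9, 13.4, 15.0, 17.3, 18.8, 20.1]
diagnosis = ["B", "B", "B", "B", "M", "M", "M", "M"]

threshold, score = best_threshold(radius, diagnosis)
print(f"cut at {threshold:.2f}, weighted entropy {score:.3f}")
```

On this toy data the chosen cut falls between 13.4 and 15.0, because that is the only split that leaves both sides pure. Applying the same search recursively to each side would yield several ranges instead of one threshold.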

See, for example, "Improved Use of Continuous Attributes in C4.5" and "Supervised and Unsupervised Discretization of Continuous Features". Weka can discretize your data, and there are a number of tutorials showing how to do it. Unfortunately, I am not familiar with Weka and cannot tell which of them is good enough.