Solved – ID3 algorithm in classification: how to choose a split

cart, classification, supervised learning

The problem I'm having is that I'm unable to choose a split. First, consider this example picture.

So I have found the root node, i.e., "age". Going down the branches of "age", I found that there are 3 child nodes. What I've learned is that I should split the attribute with the highest impurity. In this example there are 1 pure and 2 impure child nodes; how do I decide which node to split?

Below is the output

[image: resulting decision tree]

How do I know that I should split on "student" at the "<=30" node and on "credit rating" at the ">40" node?

Best Answer

You calculate the information that a particular attribute gives you, before and after any split. Once you have done that for all of the attributes, the difference between the before and after values tells you which one should be taken for the first split. That difference is the amount of information that a particular split would give you. So, to answer your question, you start by splitting on the attribute that gives you the highest information gain.
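As a rough sketch of that before/after calculation (the rows and labels here are made up for illustration, not taken from the question's table), the entropy-based information gain could be computed like this:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def information_gain(rows, labels, attribute):
    """Entropy before the split minus the weighted entropy after
    splitting on `attribute` (a key into each row dict)."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    after = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - after

# Hypothetical toy rows in the spirit of the question's example:
rows = [
    {"age": "<=30"}, {"age": "<=30"}, {"age": "31..40"}, {"age": ">40"},
]
labels = ["no", "yes", "yes", "yes"]
print(information_gain(rows, labels, "age"))
```

The attribute chosen for the split is simply the one for which this value is largest.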

If you want a more intuitive way of thinking about this, picture what a decision tree does as a partitioning task. Imagine a 2D scatter plot where the x-axis represents one (numeric) attribute and the y-axis represents another, and there are blue and red circles as our class labels, similar to the figure below. You want to find a split along an axis that gives you a purer partition, i.e., one that keeps mostly circles of one color on each side. In this tree, $x_2$ is chosen as the first split because it keeps more blue circles on the left and red circles on the right.

[image: 2D scatter plot partitioned by axis-aligned splits]

Now suppose you know that you initially want to split on the attribute $age$, because a split on this attribute gives the highest information gain compared to the other attributes. Since $age$ is categorical, you make $3$ different branches, one for each possible case: $\leq 30$, $30 \cdots 40$, and $> 40$. Then, for each branch, you repeat the same procedure, but this time only on the observations that fall into the new partition. In your example, the attribute $student$ has the highest information gain when you look only at the rows where $age \leq 30$. So this is a recursive task. And if you look closely at the figure I added, you can see that one attribute may be chosen multiple times.
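The recursive procedure described above can be sketched end to end. This is a minimal ID3-style implementation with a hypothetical dataset chosen so that it reproduces the shape of the question's tree (age at the root, then student under "<=30" and credit under ">40"); the rows are invented for illustration, not the question's actual table:

```python
from collections import Counter
from math import log2

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def information_gain(rows, labels, attribute):
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    after = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - after

def id3(rows, labels, attributes):
    """Recursive ID3 sketch: pick the attribute with the highest information
    gain, partition the rows on its values, and recurse on each branch.
    Returns a nested dict; leaves are class labels."""
    if len(set(labels)) == 1:      # pure node -> leaf
        return labels[0]
    if not attributes:             # nothing left to split on -> majority vote
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: information_gain(rows, labels, a))
    remaining = [a for a in attributes if a != best]
    tree = {best: {}}
    for value in {row[best] for row in rows}:
        sub = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        sub_rows, sub_labels = map(list, zip(*sub))
        tree[best][value] = id3(sub_rows, sub_labels, remaining)
    return tree

# Hypothetical rows in the spirit of the question's example:
rows = [
    {"age": "<=30",   "student": "no",  "credit": "fair"},
    {"age": "<=30",   "student": "yes", "credit": "fair"},
    {"age": "31..40", "student": "no",  "credit": "fair"},
    {"age": "31..40", "student": "yes", "credit": "excellent"},
    {"age": "31..40", "student": "no",  "credit": "excellent"},
    {"age": ">40",    "student": "no",  "credit": "fair"},
    {"age": ">40",    "student": "no",  "credit": "excellent"},
]
labels = ["no", "yes", "yes", "yes", "yes", "yes", "no"]
print(id3(rows, labels, ["age", "student", "credit"]))
```

Note how each recursive call sees only the rows of its branch and the attributes not yet used on that path, which is exactly why a different attribute can win at each child node.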

The original image comes from this great blog about Decision Trees. I added the orange lines.