Solved – Get probability distribution from decision tree

cart, machine learning

I'm implementing a decision tree based on the CART algorithm and I have a question. I can already classify data, but my task is not only to classify it. I also want the probability of a correct classification in the end nodes.
For example, I have a dataset that contains instances of classes A and B. When I pass an instance to my tree, I want to see the probability that it belongs to class A and the probability that it belongs to class B. How can I extend CART to get a probability distribution in the end nodes?

Best Answer

Decision trees do not have a proper scoring method for the distribution of the classes. In other words, the probability distribution at a leaf node is simply the distribution of the target classes among the training instances that reached that leaf.

Say you have $k$ classes. At training time, you create a frequency vector of size $k$ for each node and count in it how many times each class appears among the instances in that node. You can then normalize that vector so that its values sum to $1$ (so that it looks like a probability mass function, but, again, it is not one).
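As a minimal sketch of the counting and normalizing step (the function name `leaf_distribution` and the toy labels are made up for illustration and are not part of CART itself):

```python
from collections import Counter


def leaf_distribution(labels, classes):
    """Normalized class frequencies of the training labels that reached one leaf.

    `labels` are the training labels in the leaf, `classes` lists all k classes
    so that classes absent from the leaf still get an explicit 0.0 entry.
    """
    if not labels:
        raise ValueError("a leaf must contain at least one training instance")
    counts = Counter(labels)  # frequency vector, one count per class
    total = len(labels)
    # Divide by the leaf size so the values sum to 1.
    return {c: counts[c] / total for c in classes}


# Hypothetical leaf that received three instances of A and one of B.
dist = leaf_distribution(["A", "A", "B", "A"], classes=["A", "B"])
print(dist)  # {'A': 0.75, 'B': 0.25}
```

At prediction time you would route the instance down to a leaf as usual and return the vector stored there instead of only its most frequent class.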

If the value needed for a split is missing at prediction time, the usual method is to obtain the probability distributions from both the left child node and the right child node. You then build a new distribution as the sum of the two, each weighted by the share of training instances in its node.
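A rough sketch of that weighted combination, assuming each child exposes its class distribution as a dict and its training instance count (the name `combine_children` and the numbers are hypothetical):

```python
def combine_children(left_dist, n_left, right_dist, n_right):
    """Mix two child distributions, weighting each by its training instance count."""
    total = n_left + n_right
    if total == 0:
        raise ValueError("children must contain at least one training instance")
    classes = set(left_dist) | set(right_dist)
    # Weighted sum of the two distributions, divided by the total count
    # so that the result again sums to 1.
    return {
        c: (n_left * left_dist.get(c, 0.0) + n_right * right_dist.get(c, 0.0)) / total
        for c in classes
    }


# Hypothetical children: 4 training instances on each side.
mixed = combine_children({"A": 0.75, "B": 0.25}, 4, {"A": 0.0, "B": 1.0}, 4)
print(mixed["A"], mixed["B"])  # 0.375 0.625
```

If a child is itself an internal node, the same idea can be applied recursively, using the distribution computed for its subtree.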
