Solved – How to find a Information Gain for Numerical Values

entropymachine learning

I have 4 class and four sets of features extracted from the huge data extracted from Real Time data acquisition system.

Features for each Class.

In above table, there are 4 sets of the features for each class. (C1, C2, C3 and C4). But when calculating information gain, information gain comes same for each features as for all attribute as there is only one value of each attribute for each class.

Can someone please help me in calculating a information gain for building decision tree.

Thanks
DDas

Best Answer

In order to use Information Gain, you have to use a sort of sliding feature.

Sliding Splitting for Continuous Variables

In this example, it's using GiniSplit but you can apply Entropy / InformationGain. You essentially sort the data ascending. Then for every distinct value, you create a split (Less than or equal to value vs. greater than value) and calculate the InformationGain on that split. Finally, choose the split that improves InformationGain the most.

Related Question