Solved – Handling missing values for decision tree

cart, missing data, random forest

I have seen that in most learning algorithms, including decision tree learning algorithms, missing values are handled through imputation or estimation, e.g. with EM-style algorithms.

Since decision trees make their decisions based on rules, I wanted to know whether we can instead have a tree that checks if a particular attribute is missing and proceeds with separate rules for that case.
The following link describes this approach: http://0agr.ru/wiki/index.php/Decision_Tree_%28Data_Mining%29#Handling_Missing_Values.

Is this a good idea, and will it give a better result than simply replacing the missing values with the mean?
Are there any good libraries that implement this? The one I am currently using is scikit-learn, which doesn't.
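One way to approximate the "separate rule for missing" idea in scikit-learn is to append a binary missing-indicator column next to an imputed copy of each feature, so the tree can split on missingness itself as well as on the imputed value. A minimal sketch with made-up toy data:

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Toy data: the second feature has missing values (np.nan).
X = np.array([[1.0, 2.0],
              [2.0, np.nan],
              [3.0, 1.0],
              [4.0, np.nan]])
y = np.array([0, 1, 0, 1])

# add_indicator=True appends a 0/1 "was missing" column for each
# feature containing NaN, so the tree can branch on missingness
# in addition to the mean-imputed value.
model = make_pipeline(
    SimpleImputer(strategy="mean", add_indicator=True),
    DecisionTreeClassifier(random_state=0),
)
model.fit(X, y)
print(model.predict([[2.5, np.nan]]))
```

In this toy example the indicator column separates the classes perfectly, so the tree learns a rule on missingness alone; on real data it simply gives the tree the option.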

Best Answer

Your question is somewhat similar to this one:

Why doesn't Random Forest handle missing values in predictors?

You can modify the algorithm to take missing values into account, as is done in C4.5 (fractional instances) and CART (surrogate splits), but the modification adds computational cost. The improvement may not outweigh this added cost, so imputation is generally preferred.
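As a baseline for the "generally preferred" route, mean imputation is a one-liner in scikit-learn; a small illustrative sketch:

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Illustrative data with one NaN per column.
X = np.array([[1.0, np.nan],
              [2.0, 3.0],
              [np.nan, 5.0]])

# Replace each NaN with the column mean computed from the observed values:
# 1.5 for the first feature, 4.0 for the second.
imputer = SimpleImputer(strategy="mean")
X_imputed = imputer.fit_transform(X)
print(X_imputed)
```

The fitted imputer stores the learned column means, so the same statistics can be reused to transform test data without leakage.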