Solved – Feature Importance using decision tree – categorical feature one-hot encoding or not

boosting, categorical-encoding, feature selection, importance

I am trying to compare the importance of over 100 features using xgboost. My question concerns non-ordinal categorical features such as "race" (coded as 0, 1, 2, 3): for xgboost (or any tree-based method), does it make more sense to one-hot encode "race" or not?

If yes, how should I compare the "importance of race" to that of the other features? Should I sum the importances of race_0, race_1, race_2, and race_3, and then compare that total to the other features?

Additional information:
The label (Y) is binary, and I am using xgboost through its scikit-learn interface. I am following the steps in this post: https://machinelearningmastery.com/feature-importance-and-feature-selection-with-xgboost-in-python/.
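To make the setup in the question concrete, here is a minimal, dependency-free sketch of how a non-ordinal column such as "race" (coded 0 to 3) expands into the indicator columns race_0 through race_3. The helper `one_hot_encode` and the sample values are hypothetical; in practice `pandas.get_dummies` produces the same `<column>_<value>` names by default.

```python
from typing import Dict, List, Sequence


def one_hot_encode(values: Sequence[int], prefix: str, categories: Sequence[int]) -> Dict[str, List[int]]:
    """Expand an integer-coded, non-ordinal column into one 0/1 column per category.

    Columns are named "<prefix>_<category>", e.g. race_0 through race_3.
    """
    return {
        f"{prefix}_{c}": [1 if v == c else 0 for v in values]
        for c in categories
    }


# Hypothetical sample: five rows of "race" coded 0, 1, 2, 3.
race = [0, 2, 1, 3, 2]
encoded = one_hot_encode(race, "race", categories=[0, 1, 2, 3])
for name, column in encoded.items():
    print(name, column)
```

Each row has exactly one 1 across the four columns, so a tree can isolate any single category with one split; the trade-off is that whatever importance "race" has gets spread over four separate columns.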

Best Answer

You should sum the importances of the one-hot encoded columns and use that total as the importance of the original feature. See https://stackoverflow.com/questions/40047343/how-to-explain-feature-importance-after-one-hot-encode-used-for-decision-tree
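The summing recommended above can be sketched as follows. The helper `aggregate_importances` and the importance values in the usage lines are hypothetical, for illustration only; with xgboost's scikit-learn wrapper you would presumably pass the training columns as `feature_names` and the fitted model's `feature_importances_` as `importances`.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def aggregate_importances(
    feature_names: Sequence[str],
    importances: Sequence[float],
    groups: Dict[str, List[str]],
) -> Dict[str, float]:
    """Sum per-column importances back onto the original features.

    `groups` maps each original feature to the one-hot columns derived from it;
    a column that belongs to no group is reported under its own name.
    """
    # Invert the mapping: one-hot column name -> original feature name.
    column_to_feature = {col: feat for feat, cols in groups.items() for col in cols}
    totals: Dict[str, float] = defaultdict(float)
    for name, score in zip(feature_names, importances):
        totals[column_to_feature.get(name, name)] += score
    return dict(totals)


# Hypothetical importances, standing in for model.feature_importances_.
names = ["age", "income", "race_0", "race_1", "race_2", "race_3"]
scores = [0.30, 0.25, 0.05, 0.20, 0.10, 0.10]
groups = {"race": ["race_0", "race_1", "race_2", "race_3"]}

result = aggregate_importances(names, scores, groups)
for feature, total in sorted(result.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{feature}: {total:.2f}")
```

The one-hot columns are mapped to their parent feature explicitly rather than by splitting names on "_", because ordinary feature names (for example `household_income`) may themselves contain underscores.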
