Solved – Rerunning with only important features doesn’t change model output

boosting · feature-selection · predictive-models

I am trying to predict sales of a certain product using regression. I am using XGBoost, with MAPE as the final metric for comparing models. I have around 23 features, but many of them are categorical variables that I have converted into dummy variables, so there are now around 210 features, many of which are sparse.

I ran an XGBoost model on this and checked feature importance using xgb.importance(). It showed importance values for only 84 features, so I ran one more XGBoost iteration with only those 84 important features, but there was no change in the model output.

So does the presence of the other, unimportant features have any effect on the XGBoost model? How can I perform feature selection using XGBoost?

Best Answer

Your result is correct: XGBoost recognizes that many of your features are not important and does not use them when building its decision trees. You could force XGBoost to use more of them by increasing the maximum tree depth, but you would be overfitting the data that way.

Back to your problem: only 84 features are actually used by XGBoost, so discarding the others produces a very similar result.
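To make this concrete, here is a minimal sketch in R (the language of xgb.importance()) using simulated data; the matrix X, target y, and all hyperparameter values are hypothetical stand-ins for your sales data, not your actual setup. It fits a model on all features, pulls the subset that actually appears in the importance table, refits on that subset, and compares MAPE:

```r
library(xgboost)

# Hypothetical data: X has 210 mostly uninformative columns,
# y is a positive, sales-like target
set.seed(1)
X <- matrix(rnorm(500 * 210), nrow = 500)
colnames(X) <- paste0("f", seq_len(210))
y <- 50 + 2 * X[, 1] + X[, 2] - X[, 3] + rnorm(500)

# Fit the full model on all 210 features
dtrain <- xgb.DMatrix(data = X, label = y)
params <- list(objective = "reg:squarederror", max_depth = 6, eta = 0.1)
bst_full <- xgb.train(params = params, data = dtrain, nrounds = 100)

# The importance table only lists features that appear in at least one split
imp <- xgb.importance(model = bst_full)
used <- imp$Feature

# Refit using only the features the first model actually used
dtrain_sub <- xgb.DMatrix(data = X[, used, drop = FALSE], label = y)
bst_sub <- xgb.train(params = params, data = dtrain_sub, nrounds = 100)

# Compare MAPE of both fits (in-sample here, purely illustrative)
mape <- function(actual, pred) 100 * mean(abs((actual - pred) / actual))
mape(y, predict(bst_full, X))
mape(y, predict(bst_sub, X[, used, drop = FALSE]))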
