Solved – Feature importance of a feature in LightGBM is high, but adding it reduces the evaluation score

feature selection, machine learning, python

I am currently working on a machine learning project using LightGBM.

When I added a feature to my training data, the feature importance reported by lgb.plot_importance(gbm, max_num_features=10) was high, but adding this feature reduced the ROC AUC score in my performance evaluation.

If the goal is high predictive performance, should I just drop this feature?
Or are there theoretical concerns I should investigate before dropping it?

My current understanding is that the default feature importance in LightGBM counts how many splits were made on the feature during training, so high feature importance does not necessarily mean better performance. If that is right, then in the case above I can simply drop the feature based on the decrease in the evaluation score.
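For concreteness, here is a minimal sketch of that with/without comparison, on synthetic data standing in for the real dataset (the column name "new_feature" is hypothetical, playing the role of the added feature). Note that a single train/validation split is noisy; repeated splits or lgb.cv would give a more reliable verdict.

```python
import lightgbm as lgb
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data; "new_feature" plays the role
# of the feature whose importance is high but whose effect is unclear.
X_arr, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(9)] + ["new_feature"])

X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

params = {"objective": "binary", "metric": "auc", "verbosity": -1}

def auc_for(columns):
    """Train on the given columns and return the validation ROC AUC."""
    dtrain = lgb.Dataset(X_train[columns], label=y_train)
    booster = lgb.train(params, dtrain, num_boost_round=200)
    return roc_auc_score(y_valid, booster.predict(X_valid[columns]))

all_cols = list(X.columns)
without_feature = [c for c in all_cols if c != "new_feature"]
print("AUC with feature:   ", auc_for(all_cols))
print("AUC without feature:", auc_for(without_feature))
```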

I have been pondering this question for a few days and still cannot come to a solid conclusion by myself.
If any of my thoughts are wrong, please kindly let me know.

Thanks.

Best Answer

If you look in the LightGBM docs for the feature_importance function, you will see that it has an importance_type parameter. The two valid values for this parameter are "split" (the default) and "gain". The two do not necessarily agree: a feature can be split on very often (high split importance) while contributing little to the reduction in loss (low gain importance), which matches the situation you describe.
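A short sketch contrasting the two importance types, assuming `gbm` is the trained Booster from your setup:

```python
import lightgbm as lgb

# `gbm` is an already-trained lgb.Booster.
split_imp = gbm.feature_importance(importance_type="split")  # number of splits using the feature
gain_imp = gbm.feature_importance(importance_type="gain")    # total gain from those splits

for name, s, g in zip(gbm.feature_name(), split_imp, gain_imp):
    print(f"{name}: split={s}, gain={g:.2f}")

# plot_importance accepts the same parameter, so you can plot gain-based
# importance instead of the default split counts:
lgb.plot_importance(gbm, max_num_features=10, importance_type="gain")
```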
There is also a newer library for feature attribution, shap. Go through this article written by the co-author of that library; he explains this much better.
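A hedged sketch of using shap with the same model (`gbm` and `X_valid` as above; requires the shap package):

```python
import shap

# Tree-model explainer; works directly on a trained LightGBM Booster.
explainer = shap.TreeExplainer(gbm)
shap_values = explainer.shap_values(X_valid)

# Some shap versions return a list of per-class arrays for binary
# classifiers; take the positive class in that case.
if isinstance(shap_values, list):
    shap_values = shap_values[1]

shap.summary_plot(shap_values, X_valid)

# LightGBM can also compute SHAP values natively, without the shap package:
# contribs = gbm.predict(X_valid, pred_contrib=True)  # last column is the expected value
```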
