Solved – Remove features that have zero feature importance in random forest

feature selection, importance, random forest

We have 10 features that were pre-selected using domain knowledge. We ran a random forest with those features, and one of the features has zero feature importance.
My questions are:

  1. For features that have zero importance in the random forest model, should I remove them and rerun the model?
  2. I did try that. When I removed the feature and reran the random forest, the importance of the 7th most important feature became zero. What should I do?

Thanks a lot for any expert opinion.

Best Answer

A more rigorous way to pursue this question is to apply the Boruta algorithm.

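Boruta repeatedly measures feature importance from a random forest (or a similar method), compares each feature's importance against that of randomized "shadow" copies of the features, and then carries out statistical tests to screen out the features that are irrelevant. The procedure terminates when every feature has been classified as either decisively relevant or decisively irrelevant, or when a maximum number of runs is reached.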
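To make the idea concrete, here is a minimal sketch of the shadow-feature test at the heart of Boruta, written with scikit-learn and SciPy. It is a simplified illustration, not the reference implementation: it uses one-sided binomial tests with a Bonferroni correction and does not drop rejected features between rounds, as the full algorithm does. The function names (`shadow_feature_hits`, `classify`), the synthetic data, and all parameter values are assumptions made for this example.

```python
import numpy as np
from scipy.stats import binomtest
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def shadow_feature_hits(X, y, n_rounds=20, seed=0):
    """Count how often each real feature beats the best shadow feature."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    hits = np.zeros(n_features, dtype=int)
    for _ in range(n_rounds):
        # Shadow features: every column shuffled on its own, which keeps its
        # marginal distribution but breaks any relationship with the target.
        shadows = np.column_stack([rng.permutation(col) for col in X.T])
        forest = RandomForestClassifier(
            n_estimators=100, random_state=int(rng.integers(1_000_000))
        )
        forest.fit(np.hstack([X, shadows]), y)
        importances = forest.feature_importances_
        best_shadow = importances[n_features:].max()
        # A "hit" means the real feature outscored every shadow this round.
        hits += importances[:n_features] > best_shadow
    return hits


def classify(hits, n_rounds, alpha=0.05):
    """Label each feature from its hit count (simplified decision rule)."""
    alpha_adj = alpha / len(hits)  # Bonferroni correction across features
    labels = []
    for k in hits:
        k = int(k)
        if binomtest(k, n_rounds, 0.5, alternative="greater").pvalue < alpha_adj:
            labels.append("relevant")
        elif binomtest(k, n_rounds, 0.5, alternative="less").pvalue < alpha_adj:
            labels.append("irrelevant")
        else:
            labels.append("undecided")
    return labels


# Synthetic data (an assumption for the demo): with shuffle=False the first
# three columns are informative and the last three are pure noise.
X, y = make_classification(
    n_samples=300, n_features=6, n_informative=3, n_redundant=0,
    shuffle=False, random_state=0,
)
n_rounds = 20
hits = shadow_feature_hits(X, y, n_rounds=n_rounds)
decisions = classify(hits, n_rounds)
for i, (h, d) in enumerate(zip(hits, decisions)):
    print(f"feature {i}: {h}/{n_rounds} hits -> {d}")
```

The point of the comparison against shadows is that a single importance score of zero (or near zero) from one fit is noisy, which is why removing one feature and refitting can push another feature's importance to zero. Repeating the fit and testing against a random baseline gives a more stable basis for dropping a feature.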

There are several papers on this topic. Here is one: "The All Relevant Feature Selection using Random Forest" by Miron B. Kursa and Witold R. Rudnicki.
