Firstly, a method that pre-selects variables for a final model based on univariate correlations will tend to do badly, for a number of reasons:

- It ignores model uncertainty by committing to a single selected model.
- It uses statistical significance/strength of correlation as the selection criterion; if the goal is prediction, you should instead assess how much a variable actually helps prediction, which is not necessarily the same thing.
- It can "falsely" identify predictors: another predictor may be even better, but because the variable you look at correlates somewhat with it, it appears to correlate quite well with the outcome.
- It can miss predictors that only show up, or only become clear, once other variables are adjusted for (see the sketch below).
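A minimal simulated sketch of those last two failure modes (all variable names and coefficients here are hypothetical, chosen purely for illustration): a predictor that barely correlates with the outcome on its own turns out to matter a great deal once a correlated predictor is adjusted for, so a univariate screen would throw it away.

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 2000
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + rng.normal(scale=0.45, size=n)   # x2 is strongly correlated with x1
y = x1 - 0.9 * x2 + rng.normal(scale=0.5, size=n)  # x2 matters, but only jointly with x1

# Univariate screening: x2 looks nearly useless and would be dropped
print(stats.pearsonr(x1, y))  # clearly "significant"
print(stats.pearsonr(x2, y))  # near-zero correlation -> screened out

# Joint model: both predictors carry substantial coefficients
X = np.column_stack([x1, x2])
print(LinearRegression().fit(X, y).coef_)  # roughly [1.0, -0.9]
```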
Additionally, failing to wrap the whole selection procedure in some form of bootstrapping or cross-validation to get a realistic assessment of your model uncertainty is likely to mislead you.
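For example, with scikit-learn you can put the screening step inside a Pipeline so that cross-validation re-runs the entire selection procedure in every fold; the performance estimate then reflects the full procedure, not just the final model. A sketch on simulated data:

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 500))      # mostly noise features
y = X[:, 0] + rng.normal(size=200)   # only the first feature matters

pipe = Pipeline([
    ("screen", SelectKBest(f_regression, k=10)),  # univariate screen
    ("fit", LinearRegression()),
])
# Screening is repeated inside each fold, so this estimate is honest
print(cross_val_score(pipe, X, y, cv=5, scoring="r2").mean())
```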
Furthermore, treating continuous predictors as having linear effects can often be improved upon by methods that do not make that assumption, e.g. random forests (RF).
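A quick illustration of that point, assuming a purely quadratic signal (simulated data, illustrative settings): a linear model on the raw feature misses the effect entirely, while a random forest picks it up without being told about the non-linearity.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
X = rng.uniform(-3, 3, size=(500, 1))
y = X[:, 0] ** 2 + rng.normal(scale=0.5, size=500)  # purely non-linear signal

print(cross_val_score(LinearRegression(), X, y, cv=5, scoring="r2").mean())  # ~0
print(cross_val_score(RandomForestRegressor(random_state=0), X, y, cv=5,
                      scoring="r2").mean())  # clearly positive
```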
Using RF as a pre-selection step for a linear model is not such a good idea. Variable importance is really hard to interpret, and it is really hard (arguably meaningless) to set a cut-off on it. You do not know whether a variable's importance reflects the variable itself or its interactions, and on top of that you lose the non-linear transformations of variables that the forest exploited.
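A small sketch of the cut-off problem (simulated data, illustrative settings): even features that are pure noise receive non-zero impurity importance from a random forest, so there is no natural zero to threshold against.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(3)
X_signal = rng.normal(size=(300, 2))
X_noise = rng.normal(size=(300, 8))   # 8 completely irrelevant features
X = np.hstack([X_signal, X_noise])
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=300)

rf = RandomForestRegressor(n_estimators=500, random_state=0).fit(X, y)
print(np.round(rf.feature_importances_, 3))  # the noise columns are all > 0
```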
It depends in part on what you want to do. If you want good predictions, maybe you should not care too much about whether your method is a traditional statistical model or not.
Of course, there are plenty of methods like the elastic net, the LASSO, or Bayesian models with the horseshoe prior that fit better into a traditional modeling framework and can also accommodate e.g. splines for continuous covariates.
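As a hedged sketch of one such model: an elastic net fit on spline-expanded continuous covariates, using scikit-learn's SplineTransformer and ElasticNetCV (the data and all tuning choices below are illustrative, not recommendations).

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import SplineTransformer, StandardScaler
from sklearn.linear_model import ElasticNetCV

rng = np.random.default_rng(4)
X = rng.uniform(-2, 2, size=(400, 5))
y = np.sin(2 * X[:, 0]) + X[:, 1] + rng.normal(scale=0.3, size=400)

model = make_pipeline(
    SplineTransformer(degree=3, n_knots=5),        # spline basis per covariate
    StandardScaler(),
    ElasticNetCV(l1_ratio=[0.2, 0.5, 0.9], cv=5),  # penalty mix chosen by CV
)
model.fit(X, y)
print(model.score(X, y))
```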
A more rigorous way to pursue this question is to apply the Boruta algorithm.
Boruta repeatedly measures feature importance from a random forest (or similar method) and then carries out statistical tests to screen out the features which are irrelevant. The procedure terminates when all features are either decisively relevant or decisively irrelevant.
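For instance, using the BorutaPy implementation (the Python port of the original R package, installable via `pip install Boruta`; the data and parameter choices below are illustrative, and version compatibility with recent NumPy releases may vary):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from boruta import BorutaPy

rng = np.random.default_rng(5)
X = rng.normal(size=(300, 20))
y = (X[:, 0] + X[:, 1] ** 2 + rng.normal(size=300) > 0).astype(int)

rf = RandomForestClassifier(n_jobs=-1, max_depth=5)
boruta = BorutaPy(rf, n_estimators="auto", alpha=0.05, random_state=42)
boruta.fit(X, y)  # BorutaPy expects plain numpy arrays, not DataFrames

print("confirmed:", np.where(boruta.support_)[0])       # decisively relevant
print("tentative:", np.where(boruta.support_weak_)[0])  # undecided at stopping
```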
There are several papers on this topic; here is one: Kursa, Miron B., and Witold R. Rudnicki. "The All Relevant Feature Selection using Random Forest."
Boruta and random forest differences
The Boruta algorithm adds a randomization layer on top of the variable-importance results obtained from a random forest in order to determine which variables are truly important and statistically valid. For the details of the difference, please refer to Section 2 of the article below; a bare-bones sketch of the core idea follows the reference.
Kursa, Miron B., and Witold R. Rudnicki. "Feature Selection with the Boruta Package." Journal of Statistical Software 36.11 (2010): 1-13.
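The core of that difference can be sketched in a few lines (this is a bare-bones illustration, not the full Boruta procedure, which iterates this comparison and tests it statistically): add shuffled "shadow" copies of every feature, refit the forest, and compare each real feature's importance against the best importance achieved by pure noise.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(6)
X = rng.normal(size=(300, 5))
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=300)

X_shadow = rng.permuted(X, axis=0)   # shuffle each column: breaks any link to y
rf = RandomForestRegressor(n_estimators=500, random_state=0)
rf.fit(np.hstack([X, X_shadow]), y)

real_imp = rf.feature_importances_[:5]
shadow_max = rf.feature_importances_[5:].max()  # importance of the best noise feature
print(real_imp > shadow_max)  # Boruta repeats this and decides via statistical tests
```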
Is one method preferred over the other? If so why?
This is a classic case of the "No Free Lunch" theorem: without data and assumptions, it is impossible to decide which one is better. However, please note that Boruta was produced as an improvement over plain random forest variable importance, so it should perform better in more situations than not (I am biased, because I like randomization techniques myself). Nevertheless, the data and the available computation time could make variable importance from a random forest the better choice, since Boruta refits the forest many times.