Solved – Difference between rulefit and random forest

machine learning, random forest

I'm trying to understand the difference between these a bit better. I understand pretty well how random forests work but I guess I'm more hazy on rulefit and how exactly it's different. I know rulefit will incorporate linear components and so can fit linear trends better. What other ways do they differ?

Best Answer

In effect, RuleFit aggressively prunes a tree ensemble such as a random forest. (The original formulation grows its trees with a boosting-style procedure, but a random forest can supply the rules as well.) Every node in every tree defines a rule: the conjunction of the split conditions on the path from the root to that node. RuleFit looks for a small subset of these rules whose accuracy stays as close as possible to that of the full ensemble, which reduces the number of rules dramatically. The result is a model built from short, simple rules, so a black-box ensemble becomes something a person can read.

How? It treats each rule as a binary feature (1 if the rule fires for a sample, 0 otherwise) and fits a linear model on those features, together with the original features as the linear terms you mention. An L1 penalty (the Lasso) yields a sparse weight vector that determines which rules matter most. In the end only a few rules have non-zero weights, and the rest are removed from the ensemble.

Other methods, such as Node Harvest, have the same aim, though RuleFit tends to perform better.
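To make the idea concrete, here is a minimal sketch using scikit-learn. It is not the reference RuleFit implementation: it takes rules from a shallow random forest, uses each non-root tree node as one rule, and adds standardized copies of the inputs as linear terms. The synthetic dataset, the forest settings, and `alpha=0.05` are assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# Synthetic regression data (illustrative assumption, not from the question).
X, y = make_friedman1(n_samples=500, n_features=10, noise=1.0, random_state=0)

# 1) Grow a forest of shallow trees; shallow trees give short rules.
forest = RandomForestRegressor(n_estimators=50, max_depth=3, random_state=0)
forest.fit(X, y)

# 2) Turn tree nodes into binary rule features.
#    decision_path marks every node a sample passes through, and each node
#    corresponds to the conjunction of splits on the path leading to it.
node_indicator, _ = forest.decision_path(X)
rules = node_indicator.toarray().astype(float)

# Root nodes fire for every sample, so their columns are constant; drop them.
rules = rules[:, rules.std(axis=0) > 0]
n_rules_total = rules.shape[1]

# 3) Add the original features as linear terms (standardized here for simplicity).
linear = StandardScaler().fit_transform(X)
features = np.hstack([rules, linear])

# 4) The Lasso selects a sparse subset of rules and linear terms.
lasso = Lasso(alpha=0.05, max_iter=10000)
lasso.fit(features, y)

n_rules_kept = int(np.count_nonzero(lasso.coef_[:n_rules_total]))
n_linear_kept = int(np.count_nonzero(lasso.coef_[n_rules_total:]))

print(f"rules kept: {n_rules_kept} of {n_rules_total}")
print(f"linear terms kept: {n_linear_kept} of {X.shape[1]}")
```

Raising `alpha` keeps fewer rules at some cost in accuracy. In practice the penalty would be chosen by cross-validation (for example with `LassoCV`) rather than fixed by hand.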
