Solved – Using LASSO on random forest

classification, ensemble learning, lasso, random forest

I would like to create a random forest using the following process:

  • Build each tree on a random sample of the data and features, using information gain to determine splits
  • Stop splitting a node if the tree exceeds a pre-defined depth OR if any split would produce a leaf with fewer samples than a pre-defined minimum
  • Rather than assign a class label for each tree, assign the proportion of classes in the leaf node
  • Stop building trees after a pre-defined number have been constructed
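In scikit-learn terms, the process above can be sketched roughly as follows (a hedged sketch, not the questioner's exact implementation; note that sklearn trees already store class proportions in their leaves and expose them via `predict_proba`):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Toy data standing in for the real problem.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Each tree is grown on a bootstrap sample with a random feature subset,
# splits chosen by information gain (criterion="entropy"), and growth
# stopped by max_depth / min_samples_leaf -- mirroring the steps above.
forest = RandomForestClassifier(
    n_estimators=50,          # stop after a pre-defined number of trees
    criterion="entropy",      # information gain
    max_depth=5,              # pre-defined depth limit
    min_samples_leaf=10,      # pre-defined minimum leaf size
    random_state=0,
).fit(X, y)

# Per-tree class proportions for each sample (leaves hold proportions,
# not hard labels):
per_tree_proba = np.stack([t.predict_proba(X) for t in forest.estimators_])
print(per_tree_proba.shape)  # (50, 500, 2): trees x samples x classes
```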

This bucks the traditional random forest process in two ways. One, it uses pruned trees that output class proportions rather than class labels. And two, the stopping criterion is a pre-determined number of trees rather than an out-of-bag error estimate.

My question is this:

For the above process that outputs N trees, can I then fit a logistic
regression model with LASSO selection over those trees' outputs? Does
anyone have experience fitting a random forest classifier and
post-processing it with a logistic LASSO?

The ISLE framework mentions using LASSO as a post-processing step for regression problems but not classification problems. Furthermore, I don't get any helpful results when googling "Random forest lasso".
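One concrete way the question could be tried (a hypothetical sketch, not an established recipe: the feature construction, the choice of `C=0.1`, and the use of `liblinear` are all assumptions): treat each tree's predicted positive-class proportion as a feature, then fit an L1-penalized logistic regression over those features. Trees whose coefficients are shrunk to zero are effectively dropped from the ensemble.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(
    n_estimators=100, criterion="entropy",
    max_depth=5, min_samples_leaf=10, random_state=0,
).fit(X_tr, y_tr)

def tree_features(forest, X):
    # One column per tree: that tree's predicted positive-class proportion.
    return np.column_stack(
        [t.predict_proba(X)[:, 1] for t in forest.estimators_]
    )

F_tr, F_te = tree_features(forest, X_tr), tree_features(forest, X_te)

# L1-penalized (LASSO-style) logistic regression selects and weights trees;
# C=0.1 is an arbitrary regularization strength for illustration.
lasso_lr = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
lasso_lr.fit(F_tr, y_tr)

kept = int(np.sum(lasso_lr.coef_ != 0))
print(f"trees kept: {kept} / {forest.n_estimators}")
print(f"test accuracy: {lasso_lr.score(F_te, y_te):.3f}")
```

In practice `C` would be chosen by cross-validation, and the tree features should be computed on held-out data (or out-of-bag predictions) to avoid the LASSO stage overfitting to leaves memorized from the training set.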

Best Answer

This sounds somewhat like gradient tree boosting. The idea of boosting is to find the best linear combination of a class of models. If we fit a tree to the data, we are trying to find the tree that best explains the outcome variable. If we instead use boosting, we are trying to find the best linear combination of trees.

However, boosting is a little more efficient: instead of drawing a collection of random trees, it builds each new tree to handle the examples the current ensemble cannot yet predict well.
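For comparison, here is a minimal gradient tree boosting fit in scikit-learn (the dataset and hyperparameters are illustrative assumptions), where each successive tree is fit to correct the errors of the current linear combination of trees:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Each new tree is fit to the gradient of the loss -- i.e. it focuses on
# the examples the ensemble so far predicts poorly.
gbm = GradientBoostingClassifier(
    n_estimators=100, max_depth=3, random_state=0
).fit(X_tr, y_tr)
print(f"test accuracy: {gbm.score(X_te, y_te):.3f}")
```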

For more on this, I'd suggest reading chapter 10 of Elements of Statistical Learning: http://statweb.stanford.edu/~tibs/ElemStatLearn/

While this isn't a complete answer to your question, I hope it helps.
