I have data with a few thousand features and I want to use recursive feature elimination (RFE) to remove the uninformative ones, which I do with caret's RFE. This got me thinking: if I want the best regression fit (with a random forest, for example), when should I perform parameter tuning (mtry for RF)? As I understand it, caret trains the RF repeatedly on different feature subsets with a fixed mtry. Presumably the optimal mtry should be found after feature selection is finished, but will the mtry value that caret uses during RFE influence which subset of features gets selected? Running caret with a low mtry is much faster, of course.
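To make the question concrete, here is a rough sketch of the two-stage workflow I have in mind (untested; `x` and `y` stand for my predictor matrix and response, and the subset sizes and mtry grid are just placeholders):

```r
library(caret)

# Stage 1: RFE using caret's built-in random forest wrapper (rfFuncs),
# which runs the RF at whatever mtry the wrapper defaults to.
ctrl <- rfeControl(functions = rfFuncs, method = "cv", number = 5)
rfe_fit <- rfe(x, y, sizes = c(10, 50, 100, 500), rfeControl = ctrl)

# Stage 2: tune mtry afterwards, but only on the selected subset.
x_sel <- x[, predictors(rfe_fit)]
tuned <- train(x_sel, y, method = "rf",
               tuneGrid = expand.grid(mtry = c(2, 10, 30, 60)),
               trControl = trainControl(method = "cv", number = 5))
```

My worry is about stage 1: whether the fixed mtry used there biases which columns end up in `predictors(rfe_fit)`.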
Hope someone can explain this to me.
Best Answer
One thing you might want to look into is regularized random forests, which are specifically designed for feature selection. This paper explains the concept and how they differ from normal random forests:
Feature Selection via Regularized Trees
There's also a CRAN package, RRF, that's built on randomForest and will let you implement them easily in R. I've had good luck with this methodology myself.
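A minimal sketch of how RRF can be used for feature selection (untested; `x`/`y` are your predictors and response, and the penalty values 0.8 and 0.5 are arbitrary illustrations, not recommendations):

```r
library(RRF)

# Regularized RF: coefReg < 1 penalizes splitting on features not already
# used elsewhere in the forest, which shrinks the selected feature set.
rrf <- RRF(x, y, flagReg = 1, coefReg = 0.8)
selected <- rrf$feaSet   # indices of the features the regularized forest used

# Guided RRF variant: scale each feature's penalty by its importance
# from an ordinary (unregularized) random forest fit.
rf      <- RRF(x, y, flagReg = 0)
imp     <- rf$importance[, 1]
impNorm <- imp / max(imp)
gamma   <- 0.5           # 0 = plain RRF, 1 = fully importance-guided
grrf    <- RRF(x, y, flagReg = 1, coefReg = (1 - gamma) + gamma * impNorm)
```

The guided variant tends to be more stable when many features are weakly informative, since the penalty is informed by a preliminary importance ranking rather than applied uniformly.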
Regarding your original question, the only advice I can give is that if you have a lot of collinearity, you should use smaller trees. This lets the algorithm determine importance with less interference from collinearity effects.
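With randomForest, tree size can be limited via `maxnodes` and/or `nodesize`; a quick sketch (the specific values here are illustrative only, and `x`/`y` are assumed inputs):

```r
library(randomForest)

# Shallow trees: cap the number of terminal nodes and require larger leaves,
# so each tree makes fewer, coarser splits and correlated features have
# less opportunity to mask one another in the importance ranking.
rf_small <- randomForest(x, y,
                         maxnodes  = 16,    # at most 16 terminal nodes per tree
                         nodesize  = 20,    # each leaf must hold >= 20 samples
                         importance = TRUE)
varImpPlot(rf_small)
```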