I have data with a few thousand features and I want to use recursive feature elimination (RFE) to remove the uninformative ones, which I do with caret's RFE. This got me thinking: if I want the best regression fit (with a random forest, for example), when should I perform parameter tuning (mtry for RF)? As I understand it, caret trains the RF repeatedly on different feature subsets with a fixed mtry. Presumably the optimal mtry should be found after feature selection is finished, but will the mtry value that caret uses during RFE influence which subset of features gets selected? Running caret with a low mtry is much faster, of course.
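To make the question concrete, here is a rough sketch of the two-stage workflow I have in mind (untested; `x` and `y` stand for my predictor matrix and response, and the subset sizes and mtry grid are just placeholders):

```r
library(caret)

# Stage 1: RFE using caret's built-in random forest wrapper (rfFuncs),
# which runs the RF at whatever mtry the wrapper defaults to.
ctrl <- rfeControl(functions = rfFuncs, method = "cv", number = 5)
rfe_fit <- rfe(x, y, sizes = c(10, 50, 100, 500), rfeControl = ctrl)

# Stage 2: tune mtry afterwards, but only on the selected subset.
x_sel <- x[, predictors(rfe_fit)]
tuned <- train(x_sel, y, method = "rf",
               tuneGrid = expand.grid(mtry = c(2, 10, 30, 60)),
               trControl = trainControl(method = "cv", number = 5))
```

My worry is about stage 1: whether the fixed mtry used there biases which columns end up in `predictors(rfe_fit)`.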
Hope someone can explain this to me.
Best Answer
One thing you might want to look into is regularized random forests, which are specifically designed for feature selection. This paper explains the concept and how they differ from normal random forests:
Feature Selection via Regularized Trees
There's also a CRAN package, RRF, that's built on randomForest and will let you implement them easily in R. I've had good luck with this methodology myself.
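A minimal sketch of how RRF can be used for feature selection (untested; `x`/`y` are your predictors and response, and the penalty values 0.8 and 0.5 are arbitrary illustrations, not recommendations):

```r
library(RRF)

# Regularized RF: coefReg < 1 penalizes splitting on features not already
# used elsewhere in the forest, which shrinks the selected feature set.
rrf <- RRF(x, y, flagReg = 1, coefReg = 0.8)
selected <- rrf$feaSet   # indices of the features the regularized forest used

# Guided RRF variant: scale each feature's penalty by its importance
# from an ordinary (unregularized) random forest fit.
rf      <- RRF(x, y, flagReg = 0)
imp     <- rf$importance[, 1]
impNorm <- imp / max(imp)
gamma   <- 0.5           # 0 = plain RRF, 1 = fully importance-guided
grrf    <- RRF(x, y, flagReg = 1, coefReg = (1 - gamma) + gamma * impNorm)
```

The guided variant tends to be more stable when many features are weakly informative, since the penalty is informed by a preliminary importance ranking rather than applied uniformly.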
Regarding your original question, the only advice I can give is that if you have a lot of collinearity, you should use smaller trees. This lets the algorithm determine importance with less interference from collinearity effects.
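With randomForest, tree size can be limited via `maxnodes` and/or `nodesize`; a quick sketch (the specific values here are illustrative only, and `x`/`y` are assumed inputs):

```r
library(randomForest)

# Shallow trees: cap the number of terminal nodes and require larger leaves,
# so each tree makes fewer, coarser splits and correlated features have
# less opportunity to mask one another in the importance ranking.
rf_small <- randomForest(x, y,
                         maxnodes  = 16,    # at most 16 terminal nodes per tree
                         nodesize  = 20,    # each leaf must hold >= 20 samples
                         importance = TRUE)
varImpPlot(rf_small)
```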