How many trees would you suggest using for recursive feature elimination (RFE) to optimize a Random Forest classifier (for a binary classification problem)? My dataset is very high-dimensional (> 200,000 features), and I usually use ~10,000 trees when running classification without feature selection. I am just wondering whether setting it to ~500-1,000 for RFE would be enough, in order to save time and RAM.
P.S.:
I use the 'randomForest' and 'caret' R packages, if it makes any difference.
Best Answer
Optimizing ntree and mtry (beyond mtry = sqrt(#features) and ntree large enough for the OOB error to stabilize) is dangerous territory -- you need rigorous nested cross-validation to be safe from overfitting, so you may end up doing more computation than you were trying to avoid.
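If you do go the RFE route with caret, the safer pattern is to let rfe() resample the elimination itself, so the performance estimate accounts for the selection step. A minimal sketch, assuming X is your feature matrix and y a binary factor (both hypothetical names), with the subset sizes and ntree purely illustrative:

```r
library(caret)

# rfeControl with cross-validation resamples the *entire* elimination,
# which is what protects the estimate from selection bias.
ctrl <- rfeControl(functions = rfFuncs,  # random-forest helper functions
                   method = "cv",
                   number = 5)

# sizes = candidate feature-subset sizes to evaluate; extra arguments
# (here ntree) are passed through to randomForest() for each fit.
rf_rfe <- rfe(X, y,
              sizes = c(100, 1000, 5000),
              rfeControl = ctrl,
              ntree = 500)  # fewer trees per fit keeps the cost manageable
```

Note that with 200k features even this sketch will be very expensive, which is part of the argument below for avoiding RFE on the full set.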
I would say the better idea is not to use RFE at all -- with 200k features it will have prohibitive computational requirements and little chance of being stable. Instead, you can use an all-relevant RF wrapper like ACE or Boruta -- the set returned is likely to be larger than the minimal-optimal one, but still far smaller than the original and thus much easier to treat with RFE afterwards.
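For reference, running Boruta is a one-liner; a minimal sketch, again assuming hypothetical X (features) and y (binary labels), with maxRuns purely illustrative:

```r
library(Boruta)

set.seed(1)
# Boruta compares each feature's importance against randomized "shadow"
# copies and iteratively confirms or rejects features.
bor <- Boruta(X, y, maxRuns = 100, doTrace = 1)

# Confirmed (all-relevant) features; drop tentative ones for a strict set.
sel <- getSelectedAttributes(bor, withTentative = FALSE)
```

You could then run RFE only on the columns in sel, which is typically orders of magnitude smaller than the original 200k.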
And remember to validate the feature selection itself regardless of the method, e.g. by keeping it inside the cross-validation loop (=