Random Forest Classifier – Identifying Optimal Parameters

classification, machine learning, random forest

I am currently using the RF toolbox in MATLAB for a binary classification problem.

Data Set: 50000 samples and more than 250 features

So what should the number of trees and the number of randomly selected features at each split be when growing the trees?
Can any other parameter greatly affect the results?

Best Answer

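Pick a large number of trees, say 100. From what I have read on the Internet, a common choice for the number of randomly selected features per split is $\sqrt{M}$, where $M$ is the total number of features; here that is $\sqrt{250} \approx 16$. However, in the original paper, Breiman used about the closest integer to $\frac{\log{M}}{\log{2}}$, i.e. $\log_2 M$, which is about 8 for $M = 250$.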
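To make the two rules of thumb concrete, here is a small Python sketch that evaluates both for a given feature count. The function name `candidate_mtry` is just an illustrative helper, not part of any toolbox, and the log rule is rounded to the closest integer as described above.

```python
import math

def candidate_mtry(n_features):
    """Two rule-of-thumb values for the number of features tried per split."""
    # Square-root rule, rounded to the nearest integer.
    sqrt_rule = round(math.sqrt(n_features))
    # Closest integer to log(M) / log(2), i.e. log2(M).
    log2_rule = round(math.log(n_features) / math.log(2))
    return {"sqrt": sqrt_rule, "log2": log2_rule}

candidates = candidate_mtry(250)
print(candidates)  # {'sqrt': 16, 'log2': 8}
```

Both values are only starting points; with more than 250 features the numbers shift slightly, so it is probably worth trying a few values around them.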

I would say cross-validation is usually the key to finding optimal parameters, but I do not know enough about random forests.
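As a rough illustration of the cross-validation idea, the sketch below searches over a few values of the per-split feature count. It assumes Python with scikit-learn rather than the MATLAB toolbox from the question, and it uses a small synthetic data set as a stand-in for the real one; the grid values, tree count, and fold count are arbitrary choices for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (assumption): 500 samples, 30 features, binary labels.
X, y = make_classification(
    n_samples=500, n_features=30, n_informative=8, random_state=0
)

# Candidate numbers of randomly selected features per split.
param_grid = {"max_features": [3, 5, 10]}

search = GridSearchCV(
    RandomForestClassifier(n_estimators=50, random_state=0),
    param_grid,
    cv=5,                 # 5-fold cross-validation
    scoring="accuracy",
)
search.fit(X, y)

best_max_features = search.best_params_["max_features"]
print(best_max_features, round(search.best_score_, 3))
```

The same pattern should carry over to the number of trees or the minimum leaf size by adding them to `param_grid`, at the cost of a longer search.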
