Solved – Random forest parameters

classification, hyperparameter, optimization, random forest, weka

I'm trying to choose Random forest parameters for classification. My dataset contains 26 features and 6300 instances. How can I decide the values of the number of trees, the number of features, the size of the bootstrap sample, and the minimum node size? Are there any formulas I can use to help decide on these values?

Thank you

Best Answer

Read the first article on the implementation of RF in the R project ("R-News"). I commonly use 500 to 5000 trees, but Breiman recommends using more in his original Machine Learning paper (he said: "Don't be stingy").

I wrote the following notes for my own code:

Number of features used for training at each node split, jtry. A unique characteristic of RF is that only a small number of features are randomly selected as candidates at each node split. The parameter used to specify this number is jtry, and a typical value is jtry=$\sqrt{p}$, where $p$ is the total number of features. Small values of jtry, such as 1 and 2, have been shown to yield high performance. If there are a large number of features to filter, then larger values of jtry will be required. During tree node splitting, jtry features are randomly selected from the $p$ features and used to create the node split.
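For the dataset in the question ($p = 26$), the rule of thumb can be sketched as follows. This is a minimal illustration, not any library's implementation: the function names `default_jtry` and `sample_split_features` are hypothetical, and rounding $\sqrt{p}$ down to an integer is an assumption, since the text above does not say how to round.

```python
import math
import random


def default_jtry(p):
    """Rule of thumb from the text: sqrt(p), rounded down here (an assumption)."""
    return max(1, math.isqrt(p))


def sample_split_features(p, jtry, rng):
    """Draw jtry distinct feature indices out of p as candidates for one node split."""
    return sorted(rng.sample(range(p), jtry))


p = 26                      # number of features in the question's dataset
jtry = default_jtry(p)      # isqrt(26) == 5
rng = random.Random(0)      # fixed seed only so the sketch is repeatable
candidates = sample_split_features(p, jtry, rng)
print(jtry, candidates)
```

A fresh draw of candidates would be made at every node, so different splits in the same tree usually see different feature subsets.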

Number of trees, ntree. The number of trees used is specified by ntree. It has been shown that as few as 50 trees can give reliable results (Cutler et al., 2007), but the majority of applications use 500, 1000, 2000, or more trees. Breiman has recommended using at least 1000 trees (Breiman, 2001); however, it is common to use 500 trees, unless of course more trees result in greater performance.
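One way to act on "unless more trees result in greater performance" is to record an error estimate (for example out-of-bag or cross-validated error) at several values of ntree and keep the smallest value that is close to the best. The sketch below assumes you have already measured those errors yourself. The helper name `smallest_sufficient_ntree` and the tolerance `tol` are hypothetical, and the numbers in the usage lines are made-up placeholders, not results.

```python
def smallest_sufficient_ntree(error_by_ntree, tol=0.005):
    """Return the smallest ntree whose error is within tol of the lowest error seen."""
    best = min(error_by_ntree.values())
    for ntree in sorted(error_by_ntree):
        if error_by_ntree[ntree] - best <= tol:
            return ntree


# Placeholder error estimates (NOT measurements), keyed by ntree.
example_errors = {50: 0.120, 100: 0.105, 500: 0.098, 1000: 0.097, 2000: 0.097}
chosen = smallest_sufficient_ntree(example_errors, tol=0.005)
print(chosen)
```

The tolerance is a judgment call: a tighter value pushes the choice toward larger forests, which cost more to train but are rarely worse.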

Number of objects in terminal nodes, nodesize. The parameter nodesize controls the size of terminal nodes during node splitting while training a tree. Nodes with fewer than nodesize objects are not split, and therefore become terminal nodes. The ability of a tree to generate terminal nodes with class purity depends on the quality of the jtry features randomly selected for making each split as well as the value of nodesize. If nodesize$=1$, however, then every terminal node will have class purity.
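The stopping rule described above can be sketched as a small check applied to the class labels that reach a node. This is an illustration only: `is_terminal` and `purity` are hypothetical helper names, and also stopping when a node is already pure is an assumption (a common convention, but not stated in the text).

```python
from collections import Counter


def is_terminal(labels, nodesize):
    """True if the node must not be split further."""
    if len(labels) < nodesize:      # rule from the text: too few objects to split
        return True
    return len(set(labels)) <= 1    # assumption: a pure node is not split either


def purity(labels):
    """Fraction of objects in the node that belong to its majority class."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


small_node = ["a", "b"]
mixed_node = ["a", "b", "a", "b", "a"]
print(is_terminal(small_node, nodesize=5), purity(small_node))
print(is_terminal(mixed_node, nodesize=5), purity(mixed_node))
```

With a larger nodesize, a node like `small_node` is frozen while still mixed, which is why terminal-node purity is only expected when nodesize is 1.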

Number of total nodes, nnode. The number of total nodes grown in a tree can also be limited by use of the parameter nnode. Features that are less informative for splitting objects according to class labels will result in trees with greater size, and therefore setting the upper bound to 500 or 1000 is certainly not unreasonable if there are hundreds of samples.
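As a rough sanity check on such an upper bound, assume binary splits (as in CART-style trees; this is an assumption, not something stated above) and let $N$ be the number of objects used to grow the tree. Every terminal node holds at least one object, so there are at most $N$ terminal nodes, and a binary tree with $T$ terminal nodes has exactly $T - 1$ internal nodes:

$$\text{nnode} = T + (T - 1) = 2T - 1 \le 2N - 1.$$

For example, with $N = 300$ a fully grown tree has at most $599$ nodes, which is consistent with a cap of 500 or 1000 when there are hundreds of samples.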

L. Breiman. Random Forests. Machine Learning, 45:5-32, 2001.

D.R. Cutler, T.C. Edwards, Jr., K.H. Beard, A. Cutler, K.T. Hess, J. Gibson, J.J. Lawler. Random forests for classification in ecology. Ecology, 88(11):2783-2792, 2007.